Qwen2-72B is the flagship model of the Qwen2 series, with 72 billion parameters. Its architecture has 80 layers with a hidden size of 8192 and uses Grouped Query Attention (GQA) with 64 query heads and 8 shared key-value heads, so the KV cache stores only 8 key-value heads per layer instead of one per query head. Combined with Dual Chunk Attention and YaRN, this design supports long-context processing and efficient management of KV-cache memory.
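To make the KV-cache saving concrete, here is a minimal sketch that estimates cache size from the figures above (80 layers, hidden size 8192, 64 query heads, 8 key-value heads). It assumes 16-bit cache values, a head dimension equal to the hidden size divided by the number of query heads, and the 32,768-token context listed in the configuration tables on this page. The class and function names are illustrative and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Illustrative container for the architecture figures quoted above."""
    num_layers: int
    hidden_size: int
    num_query_heads: int
    num_kv_heads: int

    @property
    def head_dim(self) -> int:
        # Assumption: head dimension = hidden size / number of query heads.
        return self.hidden_size // self.num_query_heads


def kv_cache_bytes(cfg: AttentionConfig, num_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size for one sequence of `num_tokens` tokens.

    Each layer stores two tensors (keys and values), each holding
    num_kv_heads * head_dim values per token.
    Assumption: 2 bytes per value (16-bit cache).
    """
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * bytes_per_value
    return per_token * num_tokens


# Figures from the description above.
QWEN2_72B_GQA = AttentionConfig(num_layers=80, hidden_size=8192,
                                num_query_heads=64, num_kv_heads=8)
# Hypothetical baseline: same model with one key-value head per query head.
MHA_BASELINE = AttentionConfig(num_layers=80, hidden_size=8192,
                               num_query_heads=64, num_kv_heads=64)

CONTEXT_TOKENS = 32_768

gqa_gib = kv_cache_bytes(QWEN2_72B_GQA, CONTEXT_TOKENS) / 2**30
mha_gib = kv_cache_bytes(MHA_BASELINE, CONTEXT_TOKENS) / 2**30
print(f"GQA cache per 32K-token sequence: {gqa_gib:.1f} GiB")
print(f"Full multi-head baseline:         {mha_gib:.1f} GiB")
```

Under these assumptions one full-length sequence needs about 10 GiB of cache with GQA, compared with about 80 GiB for the hypothetical baseline, an 8x reduction that follows directly from the 64:8 head ratio.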
The model was trained on a high-quality, diverse dataset of 7 trillion tokens. The base version achieves strong results on key benchmarks: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH. The instruction-tuned version, Qwen2-72B-Instruct, scores 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench, placing it alongside top-tier proprietary models.
Qwen2-72B demonstrates exceptional capabilities in complex reasoning, step-by-step problem solving, advanced programming, and deep contextual understanding. Its multilingual support enables professional-level performance in more than 30 languages, including Russian. Accordingly, it is designed for the most demanding AI use cases: high-level scientific research, complex software development, creation of high-quality professional content, advanced data analysis, automation of complex business processes, and intelligent decision-making systems.
| Model Name | Context | Type | GPU | Status | Link |
|---|---|---|---|---|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | pipeline | 3 | $1.34 | 1.825 |
| 32,768 | tensor | 4 | $1.57 | 3.607 |
| 32,768 | pipeline | 6 | $1.65 | 2.559 |
| 32,768 | pipeline | 3 | $2.29 | 2.118 |
| 32,768 | tensor | 4 | $2.34 | 3.607 |
| 32,768 | | 1 | $2.37 | 3.608 |
| 32,768 | pipeline | 3 | $2.83 | 2.107 |
| 32,768 | tensor | 4 | $2.89 | 3.997 |
| 32,768 | tensor | 2 | $2.93 | 1.756 |
| 32,768 | tensor | 4 | $3.60 | 3.983 |
| 32,768 | | 1 | $3.83 | 3.602 |
| 32,768 | | 1 | $4.11 | 4.882 |
| 32,768 | tensor | 2 | $4.61 | 10.569 |
| 32,768 | | 1 | $4.74 | 9.179 |
| 32,768 | tensor | 2 | $9.40 | 21.710 |

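
One way to compare the configurations listed above is price per unit of max concurrency. The sketch below uses a handful of rows from the listing and assumes that the dollar amount is the instance price and that the final number in each row is the Max Concurrency value. The `Config` type and the function names are hypothetical, chosen only for illustration.

```python
from typing import List, NamedTuple, Optional


class Config(NamedTuple):
    """One row of the listing (field meanings are assumptions, see above)."""
    parallelism: Optional[str]  # "pipeline", "tensor", or None for single-GPU rows
    gpus: int
    price: float                # dollar amount as listed
    max_concurrency: float      # final number in the row


# A subset of rows copied from the listing above.
CONFIGS: List[Config] = [
    Config("pipeline", 3, 1.34, 1.825),
    Config("tensor", 4, 1.57, 3.607),
    Config(None, 1, 2.37, 3.608),
    Config("tensor", 2, 4.61, 10.569),
    Config(None, 1, 4.74, 9.179),
    Config("tensor", 2, 9.40, 21.710),
]


def price_per_concurrency(cfg: Config) -> float:
    """Listed price divided by max concurrency (lower means cheaper per concurrent request)."""
    return cfg.price / cfg.max_concurrency


def rank_by_price_per_concurrency(configs: List[Config]) -> List[Config]:
    """Return the configurations sorted from cheapest to most expensive per unit of concurrency."""
    return sorted(configs, key=price_per_concurrency)


for cfg in rank_by_price_per_concurrency(CONFIGS):
    label = cfg.parallelism or "single GPU"
    print(f"{label:>10} x{cfg.gpus}  ${cfg.price:.2f}  "
          f"{price_per_concurrency(cfg):.3f} per unit of concurrency")
```

On this subset, the cheapest row overall ($1.34) is the most expensive per unit of concurrency, so the lowest listed price is not necessarily the best fit for workloads with many simultaneous requests.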
| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | tensor | 8 | | 7.995 |
| 32,768 | | 1 | $4.12 | 1.529 |
| 32,768 | | 1 | $4.74 | 5.826 |
| 32,768 | tensor | 2 | $4.93 | 7.216 |
| 32,768 | tensor | 2 | $4.94 | 7.216 |
| 32,768 | tensor | 4 | $5.76 | 3.511 |
| 32,768 | pipeline | 6 | $5.84 | 3.962 |
| 32,768 | tensor | 8 | $7.52 | 7.965 |
| 32,768 | tensor | 2 | $7.85 | 7.204 |
| 32,768 | tensor | 2 | $9.41 | 18.358 |

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | pipeline | 3 | $7.36 | 7.171 |
| 32,768 | tensor | 8 | $7.52 | 1.128 |
| 32,768 | tensor | 2 | $8.17 | 2.926 |
| 32,768 | pipeline | 6 | $8.86 | 1.105 |
| 32,768 | tensor | 4 | $9.14 | 14.301 |
| 32,768 | tensor | 2 | $9.41 | 11.521 |
| 32,768 | tensor | 2 | $9.41 | 11.521 |
| 32,768 | tensor | 4 | $9.50 | 14.301 |
| 32,768 | tensor | 8 | $11.55 | 6.891 |
| 32,768 | pipeline | 3 | $11.73 | 7.152 |
| 32,768 | tensor | 4 | $14.96 | 14.276 |

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.