Qwen2-72B is the flagship model of the series, featuring 72 billion parameters. Its architecture comprises 80 layers with a hidden size of 8192 and uses Grouped Query Attention (GQA) with 64 query heads and 8 shared key-value heads. Combined with Dual Chunk Attention and YaRN, this design enables efficient long-context processing and economical KV-cache memory usage.
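The GQA layout described above can be sketched in a few lines: groups of 8 query heads each attend over one shared key-value head, cutting the KV cache to 8/64 of a standard multi-head layout. The dimensions below are scaled down for illustration (the real model uses hidden size 8192 across 80 layers); this is a minimal sketch, not the model's actual implementation.

```python
# Minimal sketch of Grouped Query Attention (GQA): 64 query heads
# share 8 key-value heads, i.e. 8 query heads per KV head.
import numpy as np

def gqa(q, k, v, n_q_heads=64, n_kv_heads=8):
    # q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)
    group = n_q_heads // n_kv_heads  # query heads per shared KV head
    d = q.shape[-1]
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # index of the KV head shared by this group
        scores = q[h] @ k[kv].T / np.sqrt(d)
        # numerically stable softmax over the key dimension
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ v[kv]
    return out

rng = np.random.default_rng(0)
seq, d = 4, 16
q = rng.standard_normal((64, seq, d))
k = rng.standard_normal((8, seq, d))   # only 8 KV heads to cache
v = rng.standard_normal((8, seq, d))
print(gqa(q, k, v).shape)  # (64, 4, 16)
```

Because only the 8 KV heads are cached during generation, the KV-cache footprint shrinks by 8x relative to full multi-head attention with 64 KV heads.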
The model was trained on a high-quality dataset of 7 trillion tokens spanning a broad range of domains and languages. The base version achieves strong results on key benchmarks: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH. The instruction-tuned version, Qwen2-72B-Instruct, scores 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench, placing it among top-tier proprietary models.
Qwen2-72B demonstrates exceptional capabilities in complex reasoning, step-by-step problem solving, advanced programming, and deep contextual understanding. Its multilingual support enables professional-level performance in more than 30 languages, including Russian. Accordingly, it is designed for the most demanding AI use cases—high-level scientific research, complex software development, creation of high-quality professional content, advanced data analysis, automation of complex business processes, and intelligent decision-making systems.
| Model Name | Context | Type | GPU | Status | Link |
|---|---|---|---|---|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying a private instance using one of the configurations listed below.
| Context | Parallelism | GPUs | Price/hr | TPS |
|---|---|---|---|---|
| 32,768 | tensor | 4 | $0.96 | 1.407 |
| 32,768 | tensor | 4 | $1.26 | 1.407 |
| 32,768 | pipeline | 3 | $1.34 | 2.377 |
| 32,768 | tensor | 4 | $1.57 | 4.287 |
| 32,768 | tensor | 2 | $2.22 | 1.907 |
| 32,768 | pipeline | 3 | $2.29 | 2.377 |
| 32,768 | tensor | 4 | $2.34 | 4.287 |
| 32,768 | | 1 | $2.37 | 3.597 |
| 32,768 | pipeline | 3 | $2.83 | 2.377 |
| 32,768 | tensor | 4 | $2.89 | 4.287 |
| 32,768 | tensor | 2 | $2.93 | 1.907 |
| 32,768 | tensor | 4 | $3.60 | 4.287 |
| 32,768 | | 1 | $3.83 | 3.597 |
| 32,768 | | 1 | $4.11 | 4.857 |
| 32,768 | | 1 | $4.74 | 9.087 |
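Hourly rates translate to monthly costs as a simple multiplication. The sketch below uses the cheapest rate from the table above ($0.96/hr); the 730 hours-per-month figure is an assumption (the average month length), not a billing term from this page.

```python
# Rough monthly cost estimate for the cheapest configuration above.
# hours_per_month = 730 is an assumed average (365 * 24 / 12),
# not an official billing parameter.
hourly_rate = 0.96       # $/hr, cheapest row in the table above
hours_per_month = 730    # assumed average hours in a month
monthly = hourly_rate * hours_per_month
print(f"~${monthly:.2f}/month")  # ~$700.80/month
```

Long-term monthly billing, mentioned above, may differ from this naive hourly extrapolation.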
| Context | Parallelism | GPUs | Price/hr | TPS |
|---|---|---|---|---|
| 32,768 | pipeline | 6 | $3.50 | 4.754 |
| 32,768 | pipeline | 3 | $3.89 | 1.184 |
| 32,768 | | 1 | $4.11 | 1.504 |
| 32,768 | pipeline | 3 | $4.34 | 1.184 |
| 32,768 | tensor | 4 | $4.35 | 3.814 |
| 32,768 | tensor | 2 | $4.61 | 7.194 |
| 32,768 | tensor | 8 | $4.61 | 8.574 |
| 32,768 | | 1 | $4.74 | 5.734 |
| 32,768 | tensor | 4 | $5.74 | 3.814 |
| 32,768 | pipeline | 6 | $5.83 | 4.754 |
| 32,768 | tensor | 8 | $7.51 | 8.574 |
| 32,768 | tensor | 2 | $7.84 | 7.194 |
| Context | Parallelism | GPUs | Price/hr | TPS |
|---|---|---|---|---|
| 32,768 | pipeline | 3 | $7.36 | 6.299 |
| 32,768 | tensor | 2 | $8.17 | 1.869 |
| 32,768 | pipeline | 6 | $8.86 | 1.229 |
| 32,768 | tensor | 4 | $9.14 | 13.249 |
| 32,768 | tensor | 2 | $9.41 | 10.329 |
| 32,768 | tensor | 8 | $11.55 | 6.489 |
| 32,768 | pipeline | 3 | $11.73 | 6.299 |
| 32,768 | tensor | 4 | $14.96 | 13.249 |
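Once a dedicated instance is deployed, it can typically be queried over HTTP. The sketch below assumes the instance exposes an OpenAI-compatible chat completions API (common for vLLM-style deployments, but not confirmed by this page); the base URL, port, and API key are placeholders you would replace with your own instance's values.

```python
# Hypothetical client for a deployed Qwen2-72B-Instruct instance.
# The endpoint URL ("http://YOUR-INSTANCE:8000") and "YOUR-KEY" are
# placeholders; an OpenAI-compatible API is assumed, not guaranteed.
import json
import urllib.request

def build_request(prompt, base_url="http://YOUR-INSTANCE:8000"):
    """Build an OpenAI-compatible chat completion HTTP request."""
    payload = {
        "model": "Qwen/Qwen2-72B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR-KEY",
        },
    )

if __name__ == "__main__":
    req = build_request("Summarize grouped query attention in one sentence.")
    # Requires a live instance at the placeholder URL:
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])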
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.