Qwen2.5-32B has 32 billion parameters, 64 layers, and a 40/8 attention head configuration: 40 query heads share 8 key-value heads (grouped-query attention), which shrinks the key-value cache relative to standard multi-head attention. With a 128K-token context window and up to 8K tokens of generation, the model can handle long documents and complex, large-scale tasks.
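
To make the head configuration concrete, the sketch below estimates how much memory the key-value (KV) cache needs per token and per full context. It reads the 40/8 figure as 40 query heads and 8 key-value heads. The head dimension of 128 and the 2-byte (FP16/BF16) cache precision are assumptions not stated on this page, and the function names are illustrative only.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache per token: one key and one value vector
    per KV head in every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_cache_gib(context_tokens: int, **model) -> float:
    """KV cache size in GiB for a single sequence of `context_tokens` tokens."""
    return context_tokens * kv_cache_bytes_per_token(**model) / 2**30


# 64 layers and 8 KV heads come from the description above;
# head_dim=128 is an assumption.
QWEN25_32B = dict(num_layers=64, num_kv_heads=8, head_dim=128)

print(kv_cache_bytes_per_token(**QWEN25_32B))   # 262144 bytes = 256 KiB per token
print(kv_cache_gib(32_768, **QWEN25_32B))       # 8.0 GiB for a 32K-token sequence
print(kv_cache_gib(128 * 1024, **QWEN25_32B))   # 32.0 GiB for a 128K-token sequence
```

Under these assumptions, caching all 40 heads instead of 8 would multiply each figure by five, which is why the number of key-value heads matters for how many long requests fit on a GPU alongside the weights.
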
Qwen2.5-32B reintroduces the 32B parameter size to the Qwen series after its absence in Qwen2, offering users a powerful alternative to the flagship 72B model with lower resource requirements. Trained on 18 trillion high-quality tokens, the model demonstrates robust performance with large datasets, expert-level knowledge in specialized domains, superior abstract reasoning capabilities, and the ability to solve problems requiring deep contextual understanding and multi-step analysis.
Qwen2.5-32B is designed for organizations and research teams that need frontier-model capabilities without the full cost of the largest models. Ideal applications include scientific research, complex software development, high-quality content creation, and expert support systems in medicine and law. The model can also serve as a foundation for building highly specialized AI systems.
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | pipeline | 3 | $0.88 | | 1.479 |
| 32,768 | tensor | 2 | $0.93 | | 2.113 |
| 32,768 | tensor | 4 | $0.96 | | 2.811 |
| 32,768 | pipeline | 3 | $1.06 | | 1.493 |
| 32,768 | tensor | 2 | $1.23 | | 2.113 |
| 32,768 | tensor | 4 | $1.26 | | 2.829 |
| 32,768 | tensor | 2 | $1.56 | | 2.357 |
| 32,768 | tensor | 2 | $1.92 | | 2.348 |
| 32,768 | | 1 | $2.37 | | 6.464 |
| 32,768 | tensor | 2 | $2.93 | | 4.149 |
| 32,768 | | 1 | $3.83 | | 6.456 |
| 32,768 | | 1 | $4.11 | | 8.056 |
| 32,768 | tensor | 2 | $4.61 | | 15.166 |
| 32,768 | | 1 | $4.74 | | 13.428 |
| 32,768 | tensor | 2 | $9.40 | | 29.092 |


| Context | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | pipeline | 3 | $1.34 | | 2.315 |
| 32,768 | tensor | 4 | $1.57 | | 4.617 |
| 32,768 | pipeline | 6 | $1.65 | | 3.389 |
| 32,768 | pipeline | 3 | $2.29 | | 2.681 |
| 32,768 | tensor | 4 | $2.34 | | 4.617 |
| 32,768 | | 1 | $2.37 | 38.260 | 4.619 |
| 32,768 | pipeline | 3 | $2.83 | | 2.667 |
| 32,768 | tensor | 4 | $2.89 | | 5.105 |
| 32,768 | tensor | 2 | $2.93 | | 2.303 |
| 32,768 | tensor | 4 | $3.60 | | 5.087 |
| 32,768 | | 1 | $3.83 | 37.000 | 4.611 |
| 32,768 | | 1 | $4.11 | 58.780 | 6.210 |
| 32,768 | tensor | 2 | $4.61 | | 13.320 |
| 32,768 | | 1 | $4.74 | | 11.582 |
| 32,768 | tensor | 2 | $9.40 | | 27.246 |

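
Three of the single-GPU rows above list both a price and a TPS figure, which is enough for a rough cost comparison. The sketch below converts each pair into a cost per million generated tokens. It assumes the listed price is per hour (the page mentions hourly billing) and that TPS is the sustained tokens per second of the instance. Neither assumption is confirmed by the table itself, and the function name is illustrative.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Approximate USD cost of generating one million tokens,
    assuming the instance runs at `tokens_per_second` for the whole hour."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


# (price, TPS) pairs copied from the rows that list both values.
rows = [(2.37, 38.260), (3.83, 37.000), (4.11, 58.780)]

for price, tps in rows:
    cost = cost_per_million_tokens(price, tps)
    print(f"${price:.2f}/h at {tps:.3f} TPS -> ${cost:.2f} per 1M tokens")
```

If TPS instead describes a single request stream, total throughput under concurrent load would be higher and the effective cost per token lower, so these figures are best read as a relative ranking rather than a quote.
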

| Context | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | tensor | 4 | $1.76 | | 1.072 |
| 32,768 | | 1 | $2.51 | | 1.073 |
| 32,768 | tensor | 4 | $2.97 | | 1.559 |
| 32,768 | tensor | 4 | $3.68 | | 1.541 |
| 32,768 | | 1 | $3.96 | | 1.065 |
| 32,768 | | 1 | $4.12 | | 2.664 |
| 32,768 | pipeline | 3 | $4.35 | | 1.711 |
| 32,768 | | 1 | $4.74 | | 8.036 |
| 32,768 | tensor | 2 | $4.94 | | 9.774 |
| 32,768 | tensor | 4 | $5.76 | | 5.143 |
| 32,768 | tensor | 2 | $9.41 | | 23.701 |

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.