Qwen2.5-32B has 32 billion parameters, 64 layers, and a 40/8 attention head layout (40 query heads sharing 8 key-value heads). With a 128K-token context window and up to 8K tokens of generation, the model can handle long documents and complex, large-scale tasks.
Qwen2.5-32B reintroduces the 32B parameter size to the Qwen series after its absence in Qwen2, offering users a powerful alternative to the flagship 72B model with lower resource requirements. Trained on 18 trillion high-quality tokens, the model demonstrates robust performance with large datasets, expert-level knowledge in specialized domains, superior abstract reasoning capabilities, and the ability to solve problems requiring deep contextual understanding and multi-step analysis.
Qwen2.5-32B is designed for organizations and research teams that need frontier-model capabilities without the full cost of the largest models. Ideal applications include scientific research, complex software development, high-quality content creation, and expert support systems in medicine and law. The model can also serve as a foundation for building highly specialized AI systems.
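The architecture figures above largely determine how much GPU memory a long context consumes. As a rough sketch, the key-value cache size can be estimated from the layer count and the number of key-value heads. The per-head dimension of 128 and the 16-bit cache precision used here are assumptions that are not stated on this page, and the function name is purely illustrative.

```python
# Rough KV-cache estimate for Qwen2.5-32B.
# From the description above: 64 layers, 8 key-value heads (the "8" in 40/8).
NUM_LAYERS = 64
NUM_KV_HEADS = 8

# Assumptions, not stated on this page:
HEAD_DIM = 128        # per-head dimension
BYTES_PER_VALUE = 2   # 16-bit (FP16/BF16) cache entries


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2 covers one key tensor and one value tensor per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


if __name__ == "__main__":
    for tokens in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions a single 32,768-token sequence needs about 8 GiB of cache on top of the model weights, which is why the number of sequences a deployment can serve at once depends strongly on total GPU memory.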
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | pipeline | 3 | $0.88 | 2.061 |
| 32,768 | tensor | 2 | $0.93 | 2.374 |
| 32,768 | tensor | 4 | $0.96 | 3.549 |
| 32,768 | pipeline | 3 | $1.06 | 2.061 |
| 32,768 | tensor | 4 | $1.12 | 1.299 |
| 32,768 | tensor | 2 | $1.23 | 2.374 |
| 32,768 | tensor | 4 | $1.26 | 3.549 |
| 32,768 | tensor | 2 | $1.56 | 2.374 |
| 32,768 | tensor | 2 | $1.92 | 2.374 |
| 32,768 | tensor | 2 | $2.22 | 4.174 |
| 32,768 | | 1 | $2.37 | 6.286 |
| 32,768 | tensor | 2 | $2.93 | 4.174 |
| 32,768 | | 1 | $3.83 | 6.286 |
| 32,768 | | 1 | $4.11 | 7.861 |
| 32,768 | | 1 | $4.74 | 13.149 |
| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | tensor | 2 | $0.93 | 1.050 |
| 32,768 | tensor | 4 | $0.96 | 2.225 |
| 32,768 | tensor | 2 | $1.23 | 1.050 |
| 32,768 | tensor | 4 | $1.26 | 2.225 |
| 32,768 | tensor | 2 | $1.56 | 1.050 |
| 32,768 | tensor | 2 | $1.92 | 1.050 |
| 32,768 | tensor | 2 | $2.22 | 2.850 |
| 32,768 | | 1 | $2.37 | 4.962 |
| 32,768 | tensor | 2 | $2.93 | 2.850 |
| 32,768 | | 1 | $3.83 | 4.962 |
| 32,768 | | 1 | $4.11 | 6.537 |
| 32,768 | | 1 | $4.74 | 11.825 |
| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | tensor | 4 | $1.75 | 1.360 |
| 32,768 | tensor | 4 | $2.34 | 1.360 |
| 32,768 | tensor | 4 | $2.97 | 1.360 |
| 32,768 | tensor | 4 | $3.68 | 1.360 |
| 32,768 | pipeline | 3 | $3.89 | 1.672 |
| 32,768 | | 1 | $4.11 | 2.072 |
| 32,768 | pipeline | 3 | $4.34 | 1.672 |
| 32,768 | tensor | 4 | $4.35 | 4.960 |
| 32,768 | tensor | 2 | $4.61 | 9.185 |
| 32,768 | | 1 | $4.74 | 7.360 |
| 32,768 | tensor | 4 | $5.74 | 4.960 |
| 32,768 | tensor | 2 | $7.84 | 9.185 |
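When comparing the configurations listed above, one simple approach is to pick the lowest-priced row that still meets a required concurrency. The sketch below does this for a few rows copied from the first table. It assumes the last numeric value in each row is the Max Concurrency figure and that the dollar amount is the listed price for the same billing period in every row. The `Config` class and `cheapest_for` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    parallelism: str        # "tensor", "pipeline", or "none" for a single GPU
    gpus: int               # number of GPUs in the instance
    price: float            # listed price in USD (billing period as listed)
    max_concurrency: float  # assumed meaning of the last numeric column


# A few rows copied from the first table above.
CONFIGS = [
    Config("pipeline", 3, 0.88, 2.061),
    Config("tensor", 2, 0.93, 2.374),
    Config("tensor", 4, 0.96, 3.549),
    Config("none", 1, 2.37, 6.286),
    Config("none", 1, 4.74, 13.149),
]


def cheapest_for(configs: Sequence[Config], required: float) -> Optional[Config]:
    """Return the lowest-priced config whose max concurrency meets `required`."""
    eligible = [c for c in configs if c.max_concurrency >= required]
    return min(eligible, key=lambda c: c.price) if eligible else None


if __name__ == "__main__":
    choice = cheapest_for(CONFIGS, required=3.0)
    print(choice)
```

If no row meets the requirement, the function returns `None` rather than guessing, which makes the "nothing fits" case explicit for the caller.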
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.