Qwen2.5-32B has 32 billion parameters, 64 layers, and a 40/8 attention head architecture (40 query heads sharing 8 key-value heads). With support for a 128K-token context window and up to 8K tokens of generation, the model can handle long documents and complex, large-scale tasks.
Qwen2.5-32B reintroduces the 32B parameter size to the Qwen series after its absence in Qwen2, offering users a powerful alternative to the flagship 72B model with lower resource requirements. Trained on 18 trillion high-quality tokens, the model performs well on large datasets, brings expert-level knowledge in specialized domains, offers strong abstract reasoning, and can solve problems that require deep contextual understanding and multi-step analysis.
Qwen2.5-32B is designed for organizations and research teams that need frontier-model capabilities without the full cost of the largest models. Ideal applications include scientific research, complex software development, high-quality content creation, expert support systems in medicine and law, and use as a foundation for highly specialized AI systems.
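
To make the architecture figures above more concrete, here is a rough sketch of how much GPU memory the key-value cache needs per request. It reads the 40/8 layout as 40 query heads and 8 key-value heads, and it assumes a head dimension of 128 and 16-bit cache values; neither of those two numbers is stated on this page, so treat the result as an estimate rather than a measured figure. The function name `kv_cache_bytes` is illustrative only.

```python
def kv_cache_bytes(tokens, layers=64, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV-cache size for one sequence of `tokens` tokens.

    layers and kv_heads come from the description above;
    head_dim=128 and bytes_per_value=2 (fp16/bf16) are assumptions.
    """
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


GIB = 1024 ** 3

# Compare a 32,768-token request with a full 128K (131,072-token) request.
for context in (32_768, 131_072):
    size_gib = kv_cache_bytes(context) / GIB
    print(f"{context:>7} tokens: about {size_gib:.1f} GiB of KV cache")
```

Under these assumptions, only the 8 key-value heads contribute to the cache, which is why the 40/8 layout keeps long-context requests cheaper than a design with 40 key-value heads would.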
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | pipeline | 3 | $0.88 | 1.779 |
| 32,768 | tensor | 2 | $0.93 | 2.313 |
| 32,768 | tensor | 4 | $0.96 | 3.211 |
| 32,768 | pipeline | 3 | $1.06 | 1.793 |
| 32,768 | tensor | 4 | $1.12 | 1.371 |
| 32,768 | tensor | 2 | $1.23 | 2.313 |
| 32,768 | tensor | 4 | $1.26 | 3.229 |
| 32,768 | tensor | 2 | $1.56 | 2.557 |
| 32,768 | none | 1 | $1.59 | 1.056 |
| 32,768 | tensor | 2 | $1.92 | 2.548 |
| 32,768 | none | 1 | $2.37 | 6.564 |
| 32,768 | none | 1 | $3.83 | 6.556 |
| 32,768 | none | 1 | $4.11 | 8.156 |
| 32,768 | tensor | 2 | $4.61 | 15.366 |
| 32,768 | none | 1 | $4.74 | 13.528 |
| 32,768 | tensor | 2 | $9.40 | 29.292 |
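
One way to compare the configurations listed above is to divide each price by its concurrency figure. The sketch below does this for four rows copied from the list. It assumes the dollar amount is the price for one billing period and the trailing number is the maximum number of concurrent full-context requests; the page does not label either value explicitly, so both readings are assumptions, and `price_per_slot` is a hypothetical helper name.

```python
# (price in USD, max concurrency) pairs copied from four rows of the list above.
configs = [
    (0.88, 1.779),   # 3 GPUs, pipeline parallelism
    (0.93, 2.313),   # 2 GPUs, tensor parallelism
    (0.96, 3.211),   # 4 GPUs, tensor parallelism
    (9.40, 29.292),  # 2 GPUs, tensor parallelism
]


def price_per_slot(price, max_concurrency):
    """Price divided by the number of concurrent requests the instance can hold."""
    return price / max_concurrency


# Cheapest price per concurrent request first.
ranked = sorted(configs, key=lambda cfg: price_per_slot(*cfg))

for price, concurrency in ranked:
    print(f"${price:.2f} at {concurrency:.3f}x -> "
          f"${price_per_slot(price, concurrency):.3f} per concurrent request")
```

This metric ignores throughput and latency, so it is only a first filter: the lowest absolute price is not necessarily the cheapest option per concurrent request.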

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | tensor | 4 | $0.96 | 1.722 |
| 32,768 | tensor | 4 | $1.26 | 1.741 |
| 32,768 | pipeline | 3 | $1.34 | 2.983 |
| 32,768 | tensor | 2 | $1.56 | 1.069 |
| 32,768 | tensor | 4 | $1.57 | 5.375 |
| 32,768 | tensor | 2 | $1.92 | 1.059 |
| 32,768 | tensor | 4 | $2.34 | 5.375 |
| 32,768 | none | 1 | $2.37 | 5.076 |
| 32,768 | tensor | 2 | $2.93 | 2.860 |
| 32,768 | none | 1 | $3.83 | 5.068 |
| 32,768 | none | 1 | $4.11 | 6.668 |
| 32,768 | tensor | 2 | $4.61 | 13.877 |
| 32,768 | none | 1 | $4.74 | 12.039 |
| 32,768 | tensor | 2 | $9.40 | 27.804 |

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | tensor | 4 | $1.76 | 1.472 |
| 32,768 | none | 1 | $2.51 | 1.173 |
| 32,768 | tensor | 4 | $2.97 | 1.959 |
| 32,768 | tensor | 4 | $3.68 | 1.941 |
| 32,768 | none | 1 | $3.96 | 1.165 |
| 32,768 | none | 1 | $4.12 | 2.764 |
| 32,768 | pipeline | 3 | $4.35 | 2.011 |
| 32,768 | none | 1 | $4.74 | 8.136 |
| 32,768 | tensor | 2 | $4.94 | 9.974 |
| 32,768 | tensor | 4 | $5.76 | 5.543 |
| 32,768 | tensor | 2 | $9.41 | 23.901 |

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.