Qwen3-30B-A3B-Thinking-2507

reasoning

Qwen3-30B-A3B-Thinking-2507 is an upgraded hybrid version of the Qwen3-30B-A3B model, specifically optimized for reasoning-only mode, significantly enhancing its reasoning capabilities. Built on a Mixture of Experts (MoE) architecture, the model has 30.5 billion total parameters, with only 3.3 billion activated per inference. Out of 128 experts, just 8 are activated per task, enabling dynamic adaptation to diverse query types. The model features 48 hidden layers and employs Group Query Attention (32 query heads and 4 key-value heads), ensuring efficient information processing while maintaining high-quality attention mechanisms. Architectural innovations also include native support for an extended context length of up to 262,144 tokens, making the model ideal for analyzing large documents, complex codebases, and performing multi-step reasoning.

The advanced reasoning mode enables Qwen3-30B-A3B-Thinking-2507 to achieve outstanding results on the AIME25 math benchmark (85.0), surpassing the closely sized proprietary model Gemini 2.5-Flash-Thinking (72.0). The model also excels in agent-like use cases, scoring 72.4 on the BFCL-v3 benchmark, making it an excellent choice for integration with external tools to automate complex workflows. Notably, for highly complex tasks, developers are recommended to use an output length of up to 81,920 tokens, allowing the model to fully leverage its potential in step-by-step reasoning. For routine tasks, a standard output length of 32,768 tokens is sufficient.

In summary, Qwen3-30B-A3B-Thinking-2507 is a versatile solution for large industrial enterprises, research centers, and educational institutions—where high-level analytical reasoning is required, and a mid-sized yet powerful model is preferred.


Announce Date: 29.07.2025
Parameters: 31B
Experts: 128
Activated at inference: 4B
Context: 263K
Layers: 48
Attention Type: Full or Sliding Window Attention
Developer: Qwen
Transformers Version: 4.51.0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Qwen3-30B-A3B-Thinking-2507 capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Qwen3-30B-A3B-Thinking-2507

Prices:
Name GPU Price, hour TPS Max Concurrency
teslaa10-2.16.64.160
192,000.0
tensor
2 $0.93 1.180 Launch
teslat4-4.16.64.160
262,144.0
tensor
4 $0.96 1.164 Launch
rtxa5000-2.16.64.160.nvlink
192,000.0
tensor
2 $1.23 1.180 Launch
teslaa2-4.32.128.160
262,144.0
tensor
4 $1.26 1.170 Launch
teslaa10-3.16.96.160
262,144.0
pipeline
3 $1.34 1.623 Launch
rtx3090-2.16.64.160
192,000.0
tensor
2 $1.56 1.291 Launch
teslaa10-4.12.48.160
262,144.0
tensor
4 $1.57 2.381 Launch
rtx4090-2.16.64.160
192,000.0
tensor
2 $1.92 1.287 Launch
rtx3090-3.16.96.160
262,144.0
pipeline
3 $2.29 1.745 Launch
rtxa5000-4.16.128.160.nvlink
262,144.0
tensor
4 $2.34 2.381 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 2.281 Launch
rtx4090-3.16.96.160
262,144.0
pipeline
3 $2.83 1.740 Launch
rtx3090-4.16.64.160
262,144.0
tensor
4 $2.89 2.544 Launch
rtx5090-2.16.64.160
262,144.0
tensor
2 $2.93 1.543 Launch
rtx4090-4.16.64.160
262,144.0
tensor
4 $3.60 2.537 Launch
h100-1.16.64.160
262,144.0
1 $3.83 162.730 2.279 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 2.812 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 5.215 Launch
h200-1.16.128.160
262,144.0
1 $4.74 4.603 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.857 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslaa10-3.16.96.160
262,144.0
pipeline
3 $1.34 1.065 Launch
teslaa10-4.12.48.160
192,000.0
tensor
4 $1.57 2.490 Launch
teslaa10-4.16.64.160
262,144.0
tensor
4 $1.62 1.824 Launch
teslaa2-6.32.128.160
262,144.0
pipeline
6 $1.65 1.015 Launch
rtx3090-3.16.96.160
262,144.0
pipeline
3 $2.29 1.187 Launch
rtxa5000-4.16.128.160.nvlink
262,144.0
tensor
4 $2.34 1.824 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 1.724 Launch
rtx4090-3.16.96.160
262,144.0
pipeline
3 $2.83 1.183 Launch
rtx3090-4.16.64.160
262,144.0
tensor
4 $2.89 1.986 Launch
rtx5090-2.16.64.160
192,000.0
tensor
2 $2.93 1.345 Launch
rtx4090-4.16.64.160
262,144.0
tensor
4 $3.60 1.980 Launch
h100-1.16.64.160
262,144.0
1 $3.83 1.721 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 2.255 Launch
rtx5090-3.16.96.160
262,144.0
pipeline
3 $4.34 2.083 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 4.658 Launch
h200-1.16.128.160
262,144.0
1 $4.74 4.045 Launch
rtx5090-4.16.128.160
262,144.0
tensor
4 $5.74 3.181 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.300 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
dedicated-rtx3090-8.64.128.960-1
262,144.0
tensor
8 2.011 Launch
rtx3090-4.16.96.320
192,000.0
tensor
4 $2.97 1.128 Launch
rtxa5000-6.24.192.160.nvlink
262,144.0
pipeline
6 $3.50 1.454 Launch
rtx4090-4.16.96.320
192,000.0
tensor
4 $3.68 1.120 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 1.095 Launch
rtx5090-3.16.96.160
192,000.0
pipeline
3 $4.34 1.260 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 3.498 Launch
rtxa5000-8.24.256.160.nvlink
262,144.0
tensor
8 $4.61 1.848 Launch
h200-1.16.128.160
262,144.0
1 $4.74 2.885 Launch
teslaa100-2.24.256.160
262,144.0
tensor
2 $4.93 3.498 Launch
rtx5090-4.16.128.160
262,144.0
tensor
4 $5.74 2.021 Launch
rtx4090-6.44.256.160
262,144.0
pipeline
6 $5.83 1.610 Launch
rtx4090-8.44.256.160
262,144.0
tensor
8 $7.51 2.005 Launch
h100-2.24.256.160
262,144.0
tensor
2 $7.84 3.492 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 8.140 Launch

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.