Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Instruct-2507 is an updated version of the flagship MoE model in the Qwen 3 series. This 235-billion-parameter model activates only 22 billion parameters at each inference step. The architecture consists of 94 transformer layers with 128 experts, of which only 8 are activated per token.
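
These architecture numbers can be read directly from the model's published Hugging Face configuration. The sketch below assumes the public repository id Qwen/Qwen3-235B-A22B-Instruct-2507 and the field names used by recent transformers releases for Qwen3-MoE configs:

    # Inspect the MoE hyper-parameters from the published config (no weights are downloaded).
    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507")
    print(cfg.num_hidden_layers)    # expected: 94 transformer layers
    print(cfg.num_experts)          # expected: 128 routed experts
    print(cfg.num_experts_per_tok)  # expected: 8 experts activated per token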

Unlike previous Qwen releases, the 2507 model abandons the hybrid thinking mode entirely in favor of a highly optimized non-thinking mode. The decision followed user feedback indicating a preference for faster responses without the generation of <think> blocks. As a result, response speed has increased dramatically, alongside significant improvements in output quality.

In mathematical benchmarks, the model posts remarkable gains over the previous version: AIME25 (70.3 vs. 24.7) and HMMT25 (55.4 vs. 10.0). Particularly impressive is its score on ZebraLogic (95.0), demonstrating near-perfect accuracy in logical reasoning tasks. In programming, it also significantly outperforms its predecessor, achieving state-of-the-art results on LiveCodeBench and MultiPL-E. Across numerous benchmarks, the model surpasses leading competitors such as GPT-4o, DeepSeek-V3, and Kimi K2.
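
Since the 2507 checkpoint answers directly without <think> blocks, querying it is a plain chat-completion call. A minimal generation sketch, assuming the public repository id and enough GPU memory (or a multi-GPU device_map), might look as follows:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen3-235B-A22B-Instruct-2507"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

    messages = [{"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # No thinking-mode switch is involved: the instruct-2507 model does not emit <think> blocks.
    outputs = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))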

Additionally, the developers have released Qwen3-235B-A22B-Instruct-2507-FP8, an FP8-quantized version of the model. FP8 quantization reduces memory requirements by approximately 50% while preserving nearly all of the original model's quality, and for models of this scale it generally offers a better accuracy-to-efficiency trade-off than traditional INT8 approaches.
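
As a rough back-of-the-envelope illustration of that saving (weights only, ignoring KV cache and runtime overhead):

    params = 235e9  # total parameter count
    for name, bytes_per_param in [("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
        print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB of weights")
    # BF16: ~470 GB, FP8: ~235 GB, INT4: ~118 GB -- roughly the ~50% reduction quoted above.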

Another key technological advancement of Qwen3-235B-A22B-Instruct-2507 is native support for a context length of 262,144 tokens. This enables entirely new use cases, from analyzing lengthy documents and codebases to conducting multi-hour conversations, while maintaining contextual understanding and high response accuracy even with a fully filled context window. Taken together, these capabilities position Qwen3-235B-A22B-Instruct-2507 as a leading open-source choice for a broad range of enterprise applications.
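
To check whether a given document actually fits into the native window, it is enough to count tokens with the model's tokenizer. The sketch below assumes the public repository id; report.txt is a placeholder path:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507")
    text = open("report.txt", encoding="utf-8").read()
    n_tokens = len(tokenizer.encode(text))
    # Leave headroom for the chat template and the generated answer.
    print(f"{n_tokens} tokens; fits in the 262,144-token window: {n_tokens <= 262_144}")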


Announce Date: 21.07.2025
Parameters: 235B
Experts: 128
Activated at inference: 22B
Context: 262K
Layers: 94
Attention Type: Full or Sliding Window Attention
VRAM requirements: 172.7 GB with 4-bit quantization (see the estimate after this list)
Developer: Qwen
Transformers Version: 4.51.0
License: Apache 2.0
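
The VRAM figure above can be sanity-checked with a rough estimate: 4-bit weights plus a BF16 KV cache at the full context window. The KV-cache term assumes grouped-query attention with 4 KV heads and a head dimension of 128, which are assumptions rather than values stated on this page:

    weights_gb = 235e9 * 0.5 / 1e9              # ~118 GB of 4-bit weights
    kv_per_token = 2 * 94 * 4 * 128 * 2         # K+V * layers * kv_heads * head_dim * BF16 bytes
    kv_cache_gb = kv_per_token * 262_144 / 1e9  # ~50 GB at the full 262,144-token window
    print(f"weights ~{weights_gb:.0f} GB + KV cache ~{kv_cache_gb:.0f} GB "
          f"= ~{weights_gb + kv_cache_gb:.0f} GB")  # in the same ballpark as the 172.7 GB listed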

Public endpoint

Use our pre-built public endpoints free of charge to test inference and explore the capabilities of Qwen3-235B-A22B-Instruct-2507. You can obtain an API access token on the token management page after registration and verification; a sample request is sketched below the table.
Model Name Context Type GPU TPS Status Link
There are no public endpoints for this model yet.
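
Assuming a listed endpoint exposes an OpenAI-compatible API, a request could be sent as in the sketch below; the base URL and model name are placeholders, so substitute the values shown in the table and the token from the token management page:

    from openai import OpenAI

    client = OpenAI(
        base_url="https://example.immers.cloud/v1",  # placeholder endpoint URL
        api_key="YOUR_API_TOKEN",
    )
    response = client.chat.completions.create(
        model="Qwen3-235B-A22B-Instruct-2507",  # placeholder model name
        messages=[{"role": "user", "content": "Summarize the key features of this model."}],
    )
    print(response.choices[0].message.content)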

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable the full context window for long sequences,
  • ensure top-tier security by processing data in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters (see the deployment sketch after this list).
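
A minimal deployment sketch for such a private instance, assuming vLLM as the serving engine; tensor_parallel_size and max_model_len should be matched to the rented configuration, and the values below are illustrative for an 8-GPU "tensor" setup:

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen3-235B-A22B-Instruct-2507",
        tensor_parallel_size=8,   # one weight shard per GPU
        max_model_len=262144,     # enable the full native context window
    )
    params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
    outputs = llm.chat([{"role": "user", "content": "Hello!"}], params)
    print(outputs[0].outputs[0].text)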

Recommended configurations for hosting Qwen3-235B-A22B-Instruct-2507

Prices:
Name                             Context   Type       vCPU   RAM, MB   Disk, GB   GPU   Price, hour
teslaa100-3.32.384.240           262,144   pipeline   32     393216    240        3     $7.36
teslaa100-4.16.256.240           262,144   tensor     16     262144    240        4     $9.14
h200-2.24.256.240                262,144   tensor     24     262144    240        2     $9.41
rtx5090-8.44.256.240             262,144   tensor     44     262144    240        8     $11.55
teslah100-3.32.384.240           262,144   pipeline   32     393216    240        3     $11.73
teslah100-4.16.256.240           262,144   tensor     16     262144    240        4     $14.96

Prices:
Name                             Context   Type       vCPU   RAM, MB   Disk, GB   GPU   Price, hour
teslaa100-6.44.512.320.nvlink    262,144   pipeline   44     524288    320        6     $14.08
h200-3.32.512.480                262,144   pipeline   32     524288    480        3     $14.36
teslaa100-8.44.512.320.nvlink    262,144   tensor     44     524288    320        8     $18.33
h200-4.32.768.480                262,144   tensor     32     786432    480        4     $19.23

Prices:
Name                             Context   Type       vCPU   RAM, MB   Disk, GB   GPU   Price, hour
teslaa100-8.44.704.960.nvlink    262,144   tensor     44     720896    960        8     $18.78
h200-6.52.896.640                262,144   pipeline   52     917504    640        6     $28.36
h200-8.52.1024.640               262,144   tensor     52     1048576   640        8     $37.34

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.