Qwen3-235B-A22B-Instruct-2507 is an updated version of the flagship MoE model in the Qwen 3 series. This 235-billion-parameter model activates only 22 billion parameters at each inference step. The architecture consists of 94 transformer layers with 128 experts, of which only 8 are activated per token.
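The routing behavior described above can be illustrated with a minimal sketch (this is not the actual Qwen3 implementation): a router scores all 128 experts per token, and only the 8 highest-scoring experts run, so only a small fraction of the parameters is active at each step.

```python
import numpy as np

NUM_EXPERTS = 128  # experts per MoE layer, as in Qwen3-235B-A22B
TOP_K = 8          # experts activated per token

def route(router_logits: np.ndarray, k: int = TOP_K):
    """Return indices and softmax weights of the top-k experts for one token."""
    top_idx = np.argsort(router_logits)[-k:][::-1]  # k best-scoring experts
    top_logits = router_logits[top_idx]
    weights = np.exp(top_logits - top_logits.max())
    weights /= weights.sum()  # normalize only over the chosen k experts
    return top_idx, weights

rng = np.random.default_rng(0)
idx, w = route(rng.normal(size=NUM_EXPERTS))
print("active experts:", sorted(idx.tolist()))
print(f"fraction of experts used per token: {TOP_K / NUM_EXPERTS:.4f}")
```

Because the weighted outputs of only 8 experts are combined per token, compute and activated parameters scale with 8/128 of the expert capacity rather than the full 235B.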
Unlike previous Qwen releases, the 2507 model abandons the hybrid thinking mode entirely in favor of a highly optimized non-thinking mode. The decision follows user feedback favoring faster responses without generated <think> blocks, and it brings both a dramatic increase in response speed and a significant improvement in output quality.

On mathematical benchmarks the gains are striking: AIME25 rises to 70.3 (from 24.7 in the previous version) and HMMT25 to 55.4 (from 10.0). Particularly impressive is ZebraLogic (95.0), demonstrating near-perfect accuracy on logical-reasoning tasks. In programming the model likewise outperforms its predecessor, achieving state-of-the-art results on LiveCodeBench and MultiPL-E. Across a wide range of benchmarks it surpasses leading competitors such as GPT-4o, DeepSeek-V3, and Kimi K2.
Additionally, the developers have released Qwen3-235B-A22B-Instruct-2507-FP8, an FP8-quantized version of the model. FP8 quantization roughly halves memory requirements while preserving nearly all of the original model's quality, and for large-scale models it typically offers a better accuracy-to-efficiency balance than traditional INT8 approaches.
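The ~50% saving follows directly from the storage cost per parameter. A back-of-envelope calculation, assuming 2 bytes per weight for BF16 and 1 byte for FP8 (KV cache, activations, and other runtime overheads are ignored):

```python
PARAMS = 235e9  # total parameters of Qwen3-235B-A22B-Instruct-2507

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed to store the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

bf16 = weight_memory_gb(PARAMS, 2.0)  # 16-bit weights
fp8 = weight_memory_gb(PARAMS, 1.0)   # 8-bit weights
print(f"BF16: {bf16:.0f} GB, FP8: {fp8:.0f} GB, saving: {1 - fp8 / bf16:.0%}")
# BF16: 470 GB, FP8: 235 GB, saving: 50%
```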
Another key technological advancement of Qwen3-235B-A22B-Instruct-2507 is native support for a 262,144-token context length. This enables entirely new use cases, from analyzing long documents and codebases to multi-hour conversations, while maintaining contextual understanding and high response accuracy even with a fully filled context window. Taken together, these capabilities position Qwen3-235B-A22B-Instruct-2507 as a leading open-source solution for a broad range of enterprise applications.
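Before sending a large document, it is useful to estimate whether it fits into the window. A minimal sketch using the common ~4-characters-per-token heuristic for English text (`fits_in_context` is a hypothetical helper; for exact counts use the model's actual tokenizer):

```python
CONTEXT_TOKENS = 262_144  # native context length of the 2507 model
CHARS_PER_TOKEN = 4       # rough heuristic for English text, not exact

def fits_in_context(text: str, reserve_for_output: int = 4096) -> bool:
    """Estimate whether `text` fits, leaving room for the model's reply."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_TOKENS - reserve_for_output

doc = "word " * 100_000  # ~500k characters, roughly 125k tokens
print(fits_in_context(doc))  # True: well under the 262k-token window
```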
Model Name | Context | Type | GPU | TPS | Status | Link
---|---|---|---|---|---|---
chriswritescode/Qwen3-235B-A22B-Instruct-2507-INT4-W4A16 | 125,600 | Public | 2×Tesla H100 | 60.24 | AVAILABLE | try
curl https://chat.immers.cloud/v1/endpoints/Qwen3-235B-A22B-Instruct-2507-optimized/generate/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer USER_API_KEY" \
-d '{"model": "Qwen-3-235B-A22B-Instruct", "messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Say this is a test"}
], "temperature": 0, "max_tokens": 150}'
$response = Invoke-WebRequest https://chat.immers.cloud/v1/endpoints/Qwen3-235B-A22B-Instruct-2507-optimized/generate/chat/completions `
-Method POST `
-Headers @{
"Authorization" = "Bearer USER_API_KEY"
"Content-Type" = "application/json"
} `
-Body (@{
model = "Qwen-3-235B-A22B-Instruct"
messages = @(
@{ role = "system"; content = "You are a helpful assistant." },
@{ role = "user"; content = "Say this is a test" }
)
} | ConvertTo-Json)
($response.Content | ConvertFrom-Json).choices[0].message.content
# pip install openai --upgrade
from openai import OpenAI
client = OpenAI(
api_key="USER_API_KEY",
base_url="https://chat.immers.cloud/v1/endpoints/Qwen3-235B-A22B-Instruct-2507-optimized/generate/",
)
chat_response = client.chat.completions.create(
model="Qwen-3-235B-A22B-Instruct",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Say this is a test"},
]
)
print(chat_response.choices[0].message.content)
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend the following configurations for deploying private instances:
vCPU | RAM, MB | Disk, GB | GPU | Price | Link
---|---|---|---|---|---
32 | 393216 | 240 | 3 | $8.00 | Launch
44 | 262144 | 240 | 8 | $8.59 | Launch
44 | 262144 | 240 | 6 | $8.86 | Launch
32 | 393216 | 240 | 3 | $15.58 | Launch
vCPU | RAM, MB | Disk, GB | GPU | Price | Link
---|---|---|---|---|---
44 | 524288 | 320 | 4 | $10.68 | Launch
44 | 524288 | 320 | 4 | $20.77 | Launch
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.