Qwen3-4B-Thinking-2507

reasoning

Qwen3-4B-Thinking-2507 is an enhanced version of Qwen3-4B. It is built on the same base architecture: 4 billion parameters, 36 layers, and Grouped Query Attention (GQA) with 32 query heads and 8 key/value heads. What fundamentally sets it apart is specialized training for deep question analysis and multi-step problem solving. The model supports an extended reasoning length, letting it examine every aspect of a task before formulating the final answer, and natively handles a 262K-token context. It automatically generates a visible reasoning process inside <think></think> blocks, so users can follow the solution logic while the model's inference quality on complex tasks improves significantly.
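Because the reasoning trace arrives in the same text stream as the answer, client code usually wants to separate the two. Below is a minimal sketch, assuming the completion follows the <think>…</think> convention described above; the sample string is illustrative, not real model output.

```python
# Split a Qwen3-Thinking-style completion into its reasoning trace and
# final answer. Assumes reasoning is wrapped in <think>...</think>.

def split_thinking(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a completion with a think block."""
    open_tag, close_tag = "<think>", "</think>"
    start = completion.find(open_tag)
    end = completion.find(close_tag)
    if start == -1 or end == -1:
        # No visible reasoning block: treat everything as the answer.
        return "", completion.strip()
    reasoning = completion[start + len(open_tag):end].strip()
    answer = completion[end + len(close_tag):].strip()
    return reasoning, answer

sample = "<think>2 + 2 is basic arithmetic; the sum is 4.</think>The answer is 4."
reasoning, answer = split_thinking(sample)
print(answer)  # -> The answer is 4.
```

The same helper can be applied to streamed output once the closing tag has been received; until then, everything after the opening tag can be shown to the user as live reasoning.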

The model delivers exceptional performance on tasks requiring deep analysis. On the AIME25 math olympiad benchmark it scores 81.3, 15.7 points above the base version. On HMMT25 (the Harvard-MIT Mathematics Tournament) it scores 55.5, outperforming the base model by 13.4 points. On PhD-level academic tests it posts results remarkable for a 4-billion-parameter model: GPQA 65.8 and SuperGPQA 47.8. On agentic benchmarks it surpasses many specialized models: BFCL-v3 71.2, TAU1-Retail 66.1, TAU2-Retail 53.5, confirming its strength in complex multi-step planning.

Qwen3-4B-Thinking-2507 is well suited to everyday tasks that are simple yet benefit from thoughtful processing, such as preparing literature reviews, drafting academic paper templates, and analyzing trends in statistical data. It is also highly effective on more complex technical challenges, including software debugging and architectural design, and in educational applications such as creating teaching materials and automated grading systems.


Announce Date: 07.08.2025
Parameters: 4.02B
Context: 262K
Attention Type: Full Attention
VRAM requirements: 36.2 GB with 4-bit quantization
Developer: Alibaba
Transformers Version: 4.51.0
License: Apache 2.0
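The VRAM figure may look high for a 4B model in 4-bit precision, but at full context it is dominated by the KV cache, not the weights. A back-of-the-envelope check, assuming the standard Qwen3 head dimension of 128 and an fp16 KV cache:

```python
# Rough VRAM estimate for Qwen3-4B-Thinking-2507 at full context.
# Assumptions: 36 layers, 8 KV heads (GQA), head_dim = 128 (standard
# for Qwen3), fp16 KV cache (2 bytes/value), 262,144-token context.

layers, kv_heads, head_dim = 36, 8, 128
context, bytes_per_val = 262_144, 2

# K and V each store kv_heads * head_dim values per token per layer.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * context * bytes_per_val
kv_cache_gib = kv_cache_bytes / 2**30

weights_gib = 4.02e9 * 0.5 / 2**30  # ~4.02B params at 4 bits (0.5 byte) each

print(f"KV cache at full context: {kv_cache_gib:.1f} GiB")  # -> 36.0 GiB
print(f"4-bit weights (approx):   {weights_gib:.1f} GiB")
```

Under these assumptions the full-context KV cache alone is about 36 GiB, which is in the same ballpark as the quoted requirement; shorter contexts or a quantized KV cache need far less memory.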

Public endpoint

Use our pre-built public endpoints to test inference and explore Qwen3-4B-Thinking-2507 capabilities.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.
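On a private instance, a common way to serve the model is behind an OpenAI-compatible API, for example with vLLM. The commands below are a launch sketch, not a verified recipe: flag names and reasoning-parser support vary between vLLM versions, so check the documentation for your installed release, and lower --max-model-len if VRAM is limited.

```shell
# Install vLLM, then serve Qwen3-4B-Thinking-2507 with full 262K context.
# --reasoning-parser separates the <think> trace from the final answer
# in API responses (parser name may differ by vLLM version).
pip install vllm

vllm serve Qwen/Qwen3-4B-Thinking-2507 \
    --max-model-len 262144 \
    --reasoning-parser deepseek_r1
```

Once running, any OpenAI-compatible client can point at the server's /v1 endpoint; fine-tuned weights or LoRA adapters can be served the same way by substituting a local model path.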

Recommended configurations for hosting Qwen3-4B-Thinking-2507

Prices:

Name                    vCPU   RAM, MB   Disk, GB   GPUs   Price per hour
teslaa10-2.16.64.160    16     65536     160        2      $0.93
rtx2080ti-4.16.64.160   16     65536     160        4      $1.18
teslat4-4.16.64.160     16     65536     160        4      $1.48
rtx3090-2.16.64.160     16     65536     160        2      $1.67
rtx3080-4.16.64.160     16     65536     160        4      $1.82
rtx4090-2.16.64.160     16     65536     160        2      $2.19
teslaa100-1.16.64.160   16     65536     160        1      $2.58
rtx5090-2.16.64.160     16     65536     160        2      $2.93
teslah100-1.16.64.160   16     65536     160        1      $5.11

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.