Qwen3-Next-80B-A3B-Instruct

Qwen3-Next-80B-A3B-Instruct is the first model built on the new Qwen3-Next architecture, which incorporates a series of technological breakthroughs from its developers. At its core lies a hybrid attention system combining two mechanisms in a 3:1 ratio: Gated DeltaNet (used in 75% of layers) provides linear computational complexity and highly efficient processing of long sequences, while Gated Attention (in the remaining 25% of layers) ensures high accuracy and strong information retrieval. This design resolves a fundamental trade-off: linear attention is fast but weak at retrieval tasks, whereas standard attention is accurate but computationally expensive and slow during inference. Their hybrid combination demonstrates stronger learning and contextual understanding than alternative approaches such as Sliding Window Attention or Mamba2.
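The 3:1 interleaving can be pictured with a short sketch. The helper name and the 48-layer count below are illustrative assumptions, not the actual Qwen3-Next implementation:

```python
# Hedged sketch: how a 3:1 hybrid layout could interleave linear-attention
# (Gated DeltaNet) and full-attention (Gated Attention) blocks.
# Layer count and function name are illustrative assumptions.

def layer_types(num_layers, pattern=("deltanet", "deltanet", "deltanet", "attention")):
    """Assign a block type to each layer by repeating the 3:1 pattern."""
    return [pattern[i % len(pattern)] for i in range(num_layers)]

types = layer_types(48)
print(types.count("deltanet") / len(types))   # 0.75 -> 75% linear-attention layers
print(types.count("attention") / len(types))  # 0.25 -> 25% full-attention layers
```

With this pattern, three out of every four consecutive layers run in linear time, and every fourth layer restores full-attention retrieval capacity.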

The model also implements an ultra-sparse Mixture-of-Experts (MoE) architecture with 512 experts, of which only 10 routed experts plus 1 shared expert are activated—just 3.7% of the total parameter count. Compared to the MoE structure in Qwen3 (128 experts, 8 active), this represents a significant advancement in efficiency and scalability. Qwen3-Next introduces several critical optimizations to ensure training stability and high performance: Zero-Centered RMSNorm replaces conventional QK-Norm, an Attention Output Gating mechanism eliminates issues like Attention Sink and Massive Activation, and Multi-Token Prediction (MTP) enhances contextual coherence, generation speed, and overall performance.
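The routing step described above (10 of 512 routed experts selected per token, plus one always-on shared expert) can be sketched as follows. The function and shapes are illustrative assumptions; the real Qwen3-Next router is more involved:

```python
# Hedged sketch of ultra-sparse MoE routing: top-k selection over router
# logits, with softmax mixing weights over the selected experts.
# The shared expert is always active and bypasses routing entirely.
import math
import random

NUM_EXPERTS, TOP_K = 512, 10

def route(router_logits):
    """Pick the top-k routed experts by logit and normalize their weights."""
    ranked = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)
    routed = ranked[:TOP_K]
    exps = [math.exp(router_logits[i]) for i in routed]
    total = sum(exps)
    weights = [e / total for e in exps]
    return routed, weights

random.seed(0)
routed, weights = route([random.gauss(0, 1) for _ in range(NUM_EXPERTS)])
print(len(routed))  # 10 routed experts active per token (plus 1 shared expert)
```

Only the selected experts' parameters are touched per token, which is how roughly 3B of the 80B total parameters end up active.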

Qwen3-Next-80B-A3B-Instruct achieves impressive results on key benchmarks, nearly matching the performance of the flagship Qwen3-235B-A22B-Instruct-2507 model despite significantly lower computational costs. On Arena-Hard v2, it scores 82.7 points, outperforming many competing models. In programming, it achieves a solid 56.6 on LiveCodeBench v6, surpassing even some larger models. On the AIME25 math benchmark, it reaches 69.5 points, demonstrating strong capabilities in complex reasoning.

Thanks to its architectural innovations and high efficiency, Qwen3-Next-80B-A3B-Instruct is well suited to a wide range of applications, including processing extremely long documents, software development and programming, agent-based systems, business process automation, and more.


Announce Date: September 11, 2025
Parameters: 81.3B
Experts: 512
Activated: 3B
Context: 262K (262,144 tokens)
VRAM requirements: 37.9 GB with 4-bit quantization
Developer: Qwen
Transformers Version: 4.57.0.dev0
License: Apache 2.0
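The VRAM figure above is consistent with a back-of-envelope check: 81.3B parameters at 4 bits each, expressed in GiB. This counts weights only; KV cache and activations come on top:

```python
# Arithmetic sketch: model weights at 4-bit precision, in GiB.
# Runtime overhead (KV cache, activations, framework buffers) is not included.
params = 81.3e9
bytes_per_param = 4 / 8            # 4-bit quantization = 0.5 bytes per weight
gib = params * bytes_per_param / 2**30
print(round(gib, 1))  # ~37.9
```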

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Qwen3-Next-80B-A3B-Instruct capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU TPS Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended configurations for hosting Qwen3-Next-80B-A3B-Instruct

Prices:
Name                      Context   vCPU   RAM, MB   Disk, GB   GPUs   Price, hour
teslaa10-2.16.64.160      262,144   16     65536     160        2      $0.93
teslat4-4.16.64.160       262,144   16     65536     160        4      $0.96
rtx2080ti-4.16.64.160     262,144   16     65536     160        4      $1.18
rtx3090-2.16.64.160       262,144   16     65536     160        2      $1.67
rtx3080-4.16.64.160       262,144   16     65536     160        4      $1.82
rtx4090-2.16.64.160       262,144   16     65536     160        2      $2.19
teslav100-2.16.64.240     262,144   16     65535     240        2      $2.22
teslaa100-1.16.64.160     262,144   16     65536     160        1      $2.58
rtx5090-2.16.64.160       262,144   16     65536     160        2      $2.93
teslah100-1.16.64.160     262,144   16     65536     160        1      $5.11

Prices:
Name                      Context   vCPU   RAM, MB   Disk, GB   GPUs   Price, hour
teslaa10-4.16.128.160     262,144   16     131072    160        4      $1.75
teslaa100-1.16.128.160    262,144   16     131072    160        1      $2.71
rtx3090-4.16.128.160      262,144   16     131072    160        4      $3.23
rtx4090-4.16.128.160      262,144   16     131072    160        4      $4.26
rtx5090-3.16.96.160       262,144   16     98304     160        3      $4.34
teslah100-1.16.128.160    262,144   16     131072    160        1      $5.23

Prices:
Name                      Context   vCPU   RAM, MB   Disk, GB   GPUs   Price, hour
teslaa100-2.24.256.240    262,144   24     262144    240        2      $5.36
teslah100-2.24.256.240    262,144   24     262144    240        2      $10.41

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.