Qwen3-Next-80B-A3B-Instruct is the first model built on the new Qwen3-Next architecture, which incorporates a series of technological breakthroughs. At its core lies a hybrid attention system that interleaves two mechanisms in a 3:1 ratio: Gated DeltaNet (used in 75% of layers) provides linear computational complexity and highly efficient processing of long sequences, while Gated Attention (in the remaining 25% of layers) preserves high accuracy and strong information retrieval. This design addresses a fundamental trade-off of attention mechanisms: linear attention is fast but weak at retrieval tasks, whereas standard attention retrieves well but is computationally expensive and slow during inference. Their hybrid combination demonstrates superior learning and contextual understanding compared to alternatives such as Sliding Window Attention or Mamba2.
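The 3:1 interleaving can be sketched as follows; the layer count (48) and the layer-type labels are illustrative assumptions for this sketch, not details of the actual implementation.

```python
# Sketch: assign attention types in a repeating 3:1 pattern.
# Layer count and type names are illustrative assumptions.
def layer_types(num_layers: int) -> list[str]:
    # Every 4th layer uses full gated attention; the other three
    # in each block use linear-complexity Gated DeltaNet.
    return [
        "gated_attention" if (i + 1) % 4 == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

pattern = layer_types(48)
print(pattern[:4])  # first block of four layers
print(pattern.count("gated_deltanet") / len(pattern))  # → 0.75
```

With any layer count that is a multiple of 4, exactly 75% of layers come out as DeltaNet layers, matching the ratio described above.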
The model also implements an ultra-sparse Mixture-of-Experts (MoE) architecture with 512 experts, of which only 10 routed experts plus 1 shared expert are activated per token, engaging roughly 3.7% of the total parameters (about 3B of 80B, hence the "A3B" in the name). Compared to the MoE structure in Qwen3 (128 experts, 8 active), this is a significant advance in efficiency and scalability. Qwen3-Next also introduces several optimizations for training stability and performance: Zero-Centered RMSNorm replaces conventional QK-Norm, an Attention Output Gating mechanism eliminates issues such as Attention Sink and Massive Activation, and Multi-Token Prediction (MTP) improves contextual coherence, generation speed, and overall quality.
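A minimal sketch of top-k MoE routing as described above: a router scores all 512 experts, the 10 highest-scoring routed experts are selected per token, and one shared expert is always active. The random router scores and softmax-over-top-k weighting are generic stand-ins, not the model's actual routing function.

```python
import math
import random

# Sketch of ultra-sparse top-k routing: 512 experts, 10 routed + 1 shared
# active per token. Router scores here are random stand-ins.
NUM_EXPERTS, TOP_K = 512, 10

random.seed(0)
router_logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

# Select the 10 highest-scoring routed experts for this token.
topk = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i],
              reverse=True)[:TOP_K]

# Softmax over just the selected experts gives the mixing weights.
m = max(router_logits[i] for i in topk)
exps = [math.exp(router_logits[i] - m) for i in topk]
total = sum(exps)
weights = [e / total for e in exps]

active = len(topk) + 1  # plus the always-on shared expert
print(f"{active} of {NUM_EXPERTS} experts active per token")
```

Note that 11 of 512 experts is about 2.1% of the expert count; the 3.7% figure in the text refers to the share of total parameters, which also includes the dense (non-expert) weights active in every layer.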
Qwen3-Next-80B-A3B-Instruct achieves impressive results on key benchmarks, nearly matching the performance of the flagship Qwen3-235B-A22B-Instruct-2507 model despite significantly lower computational costs. On Arena-Hard v2, it scores 82.7 points, outperforming many competing models. In programming, it achieves a solid 56.6 on LiveCodeBench v6, surpassing even some larger models. On the AIME25 math benchmark, it reaches 69.5 points, demonstrating strong capabilities in complex reasoning.
Thanks to its architectural innovations and high efficiency, Qwen3-Next-80B-A3B-Instruct is well suited to a wide range of applications, including processing extremely long documents, software development and programming, agent-based systems, business process automation, and many other use cases.
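Private deployments of models like this are commonly served behind an OpenAI-compatible API (for example via vLLM); the endpoint path and sampling parameters below are illustrative assumptions rather than details from this page, while the model id matches the public Hugging Face name for this release.

```python
import json

# Sketch: an OpenAI-compatible chat-completions payload for a
# self-hosted instance. Endpoint URL and parameters are assumptions.
payload = {
    "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the attached contract."},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
}

# POST this JSON to http://<your-instance>/v1/chat/completions
print(json.dumps(payload, indent=2))
```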
Model Name | Context | Type | GPU | TPS | Status | Link
---|---|---|---|---|---|---
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
Context | vCPU | RAM, MB | Disk, GB | GPU | Price, $/h |
---|---|---|---|---|---|---
262,144 | 16 | 65536 | 160 | 2 | $0.93 | Launch
262,144 | 16 | 65536 | 160 | 4 | $0.96 | Launch
262,144 | 16 | 65536 | 160 | 4 | $1.18 | Launch
262,144 | 16 | 65536 | 160 | 2 | $1.67 | Launch
262,144 | 16 | 65536 | 160 | 4 | $1.82 | Launch
262,144 | 16 | 65536 | 160 | 2 | $2.19 | Launch
262,144 | 16 | 65535 | 240 | 2 | $2.22 | Launch
262,144 | 16 | 65536 | 160 | 1 | $2.58 | Launch
262,144 | 16 | 65536 | 160 | 2 | $2.93 | Launch
262,144 | 16 | 65536 | 160 | 1 | $5.11 | Launch
Context | vCPU | RAM, MB | Disk, GB | GPU | Price, $/h |
---|---|---|---|---|---|---
262,144 | 16 | 131072 | 160 | 4 | $1.75 | Launch
262,144 | 16 | 131072 | 160 | 1 | $2.71 | Launch
262,144 | 16 | 131072 | 160 | 4 | $3.23 | Launch
262,144 | 16 | 131072 | 160 | 4 | $4.26 | Launch
262,144 | 16 | 98304 | 160 | 3 | $4.34 | Launch
262,144 | 16 | 131072 | 160 | 1 | $5.23 | Launch
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.