Qwen3.6-27B

reasoning
multimodal
coding

Qwen3.6-27B is an open 27B model from the Qwen3.6 family, released as a dense model (no MoE routing). Despite this, it remains natively multimodal: it handles text, images, and video, and supports both reasoning in thinking mode and direct answers in non-thinking mode. The base version is published in BF16, and there is also an official FP8-quantized version with metrics nearly identical to the base model.
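A quick back-of-the-envelope estimate shows why the FP8 variant matters for self-hosting. This is a sketch for the weights alone (activations, KV/DeltaNet state, and framework overhead come on top); the 27e9 parameter count and byte widths are illustrative assumptions, not official figures.

```python
# Approximate VRAM needed just to hold the weights, by precision.
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Weight footprint in GiB for a given parameter count and precision."""
    return n_params * bytes_per_param / 1024**3

bf16 = weight_memory_gib(27e9, 2)  # BF16: 2 bytes per parameter
fp8 = weight_memory_gib(27e9, 1)   # FP8: 1 byte per parameter

print(f"BF16 weights: ~{bf16:.0f} GiB")  # ~50 GiB
print(f"FP8 weights:  ~{fp8:.0f} GiB")   # ~25 GiB
```

In practice this is why BF16 comfortably needs an 80 GB-class card (or tensor parallelism across smaller GPUs), while FP8 fits on a single 48 GB card with room for context.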

The main architectural feature is a hybrid attention scheme: 16 × (3 × Gated DeltaNet → FFN + 1 × Gated Attention → FFN). Three-quarters of the blocks use Gated DeltaNet, and every fourth block uses Gated Attention. Gated DeltaNet can be understood as a more efficient linear-attention mechanism: instead of recomputing all pairwise token relationships like classic attention, it updates a compact state and uses gating to decide which information to retain or pass forward. Gated Attention, by contrast, keeps precise standard attention in a quarter of the layers: it is useful for explicitly extracting details from the context, while gating helps filter and stabilize the output. As a result, the model combines the long-context efficiency of DeltaNet with the precision of classic attention.
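The hybrid schedule above can be sketched in a few lines. The layer labels are illustrative, not the model's actual module names; the point is only the 3:1 interleaving that yields 64 layers, 16 of them full attention.

```python
# Sketch of the hybrid layer schedule: 16 repetitions of
# (3 x Gated DeltaNet + 1 x Gated Attention) = 64 layers total.
def layer_schedule(blocks: int = 16, deltanet_per_block: int = 3) -> list[str]:
    layers: list[str] = []
    for _ in range(blocks):
        layers += ["gated_deltanet"] * deltanet_per_block  # linear-attention layers
        layers.append("gated_attention")                   # one full-attention layer
    return layers

schedule = layer_schedule()
print(len(schedule))                      # 64
print(schedule.count("gated_attention"))  # 16
```

Every fourth entry is `gated_attention`, matching the "Layers: 64, using full attention: 16" line in the spec table below.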

The model is trained with Multi-Token Prediction (MTP) and has a native context window of 262,144 tokens, extendable to 1,010,000 tokens via RoPE/YaRN scaling. The developers specifically note that you can reduce the context if you run out of memory, but for complex reasoning tasks it is advisable to keep at least 128K tokens, because long context directly contributes to reasoning quality. Another important feature is preserve_thinking: the model can retain the reasoning context of past messages, which is especially useful for multi-step agents that should not restart their analysis from scratch on every turn. For production, the developers recommend SGLang, vLLM, or KTransformers. Recommended sampling settings: for generation in thinking mode, temperature 1.0, top_p 0.95, top_k 20; for precise coding/WebDev, temperature 0.6, top_p 0.95, top_k 20; for non-thinking mode, temperature 0.7, top_p 0.80, and presence_penalty 1.5.
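The recommended sampling settings can be kept in one place and merged into a request to an OpenAI-compatible endpoint (vLLM and SGLang both expose one). A minimal sketch: the mode names, endpoint, and exact model identifier are placeholder assumptions; only the numeric values come from the recommendations above.

```python
# Sampling presets from the model card, keyed by usage mode.
def sampling_config(mode: str) -> dict:
    presets = {
        "thinking":     {"temperature": 1.0, "top_p": 0.95, "top_k": 20},
        "coding":       {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
        "non_thinking": {"temperature": 0.7, "top_p": 0.80,
                         "presence_penalty": 1.5},
    }
    return presets[mode]

# Body for a chat-completions request; POST it to your endpoint's
# /v1/chat/completions with any HTTP client.
payload = {
    "model": "Qwen3.6-27B",  # placeholder: use your deployment's model name
    "messages": [{"role": "user", "content": "Refactor this function."}],
    **sampling_config("coding"),
}
print(payload["temperature"])  # 0.6
```

Switching `"coding"` to `"non_thinking"` swaps in the higher presence_penalty preset, which the developers recommend to curb repetition in direct-answer mode.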

The main difference between Qwen3.6-27B and the previous Qwen3.5-27B is a sharp jump specifically in agentic coding and repository-level reasoning. The model scores 77.2 vs. 75.0 on SWE-bench Verified, 53.5 vs. 51.2 on SWE-bench Pro, 59.3 vs. 41.6 on Terminal-Bench 2.0, 48.2 vs. 27.2 on SkillsBench Avg5, and 1487 vs. 1068 on QwenWebBench. The comparison with Qwen3.5-397B-A17B is particularly interesting: with 27B dense parameters, the model outperforms its 397B-total / 17B-active MoE predecessor on major coding benchmarks, including SWE-bench Verified, SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, NL2Repo, and the Claw series. This is its main “wow” feature: not just a new model with a large context, but a compact dense model that catches up with, and sometimes surpasses, much larger systems on developer tasks.

Use cases build naturally on the model’s strengths: agentic programming, automatic bug fixing, work with large repositories, frontend generation and refinement, terminal task execution, and pull request and CI error analysis. Thanks to its multimodality, the model is suitable for analyzing interface screenshots, mockups, diagrams, OCR documents, video, and visual QA; thanks to its long context, it suits large codebases, technical documentation, RAG scenarios, and multi-step enterprise assistants. For self-hosting, choose BF16 as the primary high-quality version, or FP8 when memory, throughput, and inference cost are critical, with near-identical quality.


Announce Date: 21.04.2026
Parameters: 28B
Context: 262K
Layers: 64 (16 with full attention)
Attention Type: Hybrid Attention
Mamba Type: Gated Delta Net
Developer: Qwen
Transformers Version: 4.57.1
vLLM Version: 0.17.0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Qwen3.6-27B capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Qwen3.6-27B

Prices:
Name | Context | Parallelism | GPUs | Price, hour | TPS
teslat4-3.32.64.160 | 262,144 | tensor | 3 | $0.88 | 1.116
teslaa10-2.16.64.160 | 262,144 | tensor | 2 | $0.93 | 1.271
teslaa2-3.32.128.160 | 262,144 | tensor | 3 | $1.06 | 1.116
rtxa5000-2.16.64.160.nvlink | 262,144 | tensor | 2 | $1.23 | 1.271
rtx3090-2.16.64.160 | 262,144 | tensor | 2 | $1.56 | 1.271
rtx4090-2.16.64.160 | 262,144 | tensor | 2 | $1.92 | 1.271
teslaa100-1.16.64.160 | 262,144 | - | 1 | $2.37 | 3.210
rtx5090-2.16.64.160 | 262,144 | tensor | 2 | $2.93 | 2.163
h100-1.16.64.160 | 262,144 | - | 1 | $3.83 | 3.210
h100nvl-1.16.96.160 | 262,144 | - | 1 | $4.11 | 3.990
teslaa100-2.24.96.160.nvlink | 262,144 | tensor | 2 | $4.61 | 7.516
h200-1.16.128.160 | 262,144 | - | 1 | $4.74 | 6.611
h200-2.24.256.160.nvlink | 262,144 | tensor | 2 | $9.40 | 14.318
Prices:
Name | Context | Parallelism | GPUs | Price, hour | TPS
teslat4-4.16.64.160 | 262,144 | tensor | 4 | $0.96 | 1.168
teslaa2-4.32.128.160 | 262,144 | tensor | 4 | $1.26 | 1.168
teslaa10-3.16.96.160 | 262,144 | tensor | 3 | $1.34 | 1.769
rtx3090-3.16.96.160 | 262,144 | tensor | 3 | $2.29 | 1.769
rtxa5000-4.16.128.160.nvlink | 262,144 | tensor | 4 | $2.34 | 2.952
teslaa100-1.16.64.160 | 262,144 | - | 1 | $2.37 | 2.525
rtx4090-3.16.96.160 | 262,144 | tensor | 3 | $2.83 | 1.769
rtx5090-2.16.64.160 | 262,144 | tensor | 2 | $2.93 | 1.478
h100-1.16.64.160 | 262,144 | - | 1 | $3.83 | 2.525
h100nvl-1.16.96.160 | 262,144 | - | 1 | $4.11 | 3.306
teslaa100-2.24.96.160.nvlink | 262,144 | tensor | 2 | $4.61 | 6.831
h200-1.16.128.160 | 262,144 | - | 1 | $4.74 | 5.926
h200-2.24.256.160.nvlink | 262,144 | tensor | 2 | $9.40 | 13.633
Prices:
Name | Context | Parallelism | GPUs | Price, hour | TPS
teslaa2-6.32.128.160 | 262,144 | tensor | 6 | $1.65 | 1.218
teslaa10-4.16.128.160 | 262,144 | tensor | 4 | $1.75 | 1.527
rtxa5000-4.16.128.160.nvlink | 262,144 | tensor | 4 | $2.34 | 1.527
teslaa100-1.16.128.160 | 262,144 | - | 1 | $2.50 | 1.100
rtx3090-4.16.96.320 | 262,144 | tensor | 4 | $2.97 | 1.527
rtx4090-4.16.96.320 | 262,144 | tensor | 4 | $3.68 | 1.527
h100-1.16.128.160 | 262,144 | - | 1 | $3.95 | 1.100
h100nvl-1.16.96.160 | 262,144 | - | 1 | $4.11 | 1.881
rtx5090-3.16.96.160 | 262,144 | tensor | 3 | $4.34 | 1.682
teslaa100-2.24.96.160.nvlink | 262,144 | tensor | 2 | $4.61 | 5.406
h200-1.16.128.160 | 262,144 | - | 1 | $4.74 | 4.501
h200-2.24.256.160.nvlink | 262,144 | tensor | 2 | $9.40 | 12.209
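To compare configurations, it can help to convert hourly price and throughput into a cost per million generated tokens. This assumes the TPS column is the instance's aggregate tokens per second, which the pricing tables above do not fully specify, so treat it as an illustration only.

```python
# Rough $/1M-token estimate from an hourly price and a tokens-per-second rate.
# Assumption: "TPS" means aggregate generated tokens per second at saturation.
def usd_per_million_tokens(price_per_hour: float, tps: float) -> float:
    tokens_per_hour = tps * 3600          # seconds per hour
    return price_per_hour / tokens_per_hour * 1e6

# Example with hypothetical numbers: a $3.60/hour instance at 1000 TPS
# works out to exactly $1.00 per million tokens.
print(usd_per_million_tokens(3.60, 1000.0))  # 1.0
```

The formula also makes the trade-off visible: a cheap low-TPS card can cost more per token than a pricier high-throughput one.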

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.