Qwen3.6-27B is an open 27B model from the Qwen3.6 family, released as a dense model (no MoE routing). It is nonetheless natively multimodal: it handles text, images, and video, and supports both reasoning in thinking mode and direct answers in non-thinking mode. The base version is published in BF16, and there is also an official FP8-quantized version with metrics nearly identical to the base model.
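To make the BF16 vs. FP8 trade-off concrete, here is a rough back-of-the-envelope estimate of the weight memory footprint alone. This is a sketch under common assumptions (2 bytes per parameter for BF16, 1 byte for FP8); it ignores the KV cache, activations, and framework overhead, which add substantially on top.

```python
# Rough weight-memory estimate for a 27B-parameter model.
# Assumptions: 2 bytes/param for BF16, 1 byte/param for FP8.
# KV cache, activations, and runtime overhead are NOT included.
def weight_memory_gib(n_params_billions: float, bytes_per_param: float) -> float:
    return n_params_billions * 1e9 * bytes_per_param / 1024**3

bf16 = weight_memory_gib(27, 2)  # ~50.3 GiB of weights
fp8 = weight_memory_gib(27, 1)   # ~25.1 GiB of weights
print(f"BF16: {bf16:.1f} GiB, FP8: {fp8:.1f} GiB")
```

Halving the weight footprint is what makes FP8 attractive when a deployment is memory- or cost-bound.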
The main architectural feature is a hybrid attention scheme: 16 × (3 × Gated DeltaNet → FFN + 1 × Gated Attention → FFN), meaning three-quarters of the blocks use Gated DeltaNet and every fourth block uses Gated Attention. Gated DeltaNet can be understood as a more efficient linear attention mechanism: instead of recomputing all pairwise token relationships like classic attention, it updates a compact state and uses gating to decide which information to retain or pass forward. Gated Attention, by contrast, keeps precise standard attention in every fourth layer: it is useful for explicitly extracting details from context, while gating helps filter and stabilize the output. As a result, the model combines the long-context efficiency of DeltaNet with the precision of classic attention.
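The layer pattern above can be sketched as a simple schedule. The block names below are illustrative labels, not the real implementation's identifiers; the point is only the 3:1 interleaving.

```python
# Sketch of the hybrid layer schedule described above:
# 16 repetitions of (3 × Gated DeltaNet + 1 × Gated Attention),
# with each sub-block followed by its own FFN.
# Names are illustrative, not the actual module names.
def hybrid_schedule(n_groups: int = 16) -> list[str]:
    group = ["gated_deltanet"] * 3 + ["gated_attention"]
    return group * n_groups

layers = hybrid_schedule()
print(len(layers), layers.count("gated_attention"))  # 64 blocks, 16 with full attention
```

Every fourth position holds a full-attention block, so a long sequence passes through cheap linear-attention layers most of the time while periodically getting a precise global look at the context.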
The model is trained with Multi-Token Prediction (MTP) and has a native context window of 262,144 tokens, extendable to 1,010,000 tokens via RoPE/YaRN scaling. The developers specifically warn that while you can reduce the context if you run out of memory, for complex reasoning tasks it is advisable to keep it at least 128K tokens, because long context directly contributes to the quality of reasoning. Another important feature is preserve_thinking: the model can retain the reasoning context of past messages, which is especially useful for multi-step agents where it is important not to restart analysis from scratch on every turn. For production, the developers recommend SGLang, vLLM, or KTransformers. Recommended sampling settings: for generation in thinking mode, temperature 1.0, top_p 0.95, top_k 20; for precise coding/WebDev, temperature 0.6, top_p 0.95, top_k 20; for non-thinking mode, temperature 0.7, top_p 0.80, and presence_penalty 1.5.
The main difference between Qwen3.6-27B and the previous Qwen3.5-27B is a sharp jump specifically in agentic coding and repository-level reasoning:

| Benchmark | Qwen3.6-27B | Qwen3.5-27B |
|---|---|---|
| SWE-bench Verified | 77.2 | 75.0 |
| SWE-bench Pro | 53.5 | 51.2 |
| Terminal-Bench 2.0 | 59.3 | 41.6 |
| SkillsBench Avg5 | 48.2 | 27.2 |
| QwenWebBench | 1487 | 1068 |

Compared to Qwen3.5-397B-A17B, the model looks particularly interesting: with 27B dense parameters it outperforms the 397B-total / 17B-active MoE predecessor on major coding benchmarks, including SWE-bench Verified, SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, NL2Repo, and the Claw series. This is its main "wow" feature: not just a new model with a large context, but a compact dense model that catches up with, and sometimes surpasses, much larger systems on developer tasks.
Use cases are naturally built around the model's strengths: agentic programming, automatic bug fixing, work with large repositories, frontend generation and refinement, terminal task execution, and pull request and CI error analysis. Thanks to its multimodality, the model is suitable for analyzing interface screenshots, mockups, diagrams, OCR documents, video, and visual QA; thanks to its long context, it suits analysis of large codebases and technical documentation, RAG scenarios, and multi-step enterprise assistants. For self-hosting, choose BF16 as the primary high-quality version, or FP8 when memory, throughput, and inference cost are critical while near-identical quality is maintained.
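A self-hosted deployment via SGLang or vLLM typically exposes an OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below only builds the request body with the recommended non-thinking sampling values; the model identifier is an assumption and may differ in your deployment, and sending the request over HTTP is left out.

```python
import json

# Sketch: build a request body for an OpenAI-compatible
# /v1/chat/completions endpoint of a self-hosted deployment.
# The model name is illustrative; sampling values follow the
# non-thinking-mode recommendations above.
def build_chat_payload(prompt: str, model: str = "Qwen3.6-27B") -> str:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,        # non-thinking mode
        "top_p": 0.80,
        "presence_penalty": 1.5,
    }
    return json.dumps(body)

payload = build_chat_payload("Summarize this PR diff.")
```

For thinking-mode or coding workloads, swap in the corresponding sampling values from the recommendations above.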
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying a private instance in one of the configurations below:
| Context | Parallelism | GPUs | Price/hr | Max Concurrency |
|---|---|---|---|---|
| 262,144 | tensor | 3 | $0.88 | 1.116 |
| 262,144 | tensor | 2 | $0.93 | 1.271 |
| 262,144 | tensor | 3 | $1.06 | 1.116 |
| 262,144 | tensor | 2 | $1.23 | 1.271 |
| 262,144 | tensor | 2 | $1.56 | 1.271 |
| 262,144 | tensor | 2 | $1.92 | 1.271 |
| 262,144 | | 1 | $2.37 | 3.210 |
| 262,144 | tensor | 2 | $2.93 | 2.163 |
| 262,144 | | 1 | $3.83 | 3.210 |
| 262,144 | | 1 | $4.11 | 3.990 |
| 262,144 | tensor | 2 | $4.61 | 7.516 |
| 262,144 | | 1 | $4.74 | 6.611 |
| 262,144 | tensor | 2 | $9.40 | 14.318 |
| Context | Parallelism | GPUs | Price/hr | Max Concurrency |
|---|---|---|---|---|
| 262,144 | tensor | 4 | $0.96 | 1.168 |
| 262,144 | tensor | 4 | $1.26 | 1.168 |
| 262,144 | tensor | 3 | $1.34 | 1.769 |
| 262,144 | tensor | 3 | $2.29 | 1.769 |
| 262,144 | tensor | 4 | $2.34 | 2.952 |
| 262,144 | | 1 | $2.37 | 2.525 |
| 262,144 | tensor | 3 | $2.83 | 1.769 |
| 262,144 | tensor | 2 | $2.93 | 1.478 |
| 262,144 | | 1 | $3.83 | 2.525 |
| 262,144 | | 1 | $4.11 | 3.306 |
| 262,144 | tensor | 2 | $4.61 | 6.831 |
| 262,144 | | 1 | $4.74 | 5.926 |
| 262,144 | tensor | 2 | $9.40 | 13.633 |
| Context | Parallelism | GPUs | Price/hr | Max Concurrency |
|---|---|---|---|---|
| 262,144 | tensor | 6 | $1.65 | 1.218 |
| 262,144 | tensor | 4 | $1.75 | 1.527 |
| 262,144 | tensor | 4 | $2.34 | 1.527 |
| 262,144 | | 1 | $2.50 | 1.100 |
| 262,144 | tensor | 4 | $2.97 | 1.527 |
| 262,144 | tensor | 4 | $3.68 | 1.527 |
| 262,144 | | 1 | $3.95 | 1.100 |
| 262,144 | | 1 | $4.11 | 1.881 |
| 262,144 | tensor | 3 | $4.34 | 1.682 |
| 262,144 | tensor | 2 | $4.61 | 5.406 |
| 262,144 | | 1 | $4.74 | 4.501 |
| 262,144 | tensor | 2 | $9.40 | 12.209 |
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.