Qwen3.6-27B

reasoning
multimodal
coding

Qwen3.6-27B is an open 27B model from the Qwen3.6 family, released as a dense model (no MoE routing). Despite this, it remains natively multimodal: it handles text, images, and video, and supports both reasoning in thinking mode and direct answers in non-thinking mode. The base version is published in BF16, and there is also an official FP8-quantized version with metrics nearly identical to the base model.
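A quick back-of-the-envelope estimate shows why the FP8 variant matters for self-hosting. This is a sketch for the weights alone (activations, KV/DeltaNet state, and framework overhead come on top); the 27e9 parameter count and byte widths are illustrative assumptions, not official figures.

```python
# Approximate VRAM needed just to hold the weights, by precision.
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Weight footprint in GiB for a given parameter count and precision."""
    return n_params * bytes_per_param / 1024**3

bf16 = weight_memory_gib(27e9, 2)  # BF16: 2 bytes per parameter
fp8 = weight_memory_gib(27e9, 1)   # FP8: 1 byte per parameter

print(f"BF16 weights: ~{bf16:.0f} GiB")  # ~50 GiB
print(f"FP8 weights:  ~{fp8:.0f} GiB")   # ~25 GiB
```

In practice this is why BF16 comfortably needs an 80 GB-class card (or tensor parallelism across smaller GPUs), while FP8 fits on a single 48 GB card with room for context.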

The main architectural feature is a hybrid attention scheme: 16 × (3 × Gated DeltaNet → FFN + 1 × Gated Attention → FFN). Three-quarters of the blocks use Gated DeltaNet, and every fourth block uses Gated Attention. Gated DeltaNet can be understood as a more efficient linear-attention mechanism: instead of recomputing all pairwise token relationships like classic attention, it updates a compact state and uses gating to decide which information to retain or pass forward. Gated Attention, by contrast, keeps precise standard attention in a quarter of the layers: it is useful for explicitly extracting details from the context, while gating helps filter and stabilize the output. As a result, the model combines the long-context efficiency of DeltaNet with the precision of classic attention.
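The hybrid schedule above can be sketched in a few lines. The layer labels are illustrative, not the model's actual module names; the point is only the 3:1 interleaving that yields 64 layers, 16 of them full attention.

```python
# Sketch of the hybrid layer schedule: 16 repetitions of
# (3 x Gated DeltaNet + 1 x Gated Attention) = 64 layers total.
def layer_schedule(blocks: int = 16, deltanet_per_block: int = 3) -> list[str]:
    layers: list[str] = []
    for _ in range(blocks):
        layers += ["gated_deltanet"] * deltanet_per_block  # linear-attention layers
        layers.append("gated_attention")                   # one full-attention layer
    return layers

schedule = layer_schedule()
print(len(schedule))                      # 64
print(schedule.count("gated_attention"))  # 16
```

Every fourth entry is `gated_attention`, matching the "Layers: 64, using full attention: 16" line in the spec table below.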

The model is trained with Multi-Token Prediction (MTP) and has a native context window of 262,144 tokens, extendable to 1,010,000 tokens via RoPE/YaRN scaling. The developers specifically note that you can reduce the context if you run out of memory, but for complex reasoning tasks it is advisable to keep at least 128K tokens, because long context directly contributes to reasoning quality. Another important feature is preserve_thinking: the model can retain the reasoning context of past messages, which is especially useful for multi-step agents that should not restart their analysis from scratch on every turn. For production, the developers recommend SGLang, vLLM, or KTransformers. Recommended sampling settings: for generation in thinking mode, temperature 1.0, top_p 0.95, top_k 20; for precise coding/WebDev, temperature 0.6, top_p 0.95, top_k 20; for non-thinking mode, temperature 0.7, top_p 0.80, and presence_penalty 1.5.
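The recommended sampling settings can be kept in one place and merged into a request to an OpenAI-compatible endpoint (vLLM and SGLang both expose one). A minimal sketch: the mode names, endpoint, and exact model identifier are placeholder assumptions; only the numeric values come from the recommendations above.

```python
# Sampling presets from the model card, keyed by usage mode.
def sampling_config(mode: str) -> dict:
    presets = {
        "thinking":     {"temperature": 1.0, "top_p": 0.95, "top_k": 20},
        "coding":       {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
        "non_thinking": {"temperature": 0.7, "top_p": 0.80,
                         "presence_penalty": 1.5},
    }
    return presets[mode]

# Body for a chat-completions request; POST it to your endpoint's
# /v1/chat/completions with any HTTP client.
payload = {
    "model": "Qwen3.6-27B",  # placeholder: use your deployment's model name
    "messages": [{"role": "user", "content": "Refactor this function."}],
    **sampling_config("coding"),
}
print(payload["temperature"])  # 0.6
```

Switching `"coding"` to `"non_thinking"` swaps in the higher presence_penalty preset, which the developers recommend to curb repetition in direct-answer mode.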

The main difference between Qwen3.6-27B and the previous Qwen3.5-27B is a sharp jump specifically in agentic coding and repository-level reasoning. The model scores 77.2 vs. 75.0 on SWE-bench Verified, 53.5 vs. 51.2 on SWE-bench Pro, 59.3 vs. 41.6 on Terminal-Bench 2.0, 48.2 vs. 27.2 on SkillsBench Avg5, and 1487 vs. 1068 on QwenWebBench. The comparison with Qwen3.5-397B-A17B is particularly interesting: with 27B dense parameters, the model outperforms its 397B-total / 17B-active MoE predecessor on major coding benchmarks, including SWE-bench Verified, SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, NL2Repo, and the Claw series. This is its main “wow” feature: not just a new model with a large context, but a compact dense model that catches up with, and sometimes surpasses, much larger systems on developer tasks.

Use cases build naturally on the model’s strengths: agentic programming, automatic bug fixing, work with large repositories, frontend generation and refinement, terminal task execution, and pull request and CI error analysis. Thanks to its multimodality, the model is suitable for analyzing interface screenshots, mockups, diagrams, OCR documents, video, and visual QA; thanks to its long context, it suits large codebases, technical documentation, RAG scenarios, and multi-step enterprise assistants. For self-hosting, choose BF16 as the primary high-quality version, or FP8 when memory, throughput, and inference cost are critical, with near-identical quality.


Announce Date: 21.04.2026
Parameters: 28B
Context: 262K
Layers: 64 (16 with full attention)
Attention Type: Hybrid Attention
Mamba Type: Gated Delta Net
Developer: Qwen
Transformers Version: 4.57.1
vLLM Version: 0.17.0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Qwen3.6-27B capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Qwen3.6-27B

Prices:
Name | Context | Parallelism | GPUs | Price, hour | TPS
teslat4-3.32.64.160 | 262,144 | tensor | 3 | $0.88 | 1.116
teslaa10-2.16.64.160 | 262,144 | tensor | 2 | $0.93 | 1.271
teslaa2-3.32.128.160 | 262,144 | tensor | 3 | $1.06 | 1.116
rtxa5000-2.16.64.160.nvlink | 262,144 | tensor | 2 | $1.23 | 1.271
rtx3090-2.16.64.160 | 262,144 | tensor | 2 | $1.56 | 1.271
rtx4090-2.16.64.160 | 262,144 | tensor | 2 | $1.92 | 1.271
teslaa100-1.16.64.160 | 262,144 | - | 1 | $2.37 | 3.210
rtx5090-2.16.64.160 | 262,144 | tensor | 2 | $2.93 | 2.163
h100-1.16.64.160 | 262,144 | - | 1 | $3.83 | 3.210
h100nvl-1.16.96.160 | 262,144 | - | 1 | $4.11 | 3.990
teslaa100-2.24.96.160.nvlink | 262,144 | tensor | 2 | $4.61 | 7.516
h200-1.16.128.160 | 262,144 | - | 1 | $4.74 | 6.611
h200-2.24.256.160.nvlink | 262,144 | tensor | 2 | $9.40 | 14.318
Prices:
Name | Context | Parallelism | GPUs | Price, hour | TPS
teslat4-4.16.64.160 | 262,144 | tensor | 4 | $0.96 | 1.168
teslaa2-4.32.128.160 | 262,144 | tensor | 4 | $1.26 | 1.168
teslaa10-3.16.96.160 | 262,144 | tensor | 3 | $1.34 | 1.769
rtx3090-3.16.96.160 | 262,144 | tensor | 3 | $2.29 | 1.769
rtxa5000-4.16.128.160.nvlink | 262,144 | tensor | 4 | $2.34 | 2.952
teslaa100-1.16.64.160 | 262,144 | - | 1 | $2.37 | 2.525
rtx4090-3.16.96.160 | 262,144 | tensor | 3 | $2.83 | 1.769
rtx5090-2.16.64.160 | 262,144 | tensor | 2 | $2.93 | 1.478
h100-1.16.64.160 | 262,144 | - | 1 | $3.83 | 2.525
h100nvl-1.16.96.160 | 262,144 | - | 1 | $4.11 | 3.306
teslaa100-2.24.96.160.nvlink | 262,144 | tensor | 2 | $4.61 | 6.831
h200-1.16.128.160 | 262,144 | - | 1 | $4.74 | 5.926
h200-2.24.256.160.nvlink | 262,144 | tensor | 2 | $9.40 | 13.633
Prices:
Name | Context | Parallelism | GPUs | Price, hour | TPS
teslaa2-6.32.128.160 | 262,144 | tensor | 6 | $1.65 | 1.218
teslaa10-4.16.128.160 | 262,144 | tensor | 4 | $1.75 | 1.527
rtxa5000-4.16.128.160.nvlink | 262,144 | tensor | 4 | $2.34 | 1.527
teslaa100-1.16.128.160 | 262,144 | - | 1 | $2.50 | 1.100
rtx3090-4.16.96.320 | 262,144 | tensor | 4 | $2.97 | 1.527
rtx4090-4.16.96.320 | 262,144 | tensor | 4 | $3.68 | 1.527
h100-1.16.128.160 | 262,144 | - | 1 | $3.95 | 1.100
h100nvl-1.16.96.160 | 262,144 | - | 1 | $4.11 | 1.881
rtx5090-3.16.96.160 | 262,144 | tensor | 3 | $4.34 | 1.682
teslaa100-2.24.96.160.nvlink | 262,144 | tensor | 2 | $4.61 | 5.406
h200-1.16.128.160 | 262,144 | - | 1 | $4.74 | 4.501
h200-2.24.256.160.nvlink | 262,144 | tensor | 2 | $9.40 | 12.209
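To compare configurations, it can help to convert hourly price and throughput into a cost per million generated tokens. This assumes the TPS column is the instance's aggregate tokens per second, which the pricing tables above do not fully specify, so treat it as an illustration only.

```python
# Rough $/1M-token estimate from an hourly price and a tokens-per-second rate.
# Assumption: "TPS" means aggregate generated tokens per second at saturation.
def usd_per_million_tokens(price_per_hour: float, tps: float) -> float:
    tokens_per_hour = tps * 3600          # seconds per hour
    return price_per_hour / tokens_per_hour * 1e6

# Example with hypothetical numbers: a $3.60/hour instance at 1000 TPS
# works out to exactly $1.00 per million tokens.
print(usd_per_million_tokens(3.60, 1000.0))  # 1.0
```

The formula also makes the trade-off visible: a cheap low-TPS card can cost more per token than a pricier high-throughput one.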

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.