The Qwen3.5-0.8B model is ultra-compact, the smallest in the Qwen 3.5 series, yet it retains all the technical innovations and advantages of the lineup. Its architecture is built on a hybrid approach combining two key mechanisms, Gated DeltaNet and Gated Attention, arranged across 24 layers in the pattern 6 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN)). This lets the model compress and process long sequences at a lower computational cost than traditional Transformers. It supports Multi-Token Prediction (MTP) and ships with ready-made integrations for popular inference frameworks such as vLLM, SGLang, and Transformers.
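The layer arithmetic above can be checked with a short sketch: six repetitions of (3 Gated DeltaNet blocks + 1 Gated Attention block) yield 24 layers. The function and layer-type names here are illustrative assumptions, not the model's actual module names:

```python
def build_layer_pattern(num_groups: int = 6,
                        deltanet_per_group: int = 3,
                        attention_per_group: int = 1) -> list:
    """Return the ordered list of token-mixing layer types in the hybrid stack.

    Each entry is followed by an FFN in the real architecture; only the
    token-mixing choice (linear vs. full attention) is modeled here.
    """
    pattern = []
    for _ in range(num_groups):
        pattern += ["gated_deltanet"] * deltanet_per_group
        pattern += ["gated_attention"] * attention_per_group
    return pattern

layers = build_layer_pattern()
print(len(layers))                     # 24 layers in total
print(layers.count("gated_deltanet"))  # 18 linear-attention layers
print(layers.count("gated_attention")) # 6 full-attention layers
```

With this split, only 6 of the 24 layers pay the quadratic cost of full attention, which is where the long-context efficiency comes from.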
The uniqueness of Qwen3.5-0.8B lies in being truly multimodal while remaining extremely small (0.8B parameters). Unlike its predecessor, Qwen3-0.6B, which was purely text-based, the new model integrates a vision encoder and was trained from its early stages on mixed multimodal data. As a result, it can not only read text within images but also understand complex visual scenes, diagrams, and even videos. The model supports 201 languages, a reasoning ("thinking") mode, improved instruction following, and a native context window of 262,144 tokens, a record for models of this size.
Thanks to its architecture and performance, Qwen3.5-0.8B opens up a wide range of possibilities for developers and researchers:

- **Rapid Prototyping and Research:** an ideal "sandbox" for testing ideas, prompt engineering, and experimenting with long contexts without the need for expensive hardware.
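For such prototyping against a deployed instance, both vLLM and SGLang expose an OpenAI-compatible HTTP API. A minimal sketch of building a chat request follows; the base URL, model ID, and the `chat_template_kwargs`/`enable_thinking` field for toggling reasoning mode are placeholder assumptions whose exact names depend on the serving framework:

```python
import json

# Placeholder endpoint for a privately launched instance (assumption).
BASE_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, thinking: bool = False) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": "Qwen3.5-0.8B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        # Reasoning ("thinking") mode is typically toggled through a chat
        # template flag; the field name here is an assumption.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

body = json.dumps(build_chat_request("Summarize this diagram.", thinking=True))
# POST `body` to BASE_URL with any HTTP client, e.g.:
#   curl $BASE_URL -H "Content-Type: application/json" -d "$body"
```

Because the endpoint is OpenAI-compatible, the same payload works unchanged with the official `openai` client by pointing its `base_url` at the instance.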
| Model Name | Context | Type | GPU | Status | Link |
|---|---|---|---|---|---|

There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Name | GPUs | Context | Price / hr | TPS | |
|---|---|---|---|---|---|
| | 1 | 262,144 | $0.33 | 3.594 | Launch |
| | 1 | 262,144 | $0.38 | 2.104 | Launch |
| | 1 | 262,144 | $0.38 | 3.594 | Launch |
| | 1 | 262,144 | $0.53 | 5.980 | Launch |
| | 1 | 262,144 | $0.57 | 1.806 | Launch |
| | 1 | 262,144 | $0.83 | 5.980 | Launch |
| | 1 | 262,144 | $1.02 | 5.980 | Launch |
| | 1 | 262,144 | $1.20 | 8.365 | Launch |
| | 2 (tensor) | 262,144 | $1.23 | 12.307 | Launch |
| | 1 | 262,144 | $1.59 | 8.365 | Launch |
| | 1 | 262,144 | $2.37 | 22.676 | Launch |
| | 1 | 262,144 | $3.83 | 22.676 | Launch |
| | 1 | 262,144 | $4.11 | 26.850 | Launch |
| | 1 | 262,144 | $4.74 | 40.862 | Launch |
| Name | GPUs | Context | Price / hr | TPS | |
|---|---|---|---|---|---|
| | 1 | 262,144 | $0.33 | 3.673 | Launch |
| | 1 | 262,144 | $0.38 | 2.182 | Launch |
| | 1 | 262,144 | $0.38 | 3.673 | Launch |
| | 1 | 262,144 | $0.53 | 6.058 | Launch |
| | 1 | 262,144 | $0.57 | 1.884 | Launch |
| | 1 | 262,144 | $0.83 | 6.058 | Launch |
| | 1 | 262,144 | $1.02 | 6.058 | Launch |
| | 1 | 262,144 | $1.20 | 8.443 | Launch |
| | 2 (tensor) | 262,144 | $1.23 | 12.385 | Launch |
| | 1 | 262,144 | $1.59 | 8.443 | Launch |
| | 1 | 262,144 | $2.37 | 22.754 | Launch |
| | 1 | 262,144 | $3.83 | 22.754 | Launch |
| | 1 | 262,144 | $4.11 | 26.928 | Launch |
| | 1 | 262,144 | $4.74 | 40.941 | Launch |
| Name | GPUs | Context | Price / hr | TPS | |
|---|---|---|---|---|---|
| | 1 | 262,144 | $0.33 | 3.403 | Launch |
| | 1 | 262,144 | $0.38 | 1.912 | Launch |
| | 1 | 262,144 | $0.38 | 3.403 | Launch |
| | 1 | 262,144 | $0.53 | 5.788 | Launch |
| | 1 | 262,144 | $0.57 | 1.614 | Launch |
| | 1 | 262,144 | $0.83 | 5.788 | Launch |
| | 1 | 262,144 | $1.02 | 5.788 | Launch |
| | 1 | 262,144 | $1.20 | 8.173 | Launch |
| | 2 (tensor) | 262,144 | $1.23 | 12.116 | Launch |
| | 1 | 262,144 | $1.59 | 8.173 | Launch |
| | 1 | 262,144 | $2.37 | 22.484 | Launch |
| | 1 | 262,144 | $3.83 | 22.484 | Launch |
| | 1 | 262,144 | $4.11 | 26.658 | Launch |
| | 1 | 262,144 | $4.74 | 40.671 | Launch |
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.