Qwen3.5-9B is a compact dense model with 9 billion parameters that retains the key capabilities of the flagship models in the series. Its 32 layers use the same hybrid architecture as the larger models, interleaving Gated DeltaNet and Gated Attention blocks in a 3:1 ratio to balance speed, accuracy, and memory efficiency on long contexts. As a result, it natively processes contexts of up to 262k tokens and competes with models 2–3 times its size. The model is natively multimodal, understanding text, images, and video, which makes it a versatile solution for a wide range of tasks.
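
As a rough illustration of the 3:1 interleaving described above, the sketch below lays out 32 layers so that every fourth layer is Gated Attention and the rest are Gated DeltaNet. The position of the attention layer within each group of four, and the names `hybrid_layer_pattern`, `gated_deltanet`, and `gated_attention`, are assumptions made for illustration; they are not taken from the model's actual configuration.

```python
from collections import Counter


def hybrid_layer_pattern(num_layers: int = 32, linear_per_full: int = 3) -> list[str]:
    """Return one label per layer; every (linear_per_full + 1)-th layer is full attention.

    Assumption: the attention layer closes each group. The source only states the 3:1 ratio.
    """
    period = linear_per_full + 1
    return [
        "gated_attention" if (i + 1) % period == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]


pattern = hybrid_layer_pattern()
counts = Counter(pattern)
print(counts["gated_deltanet"], counts["gated_attention"])  # 24 8
```

Under this assumed layout, 24 of the 32 layers are Gated DeltaNet and 8 are Gated Attention.
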
The model's benchmark results are strong for its weight class. In knowledge tests (MMLU-Pro: 82.5), it surpasses many larger models, including GPT-OSS-120B. In instruction following (IFEval: 91.5), it comes close to top-tier models. Its agentic capabilities stand out in particular: TAU2-Bench (79.1) and BFCL-V4 (66.1) are scores that only models of the 70B+ scale could reach a year ago. Its multimodal results are similarly strong: MathVision (78.9), MMMU-Pro (70.1), OCRBench (89.2), and VlmsAreBlind (93.7) demonstrate a deep understanding of visual information.
The model is well suited to scenarios where the balance between performance and resources is crucial. In a quantized format it requires only 8 GB of RAM to run, which makes it accessible on consumer hardware. It is a good fit for building OCR systems, real-time document and image analysis, and tasks that require fast local data processing within an enterprise perimeter.
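
To put the 8 GB figure in context, here is a back-of-the-envelope estimate of weight storage for 9 billion parameters at different precisions. It is a minimal sketch: the helper `weight_memory_gb` is hypothetical, it counts weights only (no KV cache, activations, or runtime overhead), and the source does not say which quantization level the 8 GB figure refers to.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 9e9  # 9 billion parameters, as stated above

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

By this estimate the weights take about 18 GB at 16 bits, 9 GB at 8 bits, and 4.5 GB at 4 bits, so a roughly 4-bit quantization leaves headroom within 8 GB while 8-bit weights alone would not fit.
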
| Model Name | Context | Type | GPU | Status | Link |
|---|---|---|---|---|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 262,144 | | 1 | $0.53 | 1.336 |
| 262,144 | tensor | 2 | $0.54 | 1.920 |
| 262,144 | tensor | 2 | $0.57 | 1.920 |
| 262,144 | | 1 | $0.83 | 1.336 |
| 262,144 | pipeline | 3 | $0.84 | 1.721 |
| 262,144 | | 1 | $1.02 | 1.336 |
| 262,144 | tensor | 4 | $1.12 | 2.640 |
| 262,144 | | 1 | $1.20 | 2.230 |
| 262,144 | tensor | 2 | $1.23 | 3.709 |
| 262,144 | pipeline | 3 | $1.43 | 1.385 |
| 262,144 | | 1 | $1.59 | 2.230 |
| 262,144 | tensor | 4 | $1.82 | 2.193 |
| 262,144 | | 1 | $2.37 | 7.598 |
| 262,144 | | 1 | $3.83 | 7.598 |
| 262,144 | | 1 | $4.11 | 9.163 |
| 262,144 | | 1 | $4.74 | 14.419 |

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 262,144 | | 1 | $0.53 | 0.963 |
| 262,144 | tensor | 2 | $0.54 | 1.547 |
| 262,144 | tensor | 2 | $0.57 | 1.547 |
| 262,144 | | 1 | $0.83 | 0.963 |
| 262,144 | pipeline | 3 | $0.84 | 1.348 |
| 262,144 | | 1 | $1.02 | 0.963 |
| 262,144 | tensor | 4 | $1.12 | 2.268 |
| 262,144 | | 1 | $1.20 | 1.858 |
| 262,144 | tensor | 2 | $1.23 | 3.336 |
| 262,144 | pipeline | 3 | $1.43 | 1.013 |
| 262,144 | | 1 | $1.59 | 1.858 |
| 262,144 | tensor | 4 | $1.82 | 1.820 |
| 262,144 | | 1 | $2.37 | 7.225 |
| 262,144 | | 1 | $3.83 | 7.225 |
| 262,144 | | 1 | $4.11 | 8.791 |
| 262,144 | | 1 | $4.74 | 14.047 |

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 262,144 | pipeline | 3 | $0.88 | 2.202 |
| 262,144 | tensor | 2 | $0.93 | 2.512 |
| 262,144 | tensor | 4 | $0.96 | 3.680 |
| 262,144 | pipeline | 3 | $1.06 | 2.202 |
| 262,144 | tensor | 4 | $1.12 | 1.444 |
| 262,144 | | 1 | $1.20 | 1.034 |
| 262,144 | tensor | 2 | $1.23 | 2.512 |
| 262,144 | tensor | 4 | $1.26 | 3.680 |
| 262,144 | tensor | 2 | $1.56 | 2.512 |
| 262,144 | | 1 | $1.59 | 1.034 |
| 262,144 | tensor | 4 | $1.82 | 0.996 |
| 262,144 | tensor | 2 | $1.92 | 2.512 |
| 262,144 | | 1 | $2.37 | 6.401 |
| 262,144 | | 1 | $3.83 | 6.401 |
| 262,144 | | 1 | $4.11 | 7.967 |
| 262,144 | | 1 | $4.74 | 13.223 |
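
One way to compare configurations is to divide the listed price by the concurrency figure, giving a rough cost per concurrent full-context request. The sketch below does this for a few rows copied from the last table. It rests on two assumptions that the page does not confirm: that the dollar value is an hourly price, and that the final number in each row is the maximum concurrency. The `Config` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    gpus: int
    price: float            # "$" value from the table; assumed to be per hour
    max_concurrency: float  # last numeric value in the row; assumed meaning

    @property
    def price_per_slot(self) -> float:
        """Price divided by concurrency: cost of one concurrent request slot."""
        return self.price / self.max_concurrency


# A few rows copied from the last table above.
configs = [
    Config(3, 0.88, 2.202),
    Config(2, 0.93, 2.512),
    Config(4, 0.96, 3.680),
    Config(1, 1.20, 1.034),
    Config(1, 4.74, 13.223),
]

best = min(configs, key=lambda c: c.price_per_slot)
for c in sorted(configs, key=lambda c: c.price_per_slot):
    print(f"{c.gpus} GPU(s) at ${c.price:.2f}: ${c.price_per_slot:.3f} per concurrent request")
```

Among these sample rows, the 4-GPU configuration at $0.96 has the lowest price per concurrent request under the stated assumptions. Throughput and latency are not considered here.
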
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.