Qwen3-VL-4B-Instruct is a compact 4-billion-parameter multimodal model designed for efficient deployment on resource-constrained servers while retaining the full functionality of the Qwen3-VL series. Despite being half the size of the 8B version, it preserves all key architectural innovations: Interleaved-MRoPE for video understanding, DeepStack for multi-level visual feature fusion, and Text-Timestamp Alignment for precise temporal localization. Tight integration of the text and visual modalities gives the model text understanding comparable to that of pure-text LLMs.
In terms of performance, Qwen3-VL-4B-Instruct approaches the results of Qwen2.5-VL-7B, which suggests that the smaller size comes without a significant loss of quality. The model supports a native context of 256K tokens (expandable to 1M), enabling it to process long documents, multi-hour videos, and complex multimodal dialogues. Advanced OCR with support for 32 languages and robustness to difficult capture conditions such as low light, blur, and tilt make the 4B model a full-fledged solution for intelligent document processing, despite its compact size.
Qwen3-VL-4B-Instruct suits scenarios that require a balance between performance and efficiency: deployment on consumer devices, high-volume processing of visual content, real-time applications that need fast responses, and research projects. The open Apache 2.0 license also permits free commercial use, making the model accessible to a wide range of users, from startups to large enterprises.
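As a quick check of the 256K native context mentioned above: 256 × 1,024 = 262,144 tokens, which matches the context length listed for the instances on this page. The sketch below is a minimal illustration, assuming that prompt tokens (text plus visual) and generated tokens share one window; the names `NATIVE_CONTEXT` and `fits_in_context` are hypothetical and not part of any Qwen or immers.cloud API.

```python
# Native context window: 256K tokens, i.e. 256 * 1024.
NATIVE_CONTEXT = 256 * 1024  # 262,144 tokens


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_limit: int = NATIVE_CONTEXT) -> bool:
    """Return True if the prompt plus the requested output fits in the window.

    Assumption: prompt and generated tokens count against the same limit.
    """
    return prompt_tokens + max_new_tokens <= context_limit


if __name__ == "__main__":
    print(NATIVE_CONTEXT)
    print(fits_in_context(prompt_tokens=260_000, max_new_tokens=2_000))
```

Token counts for images and video depend on resolution and frame sampling, so they should be measured with the model's own processor rather than estimated.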
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | tensor | 4 | $0.96 | | 1.120 |
| 262,144 | tensor | 4 | $1.26 | | 1.125 |
| 262,144 | pipeline | 3 | $1.34 | | 1.427 |
| 262,144 | tensor | 4 | $1.57 | | 1.932 |
| 262,144 | pipeline | 3 | $2.29 | | 1.508 |
| 262,144 | tensor | 4 | $2.34 | | 1.932 |
| 262,144 | | 1 | $2.37 | | 1.866 |
| 262,144 | pipeline | 3 | $2.83 | | 1.505 |
| 262,144 | tensor | 4 | $2.89 | | 2.040 |
| 262,144 | tensor | 2 | $2.93 | | 1.373 |
| 262,144 | tensor | 4 | $3.60 | | 2.036 |
| 262,144 | | 1 | $3.83 | | 1.864 |
| 262,144 | | 1 | $4.11 | | 2.219 |
| 262,144 | tensor | 2 | $4.61 | | 3.822 |
| 262,144 | | 1 | $4.74 | | 3.413 |
| 262,144 | tensor | 2 | $9.40 | | 6.916 |

| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | tensor | 4 | $0.96 | | 1.055 |
| 262,144 | tensor | 4 | $1.26 | | 1.059 |
| 262,144 | pipeline | 3 | $1.34 | 67.960 | 1.361 |
| 262,144 | tensor | 4 | $1.57 | | 1.866 |
| 262,144 | pipeline | 3 | $2.29 | | 1.442 |
| 262,144 | tensor | 4 | $2.34 | | 1.866 |
| 262,144 | | 1 | $2.37 | 93.860 | 1.800 |
| 262,144 | pipeline | 3 | $2.83 | | 1.439 |
| 262,144 | tensor | 4 | $2.89 | | 1.975 |
| 262,144 | tensor | 2 | $2.93 | | 1.308 |
| 262,144 | tensor | 4 | $3.60 | | 1.971 |
| 262,144 | | 1 | $3.83 | 126.670 | 1.798 |
| 262,144 | | 1 | $4.11 | | 2.154 |
| 262,144 | tensor | 2 | $4.61 | | 3.756 |
| 262,144 | | 1 | $4.74 | | 3.347 |
| 262,144 | tensor | 2 | $9.40 | | 6.851 |

| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | pipeline | 3 | $1.34 | | 1.287 |
| 262,144 | tensor | 4 | $1.57 | | 1.793 |
| 262,144 | pipeline | 6 | $1.65 | | 1.592 |
| 262,144 | pipeline | 3 | $2.29 | | 1.368 |
| 262,144 | tensor | 4 | $2.34 | | 1.793 |
| 262,144 | | 1 | $2.37 | 74.840 | 1.726 |
| 262,144 | pipeline | 3 | $2.83 | | 1.365 |
| 262,144 | tensor | 4 | $2.89 | | 1.901 |
| 262,144 | tensor | 2 | $2.93 | | 1.234 |
| 262,144 | tensor | 4 | $3.60 | | 1.897 |
| 262,144 | | 1 | $3.83 | 106.830 | 1.724 |
| 262,144 | | 1 | $4.11 | | 2.080 |
| 262,144 | tensor | 2 | $4.61 | | 3.682 |
| 262,144 | | 1 | $4.74 | | 3.274 |
| 262,144 | tensor | 2 | $9.40 | | 6.777 |

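One way to compare the configurations listed above is to pick the lowest-priced one that meets a required concurrency. The sketch below uses a few rows copied from the first table; it assumes that Max Concurrency can be read as the number of full-context requests an instance serves at once, which the page does not state explicitly. The `Config` class and both helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    gpus: int
    parallelism: str        # "tensor", "pipeline", or "" for a single GPU
    price: float            # listed price in USD, as shown in the table
    max_concurrency: float  # listed Max Concurrency value


# A few rows copied from the first table on this page.
CONFIGS = [
    Config(4, "tensor", 0.96, 1.120),
    Config(3, "pipeline", 1.34, 1.427),
    Config(1, "", 2.37, 1.866),
    Config(2, "tensor", 4.61, 3.822),
    Config(1, "", 4.74, 3.413),
    Config(2, "tensor", 9.40, 6.916),
]


def cheapest_for(required_concurrency: float,
                 configs: Sequence[Config] = CONFIGS) -> Optional[Config]:
    """Return the lowest-priced config meeting the requirement, or None."""
    eligible = [c for c in configs if c.max_concurrency >= required_concurrency]
    return min(eligible, key=lambda c: c.price, default=None)


def price_per_concurrency(config: Config) -> float:
    """Listed price divided by listed Max Concurrency."""
    return config.price / config.max_concurrency


if __name__ == "__main__":
    choice = cheapest_for(3.0)
    print(choice)
    if choice is not None:
        print(round(price_per_concurrency(choice), 3))
```

Price per unit of concurrency is only a rough guide; measured throughput (the TPS column, where reported) is a better basis for sizing when it is available.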
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.