Qwen3-VL-4B-Instruct is a compact 4-billion-parameter multimodal model designed for efficient deployment on resource-constrained servers while retaining the full functionality of the Qwen3-VL series. Despite being half the size of the 8B version, the model preserves all key architectural innovations: Interleaved-MRoPE for video understanding, DeepStack for multi-level visual feature fusion, and Text-Timestamp Alignment for precise temporal localization. Text and visual inputs are tightly integrated, so the model's understanding of multimodal context is comparable to the text understanding of pure-text LLMs.
In terms of performance, Qwen3-VL-4B-Instruct approaches the results of Qwen2.5-VL-7B, demonstrating that the reduction in model size was achieved without significant loss of quality. The model supports a native context of 256K tokens (expandable to 1M), enabling the processing of long documents, multi-hour videos, and complex multimodal dialogues. Advanced OCR with support for 32 languages and robustness to difficult capture conditions such as low light, blur, and tilt make the 4B model a full-fledged solution for intelligent document processing, despite its compact size.
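To make the 256K figure concrete: 256 × 1024 = 262,144 tokens, which is the context length listed for the dedicated configurations on this page. The following is a minimal sketch, assuming the prompt and the generated tokens share a single context window and that token counts come from the model's own tokenizer. The names `NATIVE_CONTEXT`, `fits_in_context`, and `remaining_tokens` are hypothetical helpers, not part of any library.

```python
# Native context window: 256K tokens = 256 * 1024 = 262,144.
NATIVE_CONTEXT = 256 * 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context: int = NATIVE_CONTEXT) -> bool:
    """Return True if the prompt plus the requested completion fit in the window.

    Assumption: prompt and generated tokens share one context window.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context


def remaining_tokens(prompt_tokens: int, context: int = NATIVE_CONTEXT) -> int:
    """Return how many tokens are left for generation after the prompt."""
    return max(context - prompt_tokens, 0)


if __name__ == "__main__":
    print(NATIVE_CONTEXT)
    print(fits_in_context(200_000, 60_000))
    print(remaining_tokens(262_000))
```

A check like this is useful before submitting long documents or video transcripts, since a request that exceeds the window is typically rejected or truncated by the serving layer.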
Qwen3-VL-4B-Instruct is well suited to scenarios that require a balance between performance and efficiency: deployment on consumer devices, processing of large volumes of visual content, low-latency integration into real-time applications, and research projects. Furthermore, the open Apache 2.0 license permits free commercial use of the model, making it accessible to a wide range of users, from startups to large enterprises.
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | tensor | 4 | $0.96 | | 1.226 |
| 262,144 | tensor | 4 | $1.26 | | 1.226 |
| 262,144 | pipeline | 3 | $1.34 | | 1.495 |
| 262,144 | tensor | 4 | $1.57 | | 2.026 |
| 262,144 | tensor | 2 | $2.22 | | 1.364 |
| 262,144 | pipeline | 3 | $2.29 | | 1.495 |
| 262,144 | tensor | 4 | $2.34 | | 2.026 |
| 262,144 | | 1 | $2.37 | | 1.834 |
| 262,144 | pipeline | 3 | $2.83 | | 1.495 |
| 262,144 | tensor | 4 | $2.89 | | 2.026 |
| 262,144 | tensor | 2 | $2.93 | | 1.364 |
| 262,144 | tensor | 4 | $3.60 | | 2.026 |
| 262,144 | | 1 | $3.83 | | 1.834 |
| 262,144 | | 1 | $4.11 | | 2.184 |
| 262,144 | | 1 | $4.74 | | 3.359 |


| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | tensor | 4 | $0.96 | | 1.155 |
| 262,144 | tensor | 4 | $1.26 | | 1.155 |
| 262,144 | pipeline | 3 | $1.34 | 67.960 | 1.425 |
| 262,144 | tensor | 4 | $1.57 | | 1.955 |
| 262,144 | tensor | 2 | $2.22 | | 1.294 |
| 262,144 | pipeline | 3 | $2.29 | 65.690 | 1.425 |
| 262,144 | tensor | 4 | $2.34 | | 1.955 |
| 262,144 | | 1 | $2.37 | 93.860 | 1.764 |
| 262,144 | pipeline | 3 | $2.83 | 57.900 | 1.425 |
| 262,144 | tensor | 4 | $2.89 | | 1.955 |
| 262,144 | tensor | 2 | $2.93 | | 1.294 |
| 262,144 | tensor | 4 | $3.60 | | 1.955 |
| 262,144 | | 1 | $3.83 | 126.670 | 1.764 |
| 262,144 | | 1 | $4.11 | | 2.114 |
| 262,144 | | 1 | $4.74 | | 3.289 |


| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | tensor | 4 | $0.96 | | 1.076 |
| 262,144 | tensor | 4 | $1.26 | | 1.076 |
| 262,144 | pipeline | 3 | $1.34 | | 1.345 |
| 262,144 | tensor | 4 | $1.57 | | 1.876 |
| 262,144 | tensor | 2 | $2.22 | | 1.214 |
| 262,144 | pipeline | 3 | $2.29 | 58.430 | 1.345 |
| 262,144 | tensor | 4 | $2.34 | | 1.876 |
| 262,144 | | 1 | $2.37 | 74.840 | 1.684 |
| 262,144 | pipeline | 3 | $2.83 | 69.680 | 1.345 |
| 262,144 | tensor | 4 | $2.89 | | 1.876 |
| 262,144 | tensor | 2 | $2.93 | 87.960 | 1.214 |
| 262,144 | tensor | 4 | $3.60 | | 1.876 |
| 262,144 | | 1 | $3.83 | 106.830 | 1.684 |
| 262,144 | | 1 | $4.11 | | 2.034 |
| 262,144 | | 1 | $4.74 | | 3.209 |

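
One way to compare the configurations listed above is to divide each price by its Max Concurrency value. The sketch below does this for four rows taken from the last table. It rests on assumptions: the billing period behind the listed price is not stated in the table, so the result is only a relative measure, and Max Concurrency is treated as a simple capacity figure. The `Config` class and `rank_by_value` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    gpus: int
    parallelism: str        # "tensor", "pipeline", or "" for a single GPU
    price: float            # listed price in USD (billing period assumed equal across rows)
    max_concurrency: float  # value from the "Max Concurrency" column

    @property
    def price_per_concurrency(self) -> float:
        """Listed price divided by Max Concurrency (lower is cheaper per unit)."""
        return self.price / self.max_concurrency


# Four sample rows copied from the last table on this page.
CONFIGS = [
    Config(gpus=4, parallelism="tensor", price=0.96, max_concurrency=1.076),
    Config(gpus=3, parallelism="pipeline", price=1.34, max_concurrency=1.345),
    Config(gpus=1, parallelism="", price=2.37, max_concurrency=1.684),
    Config(gpus=1, parallelism="", price=4.74, max_concurrency=3.209),
]


def rank_by_value(configs: list[Config]) -> list[Config]:
    """Sort configurations from cheapest to most expensive per unit of concurrency."""
    return sorted(configs, key=lambda c: c.price_per_concurrency)


if __name__ == "__main__":
    for c in rank_by_value(CONFIGS):
        label = c.parallelism or "single GPU"
        print(f"{c.gpus} GPU(s), {label}: "
              f"${c.price:.2f} / {c.max_concurrency} = {c.price_per_concurrency:.3f}")
```

This ratio ignores throughput (TPS) and latency, which are only published for some rows, so it is a starting point for shortlisting rather than a full sizing method.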
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.