Qwen3-VL-4B-Instruct is a compact 4-billion-parameter multimodal model designed for efficient deployment on resource-constrained servers while retaining the full functionality of the Qwen3-VL series. Despite being half the size of the 8B version, the model preserves all key architectural innovations: Interleaved-MRoPE for video understanding, DeepStack for multi-level visual feature fusion, and Text-Timestamp Alignment for precise temporal localization. The seamless integration of text and visual modalities provides an understanding of multimodal context at a level comparable to pure-text LLMs.
In terms of performance, Qwen3-VL-4B-Instruct approaches the results of Qwen2.5-VL-7B, demonstrating that the reduction in model size was achieved without significant loss of quality. The model supports a native context of 256K tokens (expandable to 1M), enabling the processing of long documents, multi-hour videos, and complex multimodal dialogues. Advanced OCR capabilities with support for 32 languages and resilience to challenging shooting conditions make the 4B model a full-fledged solution for intelligent document processing tasks, despite its compact size.
Qwen3-VL-4B-Instruct represents an ideal solution for scenarios requiring a balance between performance and efficiency: deployment on consumer devices, the ability to process large volumes of visual content, fast response times for integration into real-time applications, and research projects. Furthermore, the open Apache 2.0 license allows for free commercial use of the model, making it accessible to a wide range of users, from startups to large enterprises.
| Model Name | Context | Type | GPU | Status | Link |
|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Name | GPU | TPS | Max Concurrency | |||
|---|---|---|---|---|---|---|
262,144.0 tensor |
4 | $0.96 | 1.114 | Launch | ||
262,144.0 tensor |
4 | $1.26 | 1.118 | Launch | ||
262,144.0 pipeline |
3 | $1.34 | 1.420 | Launch | ||
262,144.0 tensor |
4 | $1.57 | 1.926 | Launch | ||
262,144.0 pipeline |
3 | $2.29 | 1.501 | Launch | ||
262,144.0 tensor |
4 | $2.34 | 1.926 | Launch | ||
262,144.0 |
1 | $2.37 | 1.859 | Launch | ||
262,144.0 pipeline |
3 | $2.83 | 1.498 | Launch | ||
262,144.0 tensor |
4 | $2.89 | 2.034 | Launch | ||
262,144.0 tensor |
2 | $2.93 | 1.367 | Launch | ||
262,144.0 tensor |
4 | $3.60 | 2.030 | Launch | ||
262,144.0 |
1 | $3.83 | 1.857 | Launch | ||
262,144.0 |
1 | $4.11 | 2.213 | Launch | ||
262,144.0 tensor |
2 | $4.61 | 3.815 | Launch | ||
262,144.0 |
1 | $4.74 | 3.407 | Launch | ||
262,144.0 tensor |
2 | $9.40 | 6.910 | Launch | ||
| Name | GPU | TPS | Max Concurrency | |||
|---|---|---|---|---|---|---|
262,144.0 tensor |
4 | $0.96 | 1.044 | Launch | ||
262,144.0 tensor |
4 | $1.26 | 1.048 | Launch | ||
262,144.0 pipeline |
3 | $1.34 | 67.960 | 1.350 | Launch | |
262,144.0 tensor |
4 | $1.57 | 1.855 | Launch | ||
262,144.0 pipeline |
3 | $2.29 | 1.431 | Launch | ||
262,144.0 tensor |
4 | $2.34 | 1.855 | Launch | ||
262,144.0 |
1 | $2.37 | 93.860 | 1.789 | Launch | |
262,144.0 pipeline |
3 | $2.83 | 1.428 | Launch | ||
262,144.0 tensor |
4 | $2.89 | 1.964 | Launch | ||
262,144.0 tensor |
2 | $2.93 | 1.297 | Launch | ||
262,144.0 tensor |
4 | $3.60 | 1.960 | Launch | ||
262,144.0 |
1 | $3.83 | 126.670 | 1.787 | Launch | |
262,144.0 |
1 | $4.11 | 2.143 | Launch | ||
262,144.0 tensor |
2 | $4.61 | 3.745 | Launch | ||
262,144.0 |
1 | $4.74 | 3.336 | Launch | ||
262,144.0 tensor |
2 | $9.40 | 6.840 | Launch | ||
| Name | GPU | TPS | Max Concurrency | |||
|---|---|---|---|---|---|---|
262,144.0 pipeline |
3 | $1.34 | 1.270 | Launch | ||
262,144.0 tensor |
4 | $1.57 | 1.776 | Launch | ||
262,144.0 pipeline |
6 | $1.65 | 1.575 | Launch | ||
262,144.0 pipeline |
3 | $2.29 | 1.351 | Launch | ||
262,144.0 tensor |
4 | $2.34 | 1.776 | Launch | ||
262,144.0 |
1 | $2.37 | 74.840 | 1.709 | Launch | |
262,144.0 pipeline |
3 | $2.83 | 1.348 | Launch | ||
262,144.0 tensor |
4 | $2.89 | 1.884 | Launch | ||
262,144.0 tensor |
2 | $2.93 | 1.217 | Launch | ||
262,144.0 tensor |
4 | $3.60 | 1.880 | Launch | ||
262,144.0 |
1 | $3.83 | 106.830 | 1.707 | Launch | |
262,144.0 |
1 | $4.11 | 2.063 | Launch | ||
262,144.0 tensor |
2 | $4.61 | 3.665 | Launch | ||
262,144.0 |
1 | $4.74 | 3.257 | Launch | ||
262,144.0 tensor |
2 | $9.40 | 6.760 | Launch | ||
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.