Qwen3-VL-4B-Instruct is a compact 4-billion-parameter multimodal model designed for efficient deployment on resource-constrained servers while retaining the full functionality of the Qwen3-VL series. Despite being half the size of the 8B version, it preserves the key architectural innovations of the family: Interleaved-MRoPE for video understanding, DeepStack for multi-level visual feature fusion, and Text-Timestamp Alignment for precise temporal localization. The tight integration of text and visual modalities lets the model reason over multimodal context while retaining text understanding comparable to pure-text LLMs.
In terms of performance, Qwen3-VL-4B-Instruct approaches the results of Qwen2.5-VL-7B, showing that the reduction in model size comes without a significant loss of quality. The model supports a native context of 256K tokens (expandable to 1M), enabling it to process long documents, multi-hour videos, and complex multimodal dialogues. Advanced OCR across 32 languages and robustness to difficult capture conditions make the 4B model a full-fledged option for intelligent document processing despite its compact size.
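For a quick look at how the model is used in practice, below is a minimal local-inference sketch with Hugging Face `transformers`. It assumes a recent `transformers` release that includes Qwen3-VL support and that the checkpoint is published as `Qwen/Qwen3-VL-4B-Instruct`; the image URL is a placeholder, and the exact message schema can vary between releases, so treat the official model card as authoritative.

```python
# Minimal local-inference sketch (assumptions: a transformers release with
# Qwen3-VL support; "Qwen/Qwen3-VL-4B-Instruct" as the checkpoint id;
# the image URL is a placeholder).
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-4B-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the 4B model fits on a single modern GPU in bf16
    device_map="auto",
)

# One image plus a text question, in the chat format the processor expects.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/invoice.png"},  # placeholder
            {"type": "text", "text": "Extract the invoice number and the total amount."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```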
Qwen3-VL-4B-Instruct is an ideal choice for scenarios that require a balance between performance and efficiency: deployment on consumer devices, high-volume processing of visual content, fast response times for real-time integrations, and research projects. Furthermore, the open Apache 2.0 license allows free commercial use of the model, making it accessible to a wide range of users, from startups to large enterprises.
| Model Name | Context | Type | GPU | TPS | Status | Link |
|---|---|---|---|---|---|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances for the scenarios described above; the available configurations are listed below, and a serving sketch follows the table.
| Name | vCPU | RAM, MB | Disk, GB | GPU | Context, tokens | Price/hr | |
|---|---|---|---|---|---|---|---|
|  | 16 | 65536 | 160 | 4 | 262,144 | $0.96 | Launch |
|  | 32 | 131072 | 160 | 4 | 262,144 | $1.26 | Launch |
|  | 16 | 98304 | 160 | 3 | 262,144 | $1.34 | Launch |
|  | 16 | 65535 | 240 | 2 | 262,144 | $2.22 | Launch |
|  | 16 | 131072 | 160 | 4 | 262,144 | $2.34 | Launch |
|  | 16 | 98304 | 160 | 3 | 262,144 | $2.45 | Launch |
|  | 16 | 65536 | 160 | 1 | 262,144 | $2.58 | Launch |
|  | 16 | 65536 | 160 | 2 | 262,144 | $2.93 | Launch |
|  | 16 | 98304 | 160 | 3 | 262,144 | $3.23 | Launch |
|  | 16 | 65536 | 160 | 1 | 262,144 | $5.11 | Launch |
|  | 16 | 131072 | 160 | 1 | 262,144 | $6.98 | Launch |
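Once an instance is running, a common way to expose the model is through an OpenAI-compatible server such as vLLM. The sketch below is illustrative only: it assumes a vLLM build with Qwen3-VL support, and the host address, port, and image URL are placeholders.

```python
# Client-side sketch for a self-hosted, OpenAI-compatible endpoint.
# Assumption: the instance already serves the model, for example with a recent
# vLLM build that supports Qwen3-VL:
#   vllm serve Qwen/Qwen3-VL-4B-Instruct --max-model-len 262144
# The host/port and the image URL below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://YOUR-INSTANCE-IP:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-4B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/scan.jpg"}},  # placeholder
                {"type": "text", "text": "Transcribe all text in this document."},
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

Because the endpoint follows the OpenAI API format, existing client integrations can typically switch to the self-hosted model by changing only `base_url` and `model`.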
Contact our dedicated neural network support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.