Qwen/Qwen3-VL-30B-A3B-Instruct

multimodal

Qwen3-VL-30B-A3B-Instruct is a medium-sized multimodal model in the Qwen3-VL series, offering advanced image, video, and text comprehension. The model uses a Mixture of Experts (MoE) architecture with 30 billion total parameters, of which only about 3 billion are active per token, delivering high quality at relatively low compute cost. The architecture comprises 48 layers, 128 experts (8 active), and grouped-query attention (GQA) with 32 query heads and 4 key/value heads. Three architectural innovations distinguish it from previous VL versions. Interleaved-MRoPE allocates positional-embedding frequencies across the time, height, and width dimensions, which is critical for understanding long video sequences. DeepStack fuses multi-level Vision Transformer features to capture fine-grained details and improve image-text alignment. Text-Timestamp Alignment moves beyond the traditional T-RoPE approach, grounding events to precise timestamps for stronger temporal video modeling. Together, these design choices let the model not only "see" images and videos but genuinely understand the visual world and its dynamics.
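
A minimal single-image inference sketch with the Hugging Face Transformers library is shown below. It assumes a Transformers build with Qwen3-VL support (see the version listed further down); the image URL and prompt are placeholders, and the exact message keys may differ slightly between Transformers versions.

```python
# Minimal image + text inference sketch (hedged; not the only supported workflow).
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-30B-A3B-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread the MoE layers across available GPUs
)

# A single-image chat turn; the URL is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},
            {"type": "text", "text": "Describe this chart and list its key values."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```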

The model can act as a visual agent: it recognizes elements of desktop and mobile interfaces, understands their functions, invokes tools, and performs complex automation tasks. Advanced visual coding lets it generate Draw.io diagrams and HTML, CSS, and JavaScript code directly from image and video analysis, opening new possibilities for automating web development. Improved spatial perception covers object positions, viewpoints, and occlusions, giving the model stronger 2D and 3D understanding of scenes. The technical specifications are impressive: native support for a 256K-token context, expandable to 1M, allows it to process entire books and hours-long videos with full recall and second-level indexing. The upgraded OCR supports 32 languages, is robust to low light, blur, and tilt, handles rare and ancient characters better, and improves long-document structure parsing and entity extraction.
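
The same chat interface accepts video input, which is how the long-context and timestamp capabilities are usually exercised. A hedged sketch continuing from the example above follows; the video path and prompt are placeholders, a video-decoding backend must be installed, and message keys may vary by Transformers version.

```python
# Continuing from the previous sketch: ask for timestamped events in a local video.
# Requires a video backend such as torchvision or PyAV; the file path is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "/data/meeting_recording.mp4"},
            {"type": "text", "text": "Summarize this video and give approximate timestamps for each key event."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```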

Qwen3-VL-30B-A3B-Instruct opens up broad possibilities for practical applications. Interface automation becomes a reality thanks to the model's ability to recognize and interact with GUI elements of desktop and mobile applications, enabling intelligent bots that automate routine tasks. Web development gains a powerful tool for generating code directly from visual layouts or descriptions, significantly speeding up prototyping. Document analysis with advanced OCR makes the model well suited to multilingual documentation, scanned forms, invoices, and spreadsheets in finance and commerce. Processing video content up to several hours long with accurate time indexing enables video-surveillance analytics, educational content processing, and media analytics systems.


Announce Date: 26.09.2025
Parameters: 31.1B
Experts: 128
Activated at inference: 3B
Context: 263K
Layers: 48
Attention Type: Full Attention
VRAM requirements: 41.0 GB using 4-bit quantization
Developer: Qwen
Transformers Version: 4.57.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Qwen/Qwen3-VL-30B-A3B-Instruct capabilities. You can obtain an API access token on the token management page after registration and verification.
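
Hosted endpoints of this kind typically expose an OpenAI-compatible API; assuming that is the case here, a request might look like the sketch below. The base_url is a placeholder (no public endpoint exists for this model yet), and the image URL and prompt are illustrative.

```python
# Hedged example of querying a hosted endpoint through the OpenAI Python client.
# Substitute your endpoint URL and the API token from the token management page.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.example/v1",  # placeholder endpoint URL
    api_key="YOUR_API_TOKEN",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
                {"type": "text", "text": "Extract the invoice number, date, and total amount."},
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```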
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios (a deployment sketch follows the list):

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.
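
A hedged self-hosting sketch with vLLM is shown below, assuming a vLLM build that supports Qwen3-VL. The tensor_parallel_size and max_model_len values should be matched to the rented configuration, and the image URL is a placeholder.

```python
# Offline multimodal inference with vLLM on a dedicated instance.
# tensor_parallel_size should equal the number of GPUs in the configuration;
# max_model_len can be raised toward the full 262,144-token context if VRAM allows.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct",
    tensor_parallel_size=2,
    max_model_len=65536,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/ui_mockup.png"}},  # placeholder
            {"type": "text", "text": "Generate HTML and CSS that reproduce this layout."},
        ],
    }
]

outputs = llm.chat(messages, SamplingParams(temperature=0.2, max_tokens=1024))
print(outputs[0].outputs[0].text)
```

The same weights can also be exposed as an OpenAI-compatible server (for example via vLLM's serving mode), which is the usual way to back a private endpoint.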

Recommended configurations for hosting Qwen/Qwen3-VL-30B-A3B-Instruct

Prices:
Name                           Context, tokens   vCPU   RAM, MB   Disk, GB   GPUs   Price, hour
teslaa10-2.16.64.160           262,144           16     65536     160        2      $0.93
teslat4-4.16.64.160            262,144           16     65536     160        4      $0.96
rtxa5000-2.16.64.160.nvlink    262,144           16     65536     160        2      $1.23
teslaa2-4.32.128.160           262,144           32     131072    160        4      $1.26
rtx3090-2.16.64.160            262,144           16     65536     160        2      $1.67
rtx4090-2.16.64.160            262,144           16     65536     160        2      $2.19
teslav100-2.16.64.240          262,144           16     65535     240        2      $2.22
teslaa100-1.16.64.160          262,144           16     65536     160        1      $2.58
rtx5090-2.16.64.160            262,144           16     65536     160        2      $2.93
teslah100-1.16.64.160          262,144           16     65536     160        1      $5.11
h200-1.16.128.160              262,144           16     131072    160        1      $6.98
Prices:
Name                           Context, tokens   vCPU   RAM, MB   Disk, GB   GPUs   Price, hour
teslaa10-3.16.96.160           262,144           16     98304     160        3      $1.34
teslaa2-6.32.128.160           262,144           32     131072    160        6      $1.65
rtxa5000-4.16.128.160.nvlink   262,144           16     131072    160        4      $2.34
rtx3090-3.16.96.160            262,144           16     98304     160        3      $2.45
teslaa100-1.16.64.160          262,144           16     65536     160        1      $2.58
rtx4090-3.16.96.160            262,144           16     98304     160        3      $3.23
teslav100-3.64.256.320         262,144           64     262144    320        3      $3.89
rtx5090-3.16.96.160            262,144           16     98304     160        3      $4.34
teslah100-1.16.64.160          262,144           16     65536     160        1      $5.11
h200-1.16.128.160              262,144           16     131072    160        1      $6.98
Prices:
Name                           Context, tokens   vCPU   RAM, MB   Disk, GB   GPUs   Price, hour
rtxa5000-6.24.192.160.nvlink   262,144           24     196608    160        6      $3.50
teslav100-4.32.96.160          262,144           32     98304     160        4      $4.35
teslaa100-2.24.96.160.nvlink   262,144           24     98304     160        2      $5.04
rtx5090-4.16.128.160           262,144           16     131072    160        4      $5.74
rtx4090-6.44.256.160           262,144           44     262144    160        6      $6.63
h200-1.16.128.160              262,144           16     131072    160        1      $6.98
teslah100-2.24.256.160         262,144           24     262144    160        2      $10.40

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.