Qwen3-VL-30B-A3B-Instruct

multimodal

Qwen3-VL-30B-A3B-Instruct is a medium-sized multimodal model of the Qwen3-VL series, demonstrating advanced capabilities in the field of image, video and text comprehension. The model is based on a Mixture of Experts (MoE) architecture with 30 billion parameters, of which only 3 billion are actively used, which ensures high performance with relatively low computing costs. The architecture includes 48 layers, 128 experts (8 active), GQA attention with 32 query heads and 4 for keys and values. The key difference from the previous VL versions were three architectural innovations. Interleaved-MRoPE provides full frequency allocation in time, latitude, and altitude coordinates through enhanced positional embeddings, which is critical for understanding long-term video sequences. DeepStack technology combines the multilevel features of Vision Transformer to capture fine-grained details and enhance image alignment with text. The Text-Timestamp Alignment system is superior to the traditional T-RoPE, providing accurate event timestamps for enhanced temporal video modeling. These architectural solutions allow the model not only to "see" images or videos, but also to truly understand the visual world and its dynamics.

The model is able to work as a visual agent, recognizing elements of computer and mobile interfaces, understanding their functions, invoking tools, and performing complex automation tasks. Advanced visual coding features allow you to generate Draw.io Diagrams, HTML, CSS, and JavaScript code are directly based on image and video analysis, which opens up new horizons for automating web development. Advanced spatial perception includes the assessment of object positions, viewpoints, and occlusions, providing a stronger 2D and 3D spatial understanding of scenes. The technical characteristics of the model are impressive: native support for the context of 256K tokens with the ability to expand to 1M, which allows you to process entire books and videos lasting hours with full memorization and indexing by seconds. Advanced OCR supports 32 languages, is resistant to low light, blur and tilt, works better with rare and ancient characters, as well as improved processing of the structure of long documents and entity extraction.

The Qwen3-VL-30B-A3B-Instruct opens up wide possibilities for practical applications in various fields. Interface automation is becoming a reality thanks to the model's ability to recognize and interact with GUI elements of desktop and mobile applications, which allows the creation of intelligent bots to automate routine tasks. Web development gets a powerful tool for generating code directly from visual layouts or descriptions, significantly speeding up the prototyping process. Document analysis with advanced OCR makes the model indispensable for processing multilingual documentation, scanned forms, invoices, and spreadsheets in the financial and commercial fields. Processing video content for up to several hours with accurate time indexing opens up opportunities for creating video surveillance analysis systems, educational content, and media analytics.


Announce Date: 26.09.2025
Parameters: 32B
Experts: 128
Activated at inference: 3B
Context: 263K
Layers: 48
Attention Type: Full Attention
Developer: Qwen
Transformers Version: 4.57.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Qwen3-VL-30B-A3B-Instruct capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Qwen3-VL-30B-A3B-Instruct

Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-2.16.32.160
8,192.0
tensor
2 $0.54 4.708 Launch
teslaa2-2.16.32.160
8,192.0
tensor
2 $0.57 4.806 Launch
rtx3090-1.16.24.160
8,192.0
1 $0.83 2.288 Launch
rtx2080ti-3.12.24.120
8,192.0
pipeline
3 $0.84 3.449 Launch
teslat4-3.32.64.160
65,563.0
pipeline
3 $0.88 2.270 Launch
teslaa10-2.16.64.160
65,563.0
tensor
2 $0.93 3.022 Launch
teslat4-4.16.64.160
204,800.0
tensor
4 $0.96 1.265 Launch
rtx4090-1.16.32.160
8,192.0
1 $1.02 2.239 Launch
teslaa2-3.32.128.160
65,563.0
pipeline
3 $1.06 2.289 Launch
rtx2080ti-4.16.32.160
65,563.0
tensor
4 $1.12 1.500 Launch
rtxa5000-2.16.64.160.nvlink
65,563.0
tensor
2 $1.23 3.022 Launch
teslaa2-4.32.128.160
204,800.0
tensor
4 $1.26 1.273 Launch
teslaa10-3.16.96.160
262,144.0
pipeline
3 $1.34 87.300 1.481 Launch
rtx3090-2.16.64.160
204,800.0
tensor
2 $1.56 1.072 Launch
teslaa10-4.12.48.160
262,144.0
tensor
4 $1.57 2.206 Launch
rtx5090-1.16.64.160
65,563.0
1 $1.59 1.480 Launch
teslaa2-6.32.128.160
262,144.0
pipeline
6 $1.65 1.226 Launch
rtx3080-4.16.64.160
49,152.0
tensor
4 $1.82 1.223 Launch
rtx4090-2.16.64.160
204,800.0
tensor
2 $1.92 1.068 Launch
rtx3090-3.16.96.160
262,144.0
pipeline
3 $2.29 1.603 Launch
rtxa5000-4.16.128.160.nvlink
262,144.0
tensor
4 $2.34 2.206 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 110.200 2.206 Launch
rtx4090-3.16.96.160
262,144.0
pipeline
3 $2.83 1.598 Launch
rtx3090-4.16.64.160
262,144.0
tensor
4 $2.89 2.368 Launch
rtx5090-2.16.64.160
262,144.0
tensor
2 $2.93 1.434 Launch
rtx4090-4.16.64.160
262,144.0
tensor
4 $3.60 2.362 Launch
h100-1.16.64.160
262,144.0
1 $3.83 122.770 2.204 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 2.737 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 5.107 Launch
h200-1.16.128.160
262,144.0
1 $4.74 4.527 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.749 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslaa10-2.16.64.160
49,152.0
tensor
2 $0.93 1.058 Launch
teslat4-4.16.64.160
65,563.0
tensor
4 $0.96 1.723 Launch
rtxa5000-2.16.64.160.nvlink
49,152.0
tensor
2 $1.23 1.058 Launch
teslaa2-4.32.128.160
65,563.0
tensor
4 $1.26 1.748 Launch
teslaa10-3.16.96.160
204,800.0
pipeline
3 $1.34 1.182 Launch
rtx3090-2.16.64.160
65,563.0
tensor
2 $1.56 1.118 Launch
teslaa10-4.12.48.160
65,563.0
tensor
4 $1.57 6.591 Launch
teslaa10-4.16.64.160
262,144.0
tensor
4 $1.62 1.648 Launch
teslaa2-6.32.128.160
204,800.0
pipeline
6 $1.65 1.094 Launch
rtx4090-2.16.64.160
65,563.0
tensor
2 $1.92 1.106 Launch
rtx3090-3.16.96.160
262,144.0
pipeline
3 $2.29 1.045 Launch
rtxa5000-4.16.128.160.nvlink
262,144.0
tensor
4 $2.34 1.648 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 95.270 1.649 Launch
rtx4090-3.16.96.160
262,144.0
pipeline
3 $2.83 1.041 Launch
rtx3090-4.16.64.160
262,144.0
tensor
4 $2.89 1.811 Launch
rtx5090-2.16.64.160
204,800.0
tensor
2 $2.93 1.123 Launch
rtx4090-4.16.64.160
262,144.0
tensor
4 $3.60 1.805 Launch
h100-1.16.64.160
262,144.0
1 $3.83 102.230 1.646 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 2.179 Launch
rtx5090-3.16.96.160
262,144.0
pipeline
3 $4.34 1.941 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 4.549 Launch
h200-1.16.128.160
262,144.0
1 $4.74 3.970 Launch
rtx5090-4.16.128.160
262,144.0
tensor
4 $5.74 3.005 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.191 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
dedicated-rtx3090-8.64.128.960-1
262,144.0
tensor
8 1.857 Launch
teslaa10-4.16.64.160
65,563.0
tensor
4 $1.62 1.954 Launch
teslaa2-6.32.128.160
8,192.0
pipeline
6 $1.65 2.594 Launch
rtxa5000-4.16.128.160.nvlink
65,563.0
tensor
4 $2.34 1.954 Launch
teslaa100-1.16.64.160
65,563.0
1 $2.37 1.955 Launch
rtx3090-4.16.64.160
65,563.0
tensor
4 $2.89 2.603 Launch
rtxa5000-6.24.192.160.nvlink
262,144.0
pipeline
6 $3.50 1.292 Launch
rtx4090-4.16.64.160
65,563.0
tensor
4 $3.60 2.579 Launch
h100-1.16.64.160
65,563.0
1 $3.83 1.944 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 1.020 Launch
rtx5090-3.16.96.160
204,800.0
pipeline
3 $4.34 1.000 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 3.389 Launch
rtxa5000-8.24.256.160.nvlink
262,144.0
tensor
8 $4.61 1.694 Launch
h200-1.16.128.160
262,144.0
1 $4.74 2.810 Launch
teslaa100-2.24.256.160
262,144.0
tensor
2 $4.93 79.170 3.389 Launch
rtx5090-4.16.128.160
262,144.0
tensor
4 $5.74 1.846 Launch
rtx4090-6.44.256.160
262,144.0
pipeline
6 $5.83 1.449 Launch
rtx4090-8.44.256.160
262,144.0
tensor
8 $7.51 1.851 Launch
h100-2.24.256.160
262,144.0
tensor
2 $7.84 106.820 3.384 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 8.032 Launch

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.