Qwen3-VL-30B-A3B-Instruct

multimodal

Qwen3-VL-30B-A3B-Instruct is a medium-sized multimodal model of the Qwen3-VL series, demonstrating advanced capabilities in the field of image, video and text comprehension. The model is based on a Mixture of Experts (MoE) architecture with 30 billion parameters, of which only 3 billion are actively used, which ensures high performance with relatively low computing costs. The architecture includes 48 layers, 128 experts (8 active), GQA attention with 32 query heads and 4 for keys and values. The key difference from the previous VL versions were three architectural innovations. Interleaved-MRoPE provides full frequency allocation in time, latitude, and altitude coordinates through enhanced positional embeddings, which is critical for understanding long-term video sequences. DeepStack technology combines the multilevel features of Vision Transformer to capture fine-grained details and enhance image alignment with text. The Text-Timestamp Alignment system is superior to the traditional T-RoPE, providing accurate event timestamps for enhanced temporal video modeling. These architectural solutions allow the model not only to "see" images or videos, but also to truly understand the visual world and its dynamics.

The model is able to work as a visual agent, recognizing elements of computer and mobile interfaces, understanding their functions, invoking tools, and performing complex automation tasks. Advanced visual coding features allow you to generate Draw.io Diagrams, HTML, CSS, and JavaScript code are directly based on image and video analysis, which opens up new horizons for automating web development. Advanced spatial perception includes the assessment of object positions, viewpoints, and occlusions, providing a stronger 2D and 3D spatial understanding of scenes. The technical characteristics of the model are impressive: native support for the context of 256K tokens with the ability to expand to 1M, which allows you to process entire books and videos lasting hours with full memorization and indexing by seconds. Advanced OCR supports 32 languages, is resistant to low light, blur and tilt, works better with rare and ancient characters, as well as improved processing of the structure of long documents and entity extraction.

The Qwen3-VL-30B-A3B-Instruct opens up wide possibilities for practical applications in various fields. Interface automation is becoming a reality thanks to the model's ability to recognize and interact with GUI elements of desktop and mobile applications, which allows the creation of intelligent bots to automate routine tasks. Web development gets a powerful tool for generating code directly from visual layouts or descriptions, significantly speeding up the prototyping process. Document analysis with advanced OCR makes the model indispensable for processing multilingual documentation, scanned forms, invoices, and spreadsheets in the financial and commercial fields. Processing video content for up to several hours with accurate time indexing opens up opportunities for creating video surveillance analysis systems, educational content, and media analytics.


Announce Date: 26.09.2025
Parameters: 32B
Experts: 128
Activated at inference: 3B
Context: 263K
Layers: 48
Attention Type: Full Attention
Developer: Qwen
Transformers Version: 4.57.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Qwen3-VL-30B-A3B-Instruct capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Qwen3-VL-30B-A3B-Instruct

Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-2.16.32.160
8,192.0
tensor
2 $0.54 5.214 Launch
teslaa2-2.16.32.160
8,192.0
tensor
2 $0.57 5.312 Launch
rtx3090-1.16.24.160
8,192.0
1 $0.83 1.727 Launch
rtx2080ti-3.12.24.120
8,192.0
pipeline
3 $0.84 5.021 Launch
teslat4-3.32.64.160
65,563.0
pipeline
3 $0.88 2.467 Launch
teslaa10-2.16.64.160
65,563.0
tensor
2 $0.93 3.085 Launch
teslat4-4.16.64.160
262,144.0
tensor
4 $0.96 1.071 Launch
rtx4090-1.16.32.160
8,192.0
1 $1.02 1.678 Launch
teslaa2-3.32.128.160
65,563.0
pipeline
3 $1.06 2.485 Launch
rtx2080ti-4.16.32.160
65,563.0
tensor
4 $1.12 1.829 Launch
rtxa5000-2.16.64.160.nvlink
65,563.0
tensor
2 $1.23 3.085 Launch
teslaa2-4.32.128.160
262,144.0
tensor
4 $1.26 1.077 Launch
teslaa10-3.16.96.160
262,144.0
pipeline
3 $1.34 87.300 1.530 Launch
rtx3080-3.16.64.160
8,192.0
pipeline
3 $1.43 1.525 Launch
rtx3090-2.16.64.160
204,800.0
tensor
2 $1.56 1.092 Launch
teslaa10-4.12.48.160
262,144.0
tensor
4 $1.57 2.288 Launch
rtx5090-1.16.64.160
65,563.0
1 $1.59 1.410 Launch
rtx3080-4.16.64.160
65,563.0
tensor
4 $1.82 1.247 Launch
rtx4090-2.16.64.160
204,800.0
tensor
2 $1.92 1.088 Launch
rtx3090-3.16.96.160
262,144.0
pipeline
3 $2.29 1.652 Launch
rtxa5000-4.16.128.160.nvlink
262,144.0
tensor
4 $2.34 2.288 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 110.200 2.189 Launch
rtx4090-3.16.96.160
262,144.0
pipeline
3 $2.83 1.647 Launch
rtx3090-4.16.64.160
262,144.0
tensor
4 $2.89 2.451 Launch
rtx5090-2.16.64.160
262,144.0
tensor
2 $2.93 1.450 Launch
rtx4090-4.16.64.160
262,144.0
tensor
4 $3.60 2.445 Launch
h100-1.16.64.160
262,144.0
1 $3.83 122.770 2.186 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 2.719 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 5.123 Launch
h200-1.16.128.160
262,144.0
1 $4.74 4.510 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.765 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslaa10-2.16.64.160
8,192.0
tensor
2 $0.93 5.577 Launch
teslat4-4.16.64.160
65,563.0
tensor
4 $0.96 1.893 Launch
rtxa5000-2.16.64.160.nvlink
8,192.0
tensor
2 $1.23 5.577 Launch
teslaa2-4.32.128.160
65,563.0
tensor
4 $1.26 1.918 Launch
teslaa10-3.16.96.160
204,800.0
pipeline
3 $1.34 1.194 Launch
rtx3090-2.16.64.160
65,563.0
tensor
2 $1.56 1.022 Launch
teslaa10-4.12.48.160
65,563.0
tensor
4 $1.57 6.761 Launch
teslaa10-4.16.64.160
262,144.0
tensor
4 $1.62 1.691 Launch
teslaa2-6.32.128.160
204,800.0
pipeline
6 $1.65 1.187 Launch
rtx4090-2.16.64.160
65,563.0
tensor
2 $1.92 1.010 Launch
rtx3090-3.16.96.160
262,144.0
pipeline
3 $2.29 1.055 Launch
rtxa5000-4.16.128.160.nvlink
262,144.0
tensor
4 $2.34 1.691 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 95.270 1.591 Launch
rtx4090-3.16.96.160
262,144.0
pipeline
3 $2.83 1.050 Launch
rtx3090-4.16.64.160
262,144.0
tensor
4 $2.89 1.853 Launch
rtx5090-2.16.64.160
204,800.0
tensor
2 $2.93 1.092 Launch
rtx4090-4.16.64.160
262,144.0
tensor
4 $3.60 1.847 Launch
h100-1.16.64.160
262,144.0
1 $3.83 102.230 1.589 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 2.122 Launch
rtx5090-3.16.96.160
262,144.0
pipeline
3 $4.34 1.950 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 4.525 Launch
h200-1.16.128.160
262,144.0
1 $4.74 3.912 Launch
rtx5090-4.16.128.160
262,144.0
tensor
4 $5.74 3.048 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.167 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
dedicated-rtx3090-8.64.128.960-1
262,144.0
tensor
8 1.902 Launch
teslaa2-6.32.128.240
8,192.0
pipeline
6 $1.66 3.113 Launch
teslaa10-4.16.128.240
65,563.0
tensor
4 $1.76 1.784 Launch
teslaa100-1.16.64.240
8,192.0
1 $2.38 11.093 Launch
teslaa100-1.16.128.240
65,563.0
1 $2.51 1.386 Launch
rtx3090-4.16.64.240
8,192.0
tensor
4 $2.89 19.481 Launch
rtx3090-4.16.96.320
65,563.0
tensor
4 $2.97 2.434 Launch
rtx4090-4.16.64.240
8,192.0
tensor
4 $3.61 19.285 Launch
rtx4090-4.16.96.320
65,563.0
tensor
4 $3.68 2.410 Launch
h100-1.16.64.240
8,192.0
1 $3.83 11.007 Launch
h100-1.16.128.240
65,563.0
1 $3.96 1.375 Launch
h100nvl-1.16.96.240
204,800.0
1 $4.12 1.123 Launch
rtx5090-3.16.96.240
65,563.0
pipeline
3 $4.35 2.822 Launch
h200-1.16.128.240
262,144.0
1 $4.74 2.668 Launch
teslaa100-2.24.256.240
262,144.0
tensor
2 $4.93 79.170 3.280 Launch
teslaa100-2.24.256.320.nvlink
262,144.0
tensor
2 $4.94 3.280 Launch
rtx5090-4.16.128.320
262,144.0
tensor
4 $5.76 1.803 Launch
rtx4090-6.44.256.240
262,144.0
pipeline
6 $5.84 1.465 Launch
rtx4090-8.44.256.240
262,144.0
tensor
8 $7.52 1.896 Launch
h100-2.24.256.240
262,144.0
tensor
2 $7.85 106.820 3.275 Launch
h100nvl-2.24.192.240
262,144.0
tensor
2 $8.17 4.341 Launch
h200-2.24.256.240.nvlink
262,144.0
tensor
2 $9.41 7.923 Launch

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.