Qwen3-VL-30B-A3B-Instruct

multimodal

Qwen3-VL-30B-A3B-Instruct is a medium-sized multimodal model in the Qwen3-VL series with advanced image, video, and text comprehension. It is built on a Mixture of Experts (MoE) architecture with about 30 billion parameters, of which only about 3 billion are active per token, which keeps computing costs relatively low for its quality. The architecture includes 48 layers, 128 experts (8 active), and GQA attention with 32 query heads and 4 key/value heads. Three architectural innovations distinguish it from previous VL versions. Interleaved-MRoPE allocates the full frequency range across the time, width, and height dimensions through enhanced positional embeddings, which is critical for understanding long video sequences. DeepStack fuses multi-level Vision Transformer features to capture fine-grained details and sharpen image-text alignment. Text-Timestamp Alignment goes beyond the traditional T-RoPE, grounding events to precise timestamps for stronger temporal video modeling. Together, these changes are meant to let the model not only "see" images and videos but also reason about the visual world and its dynamics.
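To make the "128 experts (8 active)" figure more concrete, here is a minimal sketch of generic top-k expert routing in plain Python. The softmax gating and renormalization shown here are a common MoE pattern and an assumption, not a confirmed description of how Qwen3-VL routes tokens; `route_token` and the random logits are hypothetical stand-ins.

```python
import math
import random

NUM_EXPERTS = 128    # from the model description
ACTIVE_EXPERTS = 8   # experts used per token, from the model description


def route_token(router_logits, k=ACTIVE_EXPERTS):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Generic top-k softmax gating; the exact gating used by the model is an assumption.
    """
    # Numerically stable softmax over all expert logits.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the k most probable experts and renormalize their weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    top_mass = sum(probs[i] for i in top)
    return [(i, probs[i] / top_mass) for i in top]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
selected = route_token(logits)

print(len(selected), "of", NUM_EXPERTS, "experts active")
print("query heads per KV head:", 32 // 4)  # GQA grouping from the description
```

Only the selected experts run for a given token, which is why the active parameter count is far below the total.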

The model can work as a visual agent: it recognizes elements of desktop and mobile interfaces, understands their functions, invokes tools, and completes complex automation tasks. Its visual coding features generate Draw.io diagrams and HTML, CSS, and JavaScript code directly from images and video, which helps automate web development. Advanced spatial perception covers object positions, viewpoints, and occlusions, giving a stronger 2D and 3D understanding of scenes. The model natively supports a 256K-token context, expandable to 1M, which is enough to process entire books and hours-long videos with full recall and second-level indexing. OCR supports 32 languages, is robust to low light, blur, and tilt, handles rare and ancient characters better, and improves long-document structure parsing and entity extraction.

Qwen3-VL-30B-A3B-Instruct suits a wide range of practical applications. Its ability to recognize and interact with GUI elements in desktop and mobile applications makes interface automation practical, for example intelligent bots that handle routine tasks. In web development, it generates code directly from visual layouts or descriptions, which speeds up prototyping. With advanced OCR, it is well suited to processing multilingual documentation, scanned forms, invoices, and spreadsheets in finance and commerce. Processing several hours of video with accurate time indexing enables video surveillance analysis, educational content processing, and media analytics.

Announcement Date: 26.09.2025
Parameters: 31.1B
Experts: 128
Activated at inference: 3B
Context: 262,144 tokens (256K)
Layers: 48
Attention Type: Full Attention
VRAM requirements: 44.4 GB with 4-bit quantization
Developer: Qwen
Transformers Version: 4.57.0.dev0
License: Apache 2.0
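As a rough sanity check on these figures, the sketch below computes the memory taken by the weights alone at 4 bits per parameter and compares it with the listed VRAM requirement. What the remaining memory covers (KV cache for long contexts, the vision encoder, runtime overhead) is an assumption; the page does not break the 44.4 GB figure down. The context length of 262,144 tokens is taken from the endpoint and pricing tables on this page.

```python
PARAMS = 31.1e9            # "Parameters: 31.1B"
BITS_PER_WEIGHT = 4        # 4-bit quantization
LISTED_VRAM_GB = 44.4      # listed VRAM requirement
CONTEXT_TOKENS = 262_144   # context length shown in the tables on this page


def weight_memory_gb(params, bits):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9


weights_gb = weight_memory_gb(PARAMS, BITS_PER_WEIGHT)
print(f"weights at 4 bits: {weights_gb:.2f} GB")
# The remainder is presumably cache and overhead (assumption, not stated on the page).
print(f"remaining of the listed figure: {LISTED_VRAM_GB - weights_gb:.2f} GB")
print(f"context: {CONTEXT_TOKENS} tokens = {CONTEXT_TOKENS // 1024}K")
```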

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Qwen3-VL-30B-A3B-Instruct capabilities. You can obtain an API access token on the token management page after registration and verification.

Model Name	Context	Type	GPU	TPS	Status	Link
QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ	262,144	Public	3×RTX4090		AVAILABLE	chat

API access to Qwen3-VL-30B-A3B-Instruct endpoints

# curl
curl https://chat.immers.cloud/v1/endpoints/Qwen3-VL-30B-A3B-Instruct/generate/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer USER_API_KEY" \
  -d '{"model": "Qwen3-VL-30B-A3B-Instruct", "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say this is a test"}
  ], "temperature": 0, "max_tokens": 150}'

# PowerShell
$response = Invoke-WebRequest https://chat.immers.cloud/v1/endpoints/Qwen3-VL-30B-A3B-Instruct/generate/chat/completions `
  -Method POST `
  -Headers @{
    "Authorization" = "Bearer USER_API_KEY"
    "Content-Type"  = "application/json"
  } `
  -Body (@{
    model = "Qwen3-VL-30B-A3B-Instruct"
    messages = @(
      @{ role = "system"; content = "You are a helpful assistant." },
      @{ role = "user"; content = "Say this is a test" }
    )
  } | ConvertTo-Json)

# Print only the assistant's reply text.
($response.Content | ConvertFrom-Json).choices[0].message.content

# Python: pip install --upgrade openai

from openai import OpenAI

# The endpoint is OpenAI-compatible, so the standard client works with a custom base_url.
client = OpenAI(
    api_key="USER_API_KEY",
    base_url="https://chat.immers.cloud/v1/endpoints/Qwen3-VL-30B-A3B-Instruct/generate/",
)

chat_response = client.chat.completions.create(
    model="Qwen3-VL-30B-A3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say this is a test"},
    ],
)

# Print only the assistant's reply text.
print(chat_response.choices[0].message.content)
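Since the model is multimodal, a request will often carry an image as well as text. The sketch below builds such a message in the OpenAI chat format, with the image embedded as a base64 data URL. It assumes the endpoint accepts OpenAI-style `image_url` content parts, which this page does not confirm; `build_image_messages` is a hypothetical helper name, and the placeholder bytes stand in for a real image file.

```python
import base64


def build_image_messages(image_bytes, question, mime_type="image/png"):
    """Build an OpenAI-style chat payload with one image part and one text part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                # Assumption: the endpoint understands "image_url" content parts.
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": question},
            ],
        }
    ]


# Stand-in bytes so the sketch runs without a file; use open(path, "rb").read() for a real image.
messages = build_image_messages(b"\x89PNG placeholder", "What is shown in this image?")
print(messages[0]["content"][1]["text"])
```

If the assumption holds, the resulting `messages` list can be passed to `client.chat.completions.create(model="Qwen3-VL-30B-A3B-Instruct", messages=messages)` with the same client as in the Python example above.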

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

maximize endpoint performance,
enable the full context for long sequences,
ensure top-tier security by processing data in an isolated, dedicated environment,
use custom weights, such as fine-tuned models or LoRA adapters.

Recommended configurations for hosting Qwen3-VL-30B-A3B-Instruct

Prices:

Name	Context	vCPU	RAM, MB	Disk, GB	GPU	Price, hour	Price, month
teslaa10-1.16.32.160	8,192	16	32768	160	1	$0.53	$378.38
teslat4-2.16.32.160	65,563	16	32768	160	2	$0.54	$388.05
teslaa2-2.16.32.160	65,563	16	32768	160	2	$0.57	$413.85
rtx2080ti-3.12.24.120	8,192	12	24576	120	3	$0.84	$603.28
rtx3090-1.16.24.160	8,192	16	24576	160	1	$0.88	$633.02
teslaa10-2.16.64.160	204,800	16	65536	160	2	$0.93	$672.04
rtx2080ti-3.16.64.160	49,152	16	65536	160	3	$0.95	$680.90
teslat4-4.16.64.160	262,144	16	65536	160	4	$0.96	$691.38
rtx2080ti-4.16.32.160	65,563	16	32768	160	4	$1.12	$803.99
rtx4090-1.16.32.160	8,192	16	32768	160	1	$1.15	$830.60
teslav100-1.12.64.160	65,563	12	65536	160	1	$1.20	$867.11
rtxa5000-2.16.64.160.nvlink	204,800	16	65536	160	2	$1.23	$884.85
teslaa2-4.32.128.160	262,144	32	131072	160	4	$1.26	$904.76
teslaa10-3.16.96.160	262,144	16	98304	160	3	$1.34	$965.78
rtx3080-3.16.64.160	8,192	16	65536	160	3	$1.43	$1,026.72
rtx5090-1.16.64.160	65,563	16	65536	160	1	$1.59	$1,142.79
rtx3090-2.16.64.160	204,800	16	65536	160	2	$1.67	$1,204.06
rtx3080-4.16.64.160	65,563	16	65536	160	4	$1.82	$1,310.46
rtx4090-2.16.64.160	204,800	16	65536	160	2	$2.19	$1,576.47
teslav100-2.16.64.240	262,144	16	65535	240	2	$2.22	$1,600.41
rtxa5000-4.16.128.160.nvlink	262,144	16	131072	160	4	$2.34	$1,685.05
teslaa100-1.16.64.160	262,144	16	65536	160	1	$2.37	$1,707.06
rtx3090-3.16.96.160	262,144	16	98304	160	3	$2.45	$1,763.81
rtx5090-2.16.64.160	262,144	16	65536	160	2	$2.93	$2,110.10
rtx4090-3.16.96.160	262,144	16	98304	160	3	$3.23	$2,322.43
teslah100-1.16.64.160	262,144	16	65536	160	1	$3.83	$2,754.98
h200-1.16.128.160	262,144	16	131072	160	1	$4.74	$3,410.09
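The configuration names in the table appear to encode the hardware: GPU model, then GPU count, vCPU count, RAM in GB, and disk in GB, with an optional `nvlink` suffix. The sketch below parses a name under that assumption. The field order is inferred from the rows above, not from official documentation, and `ServerConfig` and `parse_config_name` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    gpu_model: str
    gpu_count: int
    vcpu: int
    ram_gb: int
    disk_gb: int
    nvlink: bool


def parse_config_name(name):
    """Split a name such as 'rtxa5000-2.16.64.160.nvlink' into its fields.

    Assumed layout: <gpu_model>-<gpus>.<vcpu>.<ram_gb>.<disk_gb>[.nvlink]
    """
    gpu_model, _, rest = name.partition("-")
    parts = rest.split(".")
    gpu_count, vcpu, ram_gb, disk_gb = (int(p) for p in parts[:4])
    return ServerConfig(gpu_model, gpu_count, vcpu, ram_gb, disk_gb, "nvlink" in parts[4:])


cfg = parse_config_name("rtxa5000-2.16.64.160.nvlink")
print(cfg)
```

For example, `teslaa10-1.16.32.160` parses to 1 GPU, 16 vCPU, 32 GB of RAM, and a 160 GB disk, which matches the 32768 MB shown in its row.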

Prices:

Name	Context	vCPU	RAM, MB	Disk, GB	GPU	Price, hour	Price, month
teslat4-3.32.64.160	8,192	32	65536	160	3	$0.88	$633.35
teslaa10-2.16.64.160	65,563	16	65536	160	2	$0.93	$672.04
teslat4-4.16.64.160	65,563	16	65536	160	4	$0.96	$691.38
teslaa2-3.32.128.160	8,192	32	131072	160	3	$1.06	$762.88
rtxa5000-2.16.64.160.nvlink	65,563	16	65536	160	2	$1.23	$884.85
teslaa2-4.32.128.160	65,563	32	131072	160	4	$1.26	$904.76
teslaa10-3.16.96.160	262,144	16	98304	160	3	$1.34	$965.78
teslaa2-6.32.128.160	262,144	32	131072	160	6	$1.65	$1,188.50
rtx3090-2.16.64.160	65,563	16	65536	160	2	$1.67	$1,204.06
rtx4090-2.16.64.160	65,563	16	65536	160	2	$2.19	$1,576.47
teslav100-2.16.64.240	204,800	16	65535	240	2	$2.22	$1,600.41
rtxa5000-4.16.128.160.nvlink	262,144	16	131072	160	4	$2.34	$1,685.05
teslaa100-1.16.64.160	262,144	16	65536	160	1	$2.37	$1,707.06
rtx3090-3.16.96.160	262,144	16	98304	160	3	$2.45	$1,763.81
rtx5090-2.16.64.160	204,800	16	65536	160	2	$2.93	$2,110.10
rtx4090-3.16.96.160	262,144	16	98304	160	3	$3.23	$2,322.43
teslah100-1.16.64.160	262,144	16	65536	160	1	$3.83	$2,754.98
teslav100-3.64.256.320	262,144	64	262144	320	3	$3.89	$2,801.33
rtx5090-3.16.96.160	262,144	16	98304	160	3	$4.34	$3,122.88
h200-1.16.128.160	262,144	16	131072	160	1	$4.74	$3,410.09

Prices:

Name	Context	vCPU	RAM, MB	Disk, GB	GPU	Price, hour	Price, month
teslaa2-6.32.128.160	65,563	32	131072	160	6	$1.65	$1,188.50
teslaa10-4.16.128.160	65,563	16	131072	160	4	$1.75	$1,259.44
rtxa5000-4.16.128.160.nvlink	65,563	16	131072	160	4	$2.34	$1,685.05
teslaa100-1.16.128.160	65,563	16	131072	160	1	$2.50	$1,797.90
rtx3090-4.16.96.320	65,563	16	98304	320	4	$3.18	$2,290.59
rtxa5000-6.24.192.160.nvlink	262,144	24	196608	160	6	$3.50	$2,520.64
teslav100-3.64.256.320	65,563	64	262144	320	3	$3.89	$2,801.33
teslah100-1.16.128.160	65,563	16	131072	160	1	$3.95	$2,845.82
rtx4090-4.16.96.320	65,563	16	98304	320	4	$4.22	$3,035.41
rtx5090-3.16.96.160	65,563	16	98304	160	3	$4.34	$3,122.88
teslav100-4.32.96.160	262,144	32	98304	160	4	$4.35	$3,129.32
teslaa100-2.24.96.160.nvlink	262,144	24	98304	160	2	$4.61	$3,319.56
h200-1.16.128.160	262,144	16	131072	160	1	$4.74	$3,410.09
rtx5090-4.16.128.160	262,144	16	131072	160	4	$5.74	$4,135.57
rtx4090-6.44.256.160	262,144	44	262144	160	6	$6.63	$4,775.04
teslah100-2.24.256.160	262,144	24	262144	160	2	$7.84	$5,642.39
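To compare hourly and monthly billing, it can help to express the monthly price as the number of hours of hourly billing it corresponds to. The sketch below does this for three rows copied from the tables above. Listed hourly prices are rounded to cents, so the results are approximate; `break_even_hours` is a hypothetical helper name.

```python
def break_even_hours(price_hour, price_month):
    """Hours of hourly billing that cost as much as one monthly payment."""
    return price_month / price_hour


# (hourly, monthly) prices in USD copied from the tables above.
examples = {
    "teslaa10-1.16.32.160": (0.53, 378.38),
    "rtx4090-3.16.96.160": (3.23, 2322.43),
    "h200-1.16.128.160": (4.74, 3410.09),
}

for name, (hourly, monthly) in examples.items():
    hours = break_even_hours(hourly, monthly)
    print(f"{name}: monthly price equals about {hours:.0f} h (~{hours / 24:.1f} days) of hourly billing")
```

For these rows the result lands close to a full 30-day month, so the two billing modes appear to cost roughly the same for a server that runs continuously, while hourly billing is cheaper for shorter runs.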

Related models

Qwen3-30B-A3B

Qwen3-235B-A22B

Qwen3-0.6B

Qwen3-1.7B

Qwen3-4B

Qwen3-8B

Qwen3-14B

Qwen3-32B

DeepSeek-R1-0528-Qwen3-8B

T-pro-2.0

Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Thinking-2507

Qwen3-Coder-480B-A35B-Instruct

Qwen3-Coder-30B-A3B-Instruct

Qwen3-30B-A3B-Instruct-2507

Qwen3-30B-A3B-Thinking-2507

Qwen3-4B-Instruct-2507

Qwen3-4B-Thinking-2507

Qwen3-VL-235B-A22B-Instruct

Qwen3-VL-235B-A22B-Thinking

Qwen3-VL-30B-A3B-Thinking

Qwen3-VL-8B-Instruct

Qwen3-VL-8B-Thinking

Qwen3-VL-4B-Instruct

Qwen3-VL-4B-Thinking

Qwen3-VL-2B-Instruct

Qwen3-VL-2B-Thinking

Qwen3-VL-32B-Instruct

Qwen3-VL-32B-Thinking

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.