Nemotron-3 Nano-30B is a new-generation LLM from NVIDIA. Its key feature is a hybrid architecture that combines Mamba2 layers, Transformer layers, and Mixture-of-Experts (MoE) technology in a single network. This design lets the model process massive inputs efficiently while maintaining logical coherence and high throughput. The model has 32 billion parameters in total, but thanks to MoE routing only an active subset of roughly 3.5 billion parameters is engaged to generate each token. The result is a distinctive balance: the model has the knowledge and capacity of a 30B-scale network while consuming compute on par with compact models optimized for fast inference. It was trained on about 25 trillion tokens spanning 43 programming languages and more than 19 natural languages.
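To make the parameter arithmetic concrete, here is a back-of-the-envelope sketch of MoE routing. The shared-parameter size, expert count, and top-k value below are illustrative assumptions, not NVIDIA's published configuration; with these numbers the active set works out to roughly 3.4B per token, in line with the ~3.5B quoted above.

```python
# Back-of-the-envelope MoE sketch: only a top-k subset of experts runs per
# token, so active parameters << total parameters.
# All numbers below are ASSUMPTIONS for illustration, not NVIDIA's spec.

TOTAL_PARAMS = 32e9
SHARED_PARAMS = 1.5e9   # assumed: attention/Mamba2 layers used by every token
N_EXPERTS = 64          # assumed expert count
TOP_K = 4               # assumed experts activated per token

# Parameters held by a single expert (MoE layers split the remainder evenly).
expert_params = (TOTAL_PARAMS - SHARED_PARAMS) / N_EXPERTS

def active_params_per_token() -> float:
    """Parameters actually multiplied for one token under top-k routing."""
    return SHARED_PARAMS + TOP_K * expert_params

print(f"active per token: {active_params_per_token() / 1e9:.1f}B "
      f"of {TOTAL_PARAMS / 1e9:.0f}B total")
```

The same total-vs-active gap is what lets a 30B-scale network run at the cost profile of a much smaller dense model.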
Compared to Nemotron v2, the new version replaces a dense architecture with MoE, delivering roughly 4× the throughput. Another key capability of Nemotron-3 Nano is a context window of up to 1 million tokens. This is where the Mamba2 layers shine: they process long sequences with minimal memory overhead. A crucial stage in the model's creation was multi-environment reinforcement learning with the NeMo Gym library. The model was trained not just to answer questions but to perform action sequences: calling tools, writing functional code, and constructing multi-step plans. This makes its behavior more predictable and reliable in complex scenarios that require step-by-step verification of results.
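Tool calling of this kind is typically exercised through an OpenAI-compatible chat API. Below is a minimal sketch of such a request payload for a self-hosted endpoint; the model id and the `get_weather` tool are hypothetical placeholders, not part of the model's actual tool set.

```python
# Sketch of an OpenAI-compatible chat request with a tool definition, as you
# might POST to /v1/chat/completions on a self-hosted server.
# The model id and the get_weather tool are HYPOTHETICAL examples.
import json

def build_tool_call_request(user_msg: str) -> dict:
    return {
        "model": "nvidia/Nemotron-3-Nano-30B",   # assumed model id
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",            # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = json.dumps(build_tool_call_request("Weather in Oslo?"))
```

With `tool_choice="auto"`, a tool-trained model responds with a structured `tool_calls` entry instead of free text when the tool is relevant, which is what makes multi-step agent loops verifiable.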
On the AIME25 benchmark (American Invitational Mathematics Examination), which tests mathematical and quantitative reasoning, Nemotron-3 Nano achieves 99.2% accuracy with tool use, surpassing GPT-OSS-20B at 98.7%. On LiveCodeBench (v6, 2025-05 to 2025-08), the model scores 68.2%, outperforming Qwen3-30B (66.0%) and GPT-OSS-20B (61.0%). On other benchmarks, the model either leads or is on par with its counterparts.
Given its architectural advantages and NVIDIA's recommendations, the model is ideally suited for the following tasks: Agentic Systems and Orchestration, Long-Context RAG, Local/On-Prem and Edge Computing, Code Generation, and Data Structuring.
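For the long-context RAG case, a ~1M-token window changes the retrieval strategy: instead of aggressively re-ranking down to a handful of passages, many ranked chunks can simply be packed into one prompt. A minimal sketch follows; the 4-characters-per-token heuristic and the reserved-output budget are rough assumptions, not a real tokenizer.

```python
# Long-context RAG packing sketch: fill the prompt with ranked chunks until
# the (assumed) token budget is exhausted.
# The heuristic and budget numbers are ASSUMPTIONS for illustration.

CONTEXT_TOKENS = 1_000_000      # advertised maximum window
RESERVED_FOR_ANSWER = 8_000     # assumed headroom for the generated answer

def approx_tokens(text: str) -> int:
    """Crude ~4-chars-per-token estimate; use a real tokenizer in practice."""
    return max(1, len(text) // 4)

def pack_chunks(chunks: list[str], question: str) -> str:
    """Greedily pack chunks (assumed pre-ranked by retrieval score)."""
    budget = CONTEXT_TOKENS - RESERVED_FOR_ANSWER - approx_tokens(question)
    picked, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break               # stop at the first chunk that overflows
        picked.append(chunk)
        used += cost
    return "\n\n".join(picked) + "\n\nQuestion: " + question
```

The same budget-driven packing applies to agentic orchestration, where tool transcripts rather than retrieved documents fill the window.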
| Model Name | Context | Type | GPU | TPS | Status | Link |
|---|---|---|---|---|---|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing. We recommend the following configurations for deploying private instances:
| vCPU | RAM, MB | Disk, GB | GPUs | Context, tokens | Parallelism | Price/hr |
|---|---|---|---|---|---|---|
| 16 | 32768 | 160 | 1 | 262,144 | — | $0.53 |
| 16 | 32768 | 160 | 2 | 262,144 | tensor | $0.54 |
| 16 | 32768 | 160 | 2 | 262,144 | tensor | $0.57 |
| 12 | 24576 | 120 | 3 | 262,144 | pipeline | $0.84 |
| 16 | 24576 | 160 | 1 | 262,144 | — | $0.88 |
| 16 | 32768 | 160 | 4 | 262,144 | tensor | $1.12 |
| 16 | 32768 | 160 | 1 | 262,144 | — | $1.15 |
| 12 | 65536 | 160 | 1 | 262,144 | — | $1.20 |
| 16 | 65536 | 160 | 2 | 262,144 | tensor | $1.23 |
| 16 | 65536 | 160 | 3 | 262,144 | pipeline | $1.43 |
| 16 | 65536 | 160 | 1 | 262,144 | — | $1.59 |
| 16 | 65536 | 160 | 4 | 262,144 | tensor | $1.82 |
| 16 | 65536 | 160 | 1 | 262,144 | — | $2.37 |
| 16 | 65536 | 160 | 1 | 262,144 | — | $3.83 |
| 16 | 98304 | 160 | 1 | 262,144 | — | $4.11 |
| 16 | 131072 | 160 | 1 | 262,144 | — | $4.74 |
| vCPU | RAM, MB | Disk, GB | GPUs | Context, tokens | Parallelism | Price/hr |
|---|---|---|---|---|---|---|
| 32 | 65536 | 160 | 3 | 262,144 | pipeline | $0.88 |
| 16 | 65536 | 160 | 2 | 262,144 | tensor | $0.93 |
| 16 | 65536 | 160 | 4 | 262,144 | tensor | $0.96 |
| 32 | 131072 | 160 | 3 | 262,144 | pipeline | $1.06 |
| 16 | 65536 | 160 | 2 | 262,144 | tensor | $1.23 |
| 32 | 131072 | 160 | 4 | 262,144 | tensor | $1.26 |
| 16 | 65536 | 160 | 2 | 262,144 | tensor | $1.67 |
| 16 | 65536 | 160 | 2 | 262,144 | tensor | $2.19 |
| 16 | 65535 | 240 | 2 | 262,144 | tensor | $2.22 |
| 16 | 65536 | 160 | 1 | 262,144 | — | $2.37 |
| 16 | 65536 | 160 | 2 | 262,144 | tensor | $2.93 |
| 16 | 65536 | 160 | 1 | 262,144 | — | $3.83 |
| 16 | 98304 | 160 | 1 | 262,144 | — | $4.11 |
| 16 | 131072 | 160 | 1 | 262,144 | — | $4.74 |
| vCPU | RAM, MB | Disk, GB | GPUs | Context, tokens | Parallelism | Price/hr |
|---|---|---|---|---|---|---|
| 32 | 131072 | 160 | 6 | 262,144 | pipeline | $1.65 |
| 16 | 131072 | 160 | 4 | 262,144 | tensor | $1.75 |
| 16 | 131072 | 160 | 4 | 262,144 | tensor | $2.34 |
| 16 | 131072 | 160 | 1 | 262,144 | — | $2.50 |
| 16 | 98304 | 320 | 4 | 262,144 | tensor | $3.18 |
| 64 | 262144 | 320 | 3 | 262,144 | pipeline | $3.89 |
| 16 | 131072 | 160 | 1 | 262,144 | — | $3.95 |
| 16 | 98304 | 160 | 1 | 262,144 | — | $4.11 |
| 16 | 98304 | 320 | 4 | 262,144 | tensor | $4.22 |
| 16 | 98304 | 160 | 3 | 262,144 | pipeline | $4.34 |
| 32 | 98304 | 160 | 4 | 262,144 | tensor | $4.35 |
| 16 | 131072 | 160 | 1 | 262,144 | — | $4.74 |
| 16 | 131072 | 160 | 4 | 262,144 | tensor | $5.74 |
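For self-hosted deployment, the tensor/pipeline parallelism noted in the tables maps directly to serving flags. A minimal config sketch, assuming vLLM as the inference server; the model id and flag values are assumptions to adjust for the instance you rent (use `--pipeline-parallel-size` instead for the "pipeline" configurations).

```shell
# Hypothetical vLLM launch command -- model id and values are assumptions.
# --tensor-parallel-size should match the GPU count of a "tensor" config.
vllm serve nvidia/Nemotron-3-Nano-30B \
  --tensor-parallel-size 2 \
  --max-model-len 262144
```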
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.