Nemotron-3 Nano-30B is a new-generation LLM from NVIDIA. Its key feature is a hybrid architecture that combines Mamba2 layers, Transformer layers, and Mixture-of-Experts (MoE) routing in a single network. This structure lets the model process large volumes of data efficiently while maintaining logical coherence and high throughput. The model has 32 billion parameters in total, but thanks to MoE routing only about 3.5 billion of them are active when generating each token. The result is a model with the knowledge and capacity of a 30B-scale network and a compute cost comparable to compact models optimized for fast inference. It was trained on about 25 trillion tokens covering 43 programming languages and more than 19 natural languages.
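To make the "active subset" idea concrete, here is a minimal sketch of top-k expert routing, together with the active-parameter ratio implied by the figures above. The router scores, the number of experts, and the value of `k` are hypothetical and chosen for illustration only; they are not the model's real configuration.

```python
import math

TOTAL_PARAMS_B = 32.0   # total parameters in billions, as stated in the text
ACTIVE_PARAMS_B = 3.5   # parameters active per token in billions, as stated in the text


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the parameters that take part in generating one token."""
    return active_b / total_b


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating for illustration; not NVIDIA's actual router.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)          # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.1%}")
    # Hypothetical router scores for four experts; only two are used per token.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Only the selected experts run for a given token, which is why the per-token compute tracks the active parameter count rather than the total.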
Compared to Nemotron v2, the new version uses an MoE architecture instead of a dense one, delivering 4 times greater throughput. Another key capability of Nemotron-3 Nano is support for a context window of up to 1 million tokens. This plays to the strengths of Mamba2 layers, which process long sequences with minimal memory overhead. A crucial stage in the model's training was multi-environment reinforcement learning using the NeMo Gym library. The model was trained not just to answer questions but to carry out sequences of actions: calling tools, writing working code, and building multi-step plans. This makes its behavior more predictable and reliable in complex scenarios that require step-by-step verification of results.
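The memory argument for long contexts can be illustrated with a rough back-of-the-envelope sketch: an attention layer's key/value cache grows linearly with sequence length, while a state-space layer keeps a fixed-size recurrent state. All layer counts and dimensions below are hypothetical placeholders, not the model's published configuration, and the small convolution state of a Mamba2 layer is ignored.

```python
def kv_cache_bytes(seq_len: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Keys plus values cached for every token in every attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def ssm_state_bytes(n_ssm_layers: int, n_heads: int, head_dim: int,
                    state_dim: int, bytes_per_value: int = 2) -> int:
    """Recurrent state per sequence; note that seq_len does not appear."""
    return n_ssm_layers * n_heads * head_dim * state_dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only.
    for tokens in (1_000, 262_144, 1_000_000):
        kv = kv_cache_bytes(tokens, n_attn_layers=4, n_kv_heads=2, head_dim=128)
        ssm = ssm_state_bytes(n_ssm_layers=24, n_heads=64, head_dim=64, state_dim=128)
        print(f"{tokens:>9} tokens: KV cache {kv / 2**20:8.1f} MiB, "
              f"SSM state {ssm / 2**20:6.1f} MiB")
```

In a hybrid stack, only the attention layers pay the per-token cache cost, so replacing most of them with state-space layers keeps memory growth modest as the context lengthens.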
On the AIME25 benchmark (American Invitational Mathematics Examination), which tests mathematical and quantitative reasoning, Nemotron-3 Nano achieves 99.2% accuracy with tool use, surpassing GPT-OSS-20B at 98.7%. On LiveCodeBench (v6, 2024-08 to 2025-05), the model scores 68.2%, outperforming Qwen3-30B (66.0%) and GPT-OSS-20B (61.0%). On other benchmarks, the model either leads or is on par with its counterparts.
Given its architectural advantages and NVIDIA's recommendations, the model is well suited to the following tasks: agentic systems and orchestration, long-context RAG, local/on-prem and edge computing, code generation, and data structuring.
| Model Name | Context | Type | GPU | TPS | Tooling | Status | Link |
|---|---|---|---|---|---|---|---|
| stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | 262,144 | Public | RTX4090 | 463 | yes | AVAILABLE | chat |
Example request using curl:

curl https://chat.immers.cloud/v1/endpoints/nemotron3-nano-30b-a3b/generate/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer USER_API_KEY" \
--data-binary @- <<"EOF"
{"model": "NVIDIA-Nemotron-3-Nano-30B-A3B", "messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Say this is a test"}
], "temperature": 0, "max_tokens": 150
}
EOF

The same request from PowerShell:

$response = Invoke-WebRequest https://chat.immers.cloud/v1/endpoints/nemotron3-nano-30b-a3b/generate/chat/completions `
-Method POST `
-Headers @{
"Authorization" = "Bearer USER_API_KEY"
"Content-Type" = "application/json; charset=utf-8"
} `
-Body ([System.Text.Encoding]::UTF8.GetBytes((@{
model = "NVIDIA-Nemotron-3-Nano-30B-A3B"
messages = @(
@{ role = "system"; content = "You are a helpful assistant." },
@{ role = "user"; content = "Say this is a test" })
} | ConvertTo-Json -Depth 10)))
([System.Text.Encoding]::UTF8.GetString($response.RawContentStream.ToArray()) | ConvertFrom-Json).choices[0].message.content

The same request in Python using the OpenAI client:

# Install the client first: pip install --upgrade openai
from openai import OpenAI
client = OpenAI(
api_key="USER_API_KEY",
base_url="https://chat.immers.cloud/v1/endpoints/nemotron3-nano-30b-a3b/generate/",
)
chat_response = client.chat.completions.create(
model="NVIDIA-Nemotron-3-Nano-30B-A3B",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Say this is a test"},
]
)
print(chat_response.choices[0].message.content)
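
Since the endpoint is listed with tooling support, a tool-calling round trip might look like the sketch below. It assumes the endpoint accepts the standard OpenAI `tools` parameter and returns `tool_calls` in the usual format; that has not been verified here. The `get_weather` tool and its canned response are hypothetical stand-ins for a real function.

```python
import json

BASE_URL = "https://chat.immers.cloud/v1/endpoints/nemotron3-nano-30b-a3b/generate/"
MODEL = "NVIDIA-Nemotron-3-Nano-30B-A3B"


def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns canned data instead of a real lookup."""
    return {"city": city, "temperature_c": 21}


# Tool description in the OpenAI function-calling schema.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one requested tool locally and return its result as JSON text."""
    result = LOCAL_TOOLS[name](**json.loads(arguments_json))
    return json.dumps(result)


def ask_with_tools(api_key: str, question: str) -> str:
    """Send a question, run any requested tools, then ask for the final answer."""
    from openai import OpenAI  # imported here so the helpers above work without it

    client = OpenAI(api_key=api_key, base_url=BASE_URL)
    messages = [{"role": "user", "content": question}]
    first = client.chat.completions.create(model=MODEL, messages=messages, tools=TOOLS)
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content
    messages.append(reply)  # keep the assistant turn that requested the tools
    for call in reply.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_tool_call(call.function.name, call.function.arguments),
        })
    final = client.chat.completions.create(model=MODEL, messages=messages, tools=TOOLS)
    return final.choices[0].message.content
```

A call such as `ask_with_tools("USER_API_KEY", "What is the weather in Paris?")` would then return the model's final text after any tool results have been fed back to it.
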
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Context | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | tensor | 2 | $0.54 | - | 2.312 |
| 262,144 | tensor | 2 | $0.57 | - | 2.360 |
| 262,144 | - | 1 | $0.83 | - | 1.138 |
| 262,144 | tensor | 2 | $0.93 | - | 11.764 |
| 262,144 | - | 1 | $1.02 | - | 1.114 |
| 262,144 | tensor | 4 | $1.12 | - | 2.926 |
| 262,144 | tensor | 2 | $1.23 | - | 11.764 |
| 262,144 | - | 1 | $1.59 | - | 5.775 |
| 262,144 | tensor | 4 | $1.82 | - | 1.795 |
| 262,144 | - | 1 | $2.37 | 181.920 | 34.286 |
| 262,144 | - | 1 | $3.83 | - | 34.244 |
| 262,144 | - | 1 | $4.11 | - | 42.524 |
| 262,144 | tensor | 2 | $4.61 | - | 79.323 |
| 262,144 | - | 1 | $4.74 | - | 70.327 |
| 262,144 | tensor | 2 | $9.40 | - | 151.405 |

| Context | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | tensor | 2 | $0.93 | - | 2.822 |
| 262,144 | tensor | 4 | $0.96 | - | 3.217 |
| 262,144 | tensor | 2 | $1.23 | - | 2.822 |
| 262,144 | tensor | 4 | $1.26 | - | 3.264 |
| 262,144 | tensor | 2 | $1.56 | - | 4.084 |
| 262,144 | tensor | 2 | $1.92 | - | 4.037 |
| 262,144 | - | 1 | $2.37 | - | 25.344 |
| 262,144 | tensor | 2 | $2.93 | - | 13.358 |
| 262,144 | - | 1 | $3.83 | 134.650 | 25.302 |
| 262,144 | - | 1 | $4.11 | - | 33.582 |
| 262,144 | tensor | 2 | $4.61 | - | 70.381 |
| 262,144 | - | 1 | $4.74 | - | 61.385 |
| 262,144 | tensor | 2 | $9.40 | - | 142.463 |

| Context | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 262,144 | tensor | 4 | $1.62 | - | 3.488 |
| 262,144 | tensor | 4 | $2.34 | - | 3.488 |
| 262,144 | - | 1 | $2.37 | - | 6.981 |
| 262,144 | tensor | 4 | $2.89 | - | 4.749 |
| 262,144 | tensor | 4 | $3.60 | - | 4.702 |
| 262,144 | - | 1 | $3.83 | - | 6.940 |
| 262,144 | - | 1 | $4.11 | - | 15.219 |
| 262,144 | pipeline | 3 | $4.34 | - | 6.705 |
| 262,144 | tensor | 2 | $4.61 | - | 52.018 |
| 262,144 | - | 1 | $4.74 | - | 43.023 |
| 262,144 | tensor | 4 | $5.74 | - | 14.023 |
| 262,144 | tensor | 2 | $9.40 | - | 124.101 |

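
When comparing configurations, one simple approach is to filter by the concurrency you need and then sort by price. The sketch below uses a handful of rows copied from the last table above; the `Config` class and both helper functions are hypothetical names introduced for illustration, and the billing period behind the listed prices is not stated in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    gpus: int
    price_usd: float        # listed price; billing period not stated in the table
    max_concurrency: float  # value from the "Max Concurrency" column


# A few rows copied from the last table above.
CONFIGS = [
    Config(gpus=4, price_usd=1.62, max_concurrency=3.488),
    Config(gpus=1, price_usd=2.37, max_concurrency=6.981),
    Config(gpus=1, price_usd=4.11, max_concurrency=15.219),
    Config(gpus=2, price_usd=4.61, max_concurrency=52.018),
    Config(gpus=1, price_usd=4.74, max_concurrency=43.023),
    Config(gpus=2, price_usd=9.40, max_concurrency=124.101),
]


def cheapest_for(configs: list[Config], required_concurrency: float) -> Optional[Config]:
    """Lowest-priced configuration that meets the concurrency target, if any."""
    eligible = [c for c in configs if c.max_concurrency >= required_concurrency]
    return min(eligible, key=lambda c: c.price_usd) if eligible else None


def best_value(configs: list[Config]) -> Config:
    """Configuration with the lowest price per unit of concurrency."""
    return min(configs, key=lambda c: c.price_usd / c.max_concurrency)


if __name__ == "__main__":
    print(cheapest_for(CONFIGS, required_concurrency=10))
    print(best_value(CONFIGS))
```

The two helpers answer different questions: the first minimizes spend for a fixed load, while the second favors larger configurations that spread their price over more concurrent requests.
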
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.