Ministral-3-3B-Instruct-2512 is the lightest multimodal model in the Ministral 3 lineup, designed specifically for operation on devices with minimal computational resources. Its architecture consists of a text LLM with 3.4 billion parameters and a visual encoder with 0.4 billion parameters. Despite its compact size, the model supports a context window of 256,000 tokens and more than 10 languages.
The model's efficiency stems from the Cascade Distillation method: knowledge from the parent model Mistral Small 3.1 (24B) is transferred through iterative pruning and distillation. Even with a sevenfold reduction in parameters, the model retains a significant portion of its teacher's capabilities. The visual encoder ViT (410M) is frozen during training, while multimodal understanding is achieved via a trainable adapter—minimizing computational costs while preserving image recognition quality. In benchmarks, the model demonstrates competitive results for its class. On Arena Hard (instruction following), it achieves a score of 0.305, and on WildBench (dialog skills), it reaches 56.8. The MATH Maj@1 benchmark yields 0.830, performance comparable to larger models.
Developers recommend using a temperature of 0.1 for most scenarios that do not require creativity. The system prompt should clearly describe the environment and task, and the toolset should ideally be limited to the bare minimum. For images, an aspect ratio of approximately 1:1 is advised. Potential use cases include lightweight real-time applications, image captioning, text classification, rapid translation, data extraction, simple content generation following precise instructions, and fine-tuning for domain-specific tasks.
| Model Name | Context | Type | GPU | Status | Link |
|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Name | GPU | TPS | Max Concurrency | |||
|---|---|---|---|---|---|---|
262,144.0 pipeline |
3 | $0.88 | 1.258 | Launch | ||
262,144.0 tensor |
2 | $0.93 | 1.354 | Launch | ||
262,144.0 tensor |
4 | $0.96 | 1.716 | Launch | ||
262,144.0 pipeline |
3 | $1.06 | 1.258 | Launch | ||
262,144.0 tensor |
4 | $1.12 | 1.024 | Launch | ||
262,144.0 tensor |
2 | $1.23 | 1.354 | Launch | ||
262,144.0 tensor |
4 | $1.26 | 1.716 | Launch | ||
262,144.0 tensor |
2 | $1.56 | 1.354 | Launch | ||
262,144.0 tensor |
2 | $1.92 | 1.354 | Launch | ||
262,144.0 |
1 | $2.37 | 2.558 | Launch | ||
262,144.0 tensor |
2 | $2.93 | 1.908 | Launch | ||
262,144.0 |
1 | $3.83 | 2.558 | Launch | ||
262,144.0 |
1 | $4.11 | 3.043 | Launch | ||
262,144.0 |
1 | $4.74 | 4.670 | Launch | ||
| Name | GPU | TPS | Max Concurrency | |||
|---|---|---|---|---|---|---|
262,144.0 pipeline |
3 | $0.88 | 1.038 | Launch | ||
262,144.0 tensor |
2 | $0.93 | 1.135 | Launch | ||
262,144.0 tensor |
4 | $0.96 | 1.496 | Launch | ||
262,144.0 pipeline |
3 | $1.06 | 1.038 | Launch | ||
262,144.0 tensor |
2 | $1.23 | 1.135 | Launch | ||
262,144.0 tensor |
4 | $1.26 | 1.496 | Launch | ||
262,144.0 tensor |
2 | $1.56 | 1.135 | Launch | ||
262,144.0 tensor |
2 | $1.92 | 1.135 | Launch | ||
262,144.0 |
1 | $2.37 | 2.338 | Launch | ||
262,144.0 tensor |
2 | $2.93 | 1.688 | Launch | ||
262,144.0 |
1 | $3.83 | 2.338 | Launch | ||
262,144.0 |
1 | $4.11 | 2.823 | Launch | ||
262,144.0 |
1 | $4.74 | 4.450 | Launch | ||
| Name | GPU | TPS | Max Concurrency | |||
|---|---|---|---|---|---|---|
262,144.0 tensor |
4 | $0.96 | 1.279 | Launch | ||
262,144.0 tensor |
4 | $1.26 | 1.279 | Launch | ||
262,144.0 pipeline |
3 | $1.34 | 1.652 | Launch | ||
262,144.0 tensor |
4 | $1.57 | 2.387 | Launch | ||
262,144.0 pipeline |
3 | $2.29 | 1.652 | Launch | ||
262,144.0 tensor |
4 | $2.34 | 2.387 | Launch | ||
262,144.0 |
1 | $2.37 | 2.122 | Launch | ||
262,144.0 pipeline |
3 | $2.83 | 1.652 | Launch | ||
262,144.0 tensor |
4 | $2.89 | 2.387 | Launch | ||
262,144.0 tensor |
2 | $2.93 | 1.472 | Launch | ||
262,144.0 tensor |
4 | $3.60 | 2.387 | Launch | ||
262,144.0 |
1 | $3.83 | 2.122 | Launch | ||
262,144.0 |
1 | $4.11 | 2.606 | Launch | ||
262,144.0 |
1 | $4.74 | 4.233 | Launch | ||
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.