Ministral-3-3B-Instruct-2512

multimodal

Ministral-3-3B-Instruct-2512 is the lightest multimodal model in the Ministral 3 lineup, designed specifically for operation on devices with minimal computational resources. Its architecture consists of a text LLM with 3.4 billion parameters and a visual encoder with 0.4 billion parameters. Despite its compact size, the model supports a context window of 256,000 tokens and more than 10 languages.

The model's efficiency stems from the Cascade Distillation method: knowledge from the parent model Mistral Small 3.1 (24B) is transferred through iterative pruning and distillation. Even with a sevenfold reduction in parameters, the model retains a significant portion of its teacher's capabilities. The visual encoder ViT (410M) is frozen during training, while multimodal understanding is achieved via a trainable adapter—minimizing computational costs while preserving image recognition quality. In benchmarks, the model demonstrates competitive results for its class. On Arena Hard (instruction following), it achieves a score of 0.305, and on WildBench (dialog skills), it reaches 56.8. The MATH Maj@1 benchmark yields 0.830, performance comparable to larger models.

Developers recommend using a temperature of 0.1 for most scenarios that do not require creativity. The system prompt should clearly describe the environment and task, and the toolset should ideally be limited to the bare minimum. For images, an aspect ratio of approximately 1:1 is advised. Potential use cases include lightweight real-time applications, image captioning, text classification, rapid translation, data extraction, simple content generation following precise instructions, and fine-tuning for domain-specific tasks.


Announce Date: 31.10.2025
Parameters: 5B
Context: 263K
Layers: 26
Attention Type: Full Attention
Developer: Mistral AI
Transformers Version: 5.0.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Ministral-3-3B-Instruct-2512 capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Ministral-3-3B-Instruct-2512

Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-3.32.64.160
262,144.0
pipeline
3 $0.88 1.046 Launch
teslaa10-2.16.64.160
262,144.0
tensor
2 $0.93 1.224 Launch
teslat4-4.16.64.160
262,144.0
tensor
4 $0.96 1.438 Launch
teslaa2-3.32.128.160
262,144.0
pipeline
3 $1.06 1.050 Launch
rtxa5000-2.16.64.160.nvlink
262,144.0
tensor
2 $1.23 1.224 Launch
teslaa2-4.32.128.160
262,144.0
tensor
4 $1.26 1.444 Launch
rtx3090-2.16.64.160
262,144.0
tensor
2 $1.56 1.299 Launch
rtx4090-2.16.64.160
262,144.0
tensor
2 $1.92 1.296 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 2.562 Launch
rtx5090-2.16.64.160
262,144.0
tensor
2 $2.93 1.850 Launch
h100-1.16.64.160
262,144.0
1 $3.83 2.560 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 3.052 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 5.240 Launch
h200-1.16.128.160
262,144.0
1 $4.74 4.705 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.525 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslaa10-2.16.64.160
262,144.0
tensor
2 $0.93 1.004 Launch
teslat4-4.16.64.160
262,144.0
tensor
4 $0.96 1.218 Launch
rtxa5000-2.16.64.160.nvlink
262,144.0
tensor
2 $1.23 1.004 Launch
teslaa2-4.32.128.160
262,144.0
tensor
4 $1.26 1.224 Launch
rtx3090-2.16.64.160
262,144.0
tensor
2 $1.56 1.079 Launch
rtx4090-2.16.64.160
262,144.0
tensor
2 $1.92 1.076 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 2.343 Launch
rtx5090-2.16.64.160
262,144.0
tensor
2 $2.93 1.630 Launch
h100-1.16.64.160
262,144.0
1 $3.83 299.560 2.340 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 2.832 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 5.020 Launch
h200-1.16.128.160
262,144.0
1 $4.74 4.485 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.305 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-4.16.64.160
262,144.0
tensor
4 $0.96 1.002 Launch
teslaa2-4.32.128.160
262,144.0
tensor
4 $1.26 1.007 Launch
teslaa10-3.16.96.160
262,144.0
pipeline
3 $1.34 1.435 Launch
teslaa10-4.12.48.160
262,144.0
tensor
4 $1.57 2.125 Launch
rtx3090-3.16.96.160
262,144.0
pipeline
3 $2.29 1.548 Launch
rtxa5000-4.16.128.160.nvlink
262,144.0
tensor
4 $2.34 2.125 Launch
teslaa100-1.16.64.160
262,144.0
1 $2.37 2.126 Launch
rtx4090-3.16.96.160
262,144.0
pipeline
3 $2.83 1.543 Launch
rtx3090-4.16.64.160
262,144.0
tensor
4 $2.89 2.275 Launch
rtx5090-2.16.64.160
262,144.0
tensor
2 $2.93 1.413 Launch
rtx4090-4.16.64.160
262,144.0
tensor
4 $3.60 2.270 Launch
h100-1.16.64.160
262,144.0
1 $3.83 2.123 Launch
h100nvl-1.16.96.160
262,144.0
1 $4.11 2.616 Launch
teslaa100-2.24.96.160.nvlink
262,144.0
tensor
2 $4.61 4.803 Launch
h200-1.16.128.160
262,144.0
1 $4.74 4.268 Launch
h200-2.24.256.160.nvlink
262,144.0
tensor
2 $9.40 9.088 Launch

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.