Ministral-3-3B-Instruct-2512

multimodal

Ministral-3-3B-Instruct-2512 is the lightest multimodal model in the Ministral 3 lineup, designed for devices with minimal computational resources. Its architecture pairs a text LLM with 3.4 billion parameters and a visual encoder with 0.4 billion parameters. Despite its compact size, the model supports a 256K (262,144-token) context window and more than 10 languages.

The model's efficiency stems from the Cascade Distillation method: knowledge from the parent model, Mistral Small 3.1 (24B), is transferred through iterative pruning and distillation. Even with a roughly sevenfold reduction in parameters, the model retains a significant portion of its teacher's capabilities. The ViT visual encoder (410M) is frozen during training, and multimodal understanding is achieved via a trainable adapter, which minimizes computational cost while preserving image-recognition quality.

In benchmarks, the model demonstrates competitive results for its class: 0.305 on Arena Hard (instruction following), 56.8 on WildBench (dialogue skills), and 0.830 on MATH Maj@1, performance comparable to larger models.
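To illustrate the frozen-encoder-plus-trainable-adapter pattern described above, here is a minimal PyTorch sketch. The module sizes, the stand-in encoder, and the two-layer MLP adapter are assumptions for illustration, not Mistral's actual implementation; the point is that gradients flow only through the adapter.

```python
import torch
import torch.nn as nn

# Stand-in for the frozen ViT encoder (a tiny transformer here,
# purely for illustration -- dimensions are hypothetical).
vision_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True),
    num_layers=2,
)
for p in vision_encoder.parameters():
    p.requires_grad = False  # the encoder stays frozen during training

class VisionAdapter(nn.Module):
    """Trainable projection from ViT feature space into the LLM's
    embedding space; the MLP shape is an assumption for this sketch."""
    def __init__(self, vit_dim: int = 1024, llm_dim: int = 3072):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)

adapter = VisionAdapter()
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

patches = torch.randn(2, 196, 1024)   # fake batch: 2 images, 196 patches each
with torch.no_grad():
    feats = vision_encoder(patches)   # frozen forward pass, no gradients kept
image_tokens = adapter(feats)         # only the adapter is trained
loss = image_tokens.pow(2).mean()     # placeholder loss for the sketch
loss.backward()
optimizer.step()
```

Because the encoder receives no gradients, its forward pass can run under torch.no_grad(), which saves both memory and compute during training.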

Developers recommend a temperature of 0.1 for most scenarios that do not require creativity. The system prompt should clearly describe the environment and the task, and the toolset should be kept to the bare minimum. For images, an aspect ratio of approximately 1:1 is advised. Potential use cases include lightweight real-time applications, image captioning, text classification, rapid translation, data extraction, simple content generation that follows precise instructions, and fine-tuning for domain-specific tasks.
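A minimal request that follows these recommendations might look as follows. This assumes an OpenAI-compatible chat endpoint; the base URL, token placeholder, and image URL are illustrative values, not actual immers.cloud ones.

```python
from openai import OpenAI

# Hypothetical endpoint and credentials -- replace with your own.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_TOKEN")

response = client.chat.completions.create(
    model="Ministral-3-3B-Instruct-2512",
    temperature=0.1,  # low temperature for non-creative tasks, as recommended
    messages=[
        {
            "role": "system",
            # Keep the system prompt specific about environment and task.
            "content": "You extract invoice fields and reply with JSON only.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the total amount and the date."},
                # A roughly square (~1:1) image, per the aspect-ratio advice.
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/invoice-1024x1024.png"},
                },
            ],
        },
    ],
)
print(response.choices[0].message.content)
```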


Announce Date: 31.10.2025
Parameters: 5B
Context: 256K (262,144 tokens)
Layers: 26
Attention Type: Full Attention
Developer: Mistral AI
Transformers Version: 5.0.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore the capabilities of Ministral-3-3B-Instruct-2512. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters (see the sketch below).
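One common way to serve custom weights is to attach a LoRA adapter to the base checkpoint. The sketch below uses Hugging Face transformers and peft; the repository id and adapter path are assumptions, and the text-only AutoModelForCausalLM loading path is shown for brevity (the multimodal variant may require a different model class).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Hypothetical identifiers -- substitute your actual base checkpoint
# and fine-tuned adapter directory.
BASE_MODEL = "mistralai/Ministral-3-3B-Instruct-2512"
ADAPTER_PATH = "./my-domain-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto")

# Attach the LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, ADAPTER_PATH)

# Optionally fold the adapter into the base weights for deployment.
model = model.merge_and_unload()
```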

Recommended server configurations for hosting Ministral-3-3B-Instruct-2512

Prices:
Name                          Context   Parallelism  GPUs  Price, hour  TPS
teslat4-3.32.64.160           262,144   pipeline     3     $0.88        1.258
teslaa10-2.16.64.160          262,144   tensor       2     $0.93        1.354
teslat4-4.16.64.160           262,144   tensor       4     $0.96        1.716
teslaa2-3.32.128.160          262,144   pipeline     3     $1.06        1.258
rtx2080ti-4.16.32.160         262,144   tensor       4     $1.12        1.024
rtxa5000-2.16.64.160.nvlink   262,144   tensor       2     $1.23        1.354
teslaa2-4.32.128.160          262,144   tensor       4     $1.26        1.716
rtx3090-2.16.64.160           262,144   tensor       2     $1.56        1.354
rtx4090-2.16.64.160           262,144   tensor       2     $1.92        1.354
teslaa100-1.16.64.160         262,144   -            1     $2.37        2.558
rtx5090-2.16.64.160           262,144   tensor       2     $2.93        1.908
h100-1.16.64.160              262,144   -            1     $3.83        2.558
h100nvl-1.16.96.160           262,144   -            1     $4.11        3.043
h200-1.16.128.160             262,144   -            1     $4.74        4.670
Prices:
Name                          Context   Parallelism  GPUs  Price, hour  TPS
teslat4-3.32.64.160           262,144   pipeline     3     $0.88        1.038
teslaa10-2.16.64.160          262,144   tensor       2     $0.93        1.135
teslat4-4.16.64.160           262,144   tensor       4     $0.96        1.496
teslaa2-3.32.128.160          262,144   pipeline     3     $1.06        1.038
rtxa5000-2.16.64.160.nvlink   262,144   tensor       2     $1.23        1.135
teslaa2-4.32.128.160          262,144   tensor       4     $1.26        1.496
rtx3090-2.16.64.160           262,144   tensor       2     $1.56        1.135
rtx4090-2.16.64.160           262,144   tensor       2     $1.92        1.135
teslaa100-1.16.64.160         262,144   -            1     $2.37        2.338
rtx5090-2.16.64.160           262,144   tensor       2     $2.93        1.688
h100-1.16.64.160              262,144   -            1     $3.83        2.338
h100nvl-1.16.96.160           262,144   -            1     $4.11        2.823
h200-1.16.128.160             262,144   -            1     $4.74        4.450
Prices:
Name                          Context   Parallelism  GPUs  Price, hour  TPS
teslat4-4.16.64.160           262,144   tensor       4     $0.96        1.279
teslaa2-4.32.128.160          262,144   tensor       4     $1.26        1.279
teslaa10-3.16.96.160          262,144   pipeline     3     $1.34        1.652
teslaa10-4.12.48.160          262,144   tensor       4     $1.57        2.387
rtx3090-3.16.96.160           262,144   pipeline     3     $2.29        1.652
rtxa5000-4.16.128.160.nvlink  262,144   tensor       4     $2.34        2.387
teslaa100-1.16.64.160         262,144   -            1     $2.37        2.122
rtx4090-3.16.96.160           262,144   pipeline     3     $2.83        1.652
rtx3090-4.16.64.160           262,144   tensor       4     $2.89        2.387
rtx5090-2.16.64.160           262,144   tensor       2     $2.93        1.472
rtx4090-4.16.64.160           262,144   tensor       4     $3.60        2.387
h100-1.16.64.160              262,144   -            1     $3.83        2.122
h100nvl-1.16.96.160           262,144   -            1     $4.11        2.606
h200-1.16.128.160             262,144   -            1     $4.74        4.233


Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.