Phi-4-multimodal-instruct

multimodal

Phi-4-multimodal-instruct is an open-source multimodal model from Microsoft that processes text, images, and audio in a unified architectural solution. It builds upon the Phi-3.5 and Phi-4.0 technologies, featuring an extended context window of 128K tokens and support for 23 languages in text (including Russian), 8 languages in audio, and English for visual tasks. The model is optimized for environments with limited computational resources and low-latency scenarios, demonstrating strong performance in mathematics, logic, speech recognition, translation, and image analysis.  

A single neural network handles text, images (OCR, tables, diagrams), and audio (recognition, translation, summarization). For example, in DocVQA benchmarks, the model achieves 93.2% accuracy, outperforming Gemini-2.0-Flash (92.1%).  

The model is ideal for multisensory applications—joint processing of audio and images (e.g., video analysis with subtitles). At the same time, thanks to optimization via Microsoft Olive and ONNX GenAI Runtime, it can be deployed on edge devices, including smartphones and IoT systems, even with limited computational resources.


Announce Date: 27.02.2025
Parameters: 6B
Context: 132K
Layers: 32
Attention Type: Full Attention
Developer: Microsoft
Transformers Version: 4.46.1
License: MIT

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Phi-4-multimodal-instruct capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Phi-4-multimodal-instruct

Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-2.16.32.160
131,072.0
tensor
2 $0.54 1.100 Launch
teslaa2-2.16.32.160
131,072.0
tensor
2 $0.57 1.104 Launch
rtx2080ti-3.12.24.120
131,072.0
pipeline
3 $0.84 1.036 Launch
teslaa10-2.16.64.160
131,072.0
tensor
2 $0.93 2.013 Launch
rtx2080ti-4.16.32.160
131,072.0
tensor
4 $1.12 1.442 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 2.013 Launch
rtx3090-2.16.64.160
131,072.0
tensor
2 $1.56 2.135 Launch
rtx5090-1.16.64.160
131,072.0
1 $1.59 1.434 Launch
rtx3080-4.16.64.160
131,072.0
tensor
4 $1.82 1.223 Launch
rtx4090-2.16.64.160
131,072.0
tensor
2 $1.92 2.130 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 4.189 Launch
h100-1.16.64.160
131,072.0
1 $3.83 4.184 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 4.984 Launch
teslaa100-2.24.96.160.nvlink
131,072.0
tensor
2 $4.61 8.539 Launch
h200-1.16.128.160
131,072.0
1 $4.74 7.670 Launch
h200-2.24.256.160.nvlink
131,072.0
tensor
2 $9.40 15.502 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-3.32.64.160
131,072.0
pipeline
3 $0.88 1.559 Launch
teslaa10-2.16.64.160
131,072.0
tensor
2 $0.93 1.851 Launch
teslat4-4.16.64.160
131,072.0
tensor
4 $0.96 2.200 Launch
teslaa2-3.32.128.160
131,072.0
pipeline
3 $1.06 1.565 Launch
rtx2080ti-4.16.32.160
131,072.0
tensor
4 $1.12 1.280 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 1.851 Launch
teslaa2-4.32.128.160
131,072.0
tensor
4 $1.26 2.209 Launch
rtx3090-2.16.64.160
131,072.0
tensor
2 $1.56 1.973 Launch
rtx5090-1.16.64.160
131,072.0
1 $1.59 1.272 Launch
rtx3080-4.16.64.160
131,072.0
tensor
4 $1.82 1.061 Launch
rtx4090-2.16.64.160
131,072.0
tensor
2 $1.92 1.968 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 4.026 Launch
h100-1.16.64.160
131,072.0
1 $3.83 4.022 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 4.822 Launch
teslaa100-2.24.96.160.nvlink
131,072.0
tensor
2 $4.61 8.377 Launch
h200-1.16.128.160
131,072.0
1 $4.74 7.508 Launch
h200-2.24.256.160.nvlink
131,072.0
tensor
2 $9.40 15.340 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-3.32.64.160
131,072.0
pipeline
3 $0.88 1.124 Launch
teslaa10-2.16.64.160
131,072.0
tensor
2 $0.93 1.429 Launch
teslat4-4.16.64.160
131,072.0
tensor
4 $0.96 1.778 Launch
teslaa2-3.32.128.160
131,072.0
pipeline
3 $1.06 1.131 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 1.429 Launch
teslaa2-4.32.128.160
131,072.0
tensor
4 $1.26 1.787 Launch
rtx3090-2.16.64.160
131,072.0
tensor
2 $1.56 1.551 Launch
rtx4090-2.16.64.160
131,072.0
tensor
2 $1.92 1.547 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 3.605 Launch
rtx5090-2.16.64.160
131,072.0
tensor
2 $2.93 2.447 Launch
h100-1.16.64.160
131,072.0
1 $3.83 3.601 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 4.401 Launch
teslaa100-2.24.96.160.nvlink
131,072.0
tensor
2 $4.61 7.956 Launch
h200-1.16.128.160
131,072.0
1 $4.74 7.087 Launch
h200-2.24.256.160.nvlink
131,072.0
tensor
2 $9.40 14.919 Launch

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.