Phi-4-multimodal

multimodal

Phi-4-multimodal-instruct is an open-source multimodal model from Microsoft that processes text, images, and audio in a unified architectural solution. It builds upon the Phi-3.5 and Phi-4.0 technologies, featuring an extended context window of 128K tokens and support for 23 languages in text (including Russian), 8 languages in audio, and English for visual tasks. The model is optimized for environments with limited computational resources and low-latency scenarios, demonstrating strong performance in mathematics, logic, speech recognition, translation, and image analysis.  

A single neural network handles text, images (OCR, tables, diagrams), and audio (recognition, translation, summarization). For example, in DocVQA benchmarks, the model achieves 93.2% accuracy, outperforming Gemini-2.0-Flash (92.1%).  

The model is ideal for multisensory applications—joint processing of audio and images (e.g., video analysis with subtitles). At the same time, thanks to optimization via Microsoft Olive and ONNX GenAI Runtime, it can be deployed on edge devices, including smartphones and IoT systems, even with limited computational resources.


Announce Date: 27.02.2025
Parameters: 6B
Context: 132K
Layers: 32
Attention Type: Full Attention
Developer: Microsoft
Transformers Version: 4.46.1
License: MIT

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Phi-4-multimodal capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Phi-4-multimodal

Prices:
Name GPU Price, hour TPS Max Concurrency
teslaa10-1.16.32.160
131,072.0
1 $0.53 1.032 Launch
teslat4-2.16.32.160
131,072.0
tensor
2 $0.54 1.325 Launch
teslaa2-2.16.32.160
131,072.0
tensor
2 $0.57 1.325 Launch
rtx3090-1.16.24.160
131,072.0
1 $0.83 1.032 Launch
rtx2080ti-3.12.24.120
131,072.0
tensor
3 $0.84 1.225 Launch
rtx4090-1.16.32.160
131,072.0
1 $1.02 1.032 Launch
teslav100-1.12.64.160
131,072.0
1 $1.20 1.482 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 2.225 Launch
rtx3080-3.16.64.160
131,072.0
tensor
3 $1.43 1.057 Launch
rtx5090-1.16.64.160
131,072.0
1 $1.59 1.482 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 4.182 Launch
h100-1.16.64.160
131,072.0
1 $3.83 4.182 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 4.969 Launch
h200-1.16.128.160
131,072.0
1 $4.74 7.613 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-2.16.32.160
131,072.0
tensor
2 $0.54 1.163 Launch
teslaa2-2.16.32.160
131,072.0
tensor
2 $0.57 1.163 Launch
rtx2080ti-3.12.24.120
131,072.0
tensor
3 $0.84 1.063 Launch
teslaa10-2.16.64.160
131,072.0
tensor
2 $0.93 2.063 Launch
teslav100-1.12.64.160
131,072.0
1 $1.20 1.320 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 2.063 Launch
rtx3090-2.16.64.160
131,072.0
tensor
2 $1.56 2.063 Launch
rtx5090-1.16.64.160
131,072.0
1 $1.59 1.320 Launch
rtx3080-4.16.64.160
131,072.0
tensor
4 $1.82 1.301 Launch
rtx4090-2.16.64.160
131,072.0
tensor
2 $1.92 2.063 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 4.020 Launch
h100-1.16.64.160
131,072.0
1 $3.83 4.020 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 4.807 Launch
h200-1.16.128.160
131,072.0
1 $4.74 7.451 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-3.32.64.160
131,072.0
tensor
3 $0.88 1.534 Launch
teslaa10-2.16.64.160
131,072.0
tensor
2 $0.93 1.691 Launch
teslaa2-3.32.128.160
131,072.0
tensor
3 $1.06 1.534 Launch
rtx2080ti-4.16.32.160
131,072.0
tensor
4 $1.12 1.153 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 1.691 Launch
rtx3090-2.16.64.160
131,072.0
tensor
2 $1.56 1.691 Launch
rtx4090-2.16.64.160
131,072.0
tensor
2 $1.92 1.691 Launch
teslav100-2.16.64.240
131,072.0
tensor
2 $2.22 2.591 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 3.647 Launch
rtx5090-2.16.64.160
131,072.0
tensor
2 $2.93 2.591 Launch
h100-1.16.64.160
131,072.0
1 $3.83 3.647 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 4.434 Launch
h200-1.16.128.160
131,072.0
1 $4.74 7.078 Launch

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.