Phi-3.5-mini-instruct

Phi-3.5-mini is the latest model in Microsoft’s Phi series of small language models, combining compactness with high performance. With 3.8 billion parameters, it can run locally even on modern smartphones, making it one of the most accessible and efficient language models available. Thanks to carefully curated and synthetic training data, Phi-3.5-mini delivers results comparable to much larger models such as GPT-3.5 and Mixtral 8x7B while requiring significantly fewer computational resources.

The uniqueness of Phi-3.5-mini lies in its training approach: instead of simply increasing the model size, developers focused on the quality and relevance of the data. By using carefully filtered web sources and synthetic examples, the model achieves a “data optimal regime”—maximizing the effectiveness of each parameter. This enables Phi-3.5-mini to deliver outstanding performance in reasoning, mathematics, programming, and dialogue tasks, all while remaining compact and fast.

Phi-3.5-mini is particularly well-suited for edge devices, mobile applications, chatbots, educational platforms, and any scenarios where privacy and offline operation are important. The model is ideal for building multilingual assistants, text generation and analysis, solving mathematical and logical problems, and integration into products with limited computational resources.


Announce Date: 20.08.2024
Parameters: 3.8B
Context: 128K
Layers: 32
Attention Type: Full Attention
Developer: Microsoft
Transformers Version: 4.43.3
License: MIT
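
Given the small footprint listed above, the model can be run locally with the Hugging Face `transformers` library. The sketch below follows the common chat-pipeline pattern for the `microsoft/Phi-3.5-mini-instruct` checkpoint; the prompt and generation settings are illustrative only, and downloading the weights requires network access.

```python
"""Minimal local-inference sketch for Phi-3.5-mini-instruct (illustrative)."""


def build_chat(system: str, user: str) -> list:
    """Assemble messages in the chat format the tokenizer's template expects."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


def main() -> None:
    # Heavy imports are kept inside main() so the helper above stays lightweight.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    model_id = "microsoft/Phi-3.5-mini-instruct"
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

    messages = build_chat("You are a helpful assistant.", "Solve 2x + 3 = 7.")
    out = pipe(messages, max_new_tokens=128, do_sample=False)
    # With chat input, generated_text holds the full conversation; the last
    # entry is the assistant's reply.
    print(out[0]["generated_text"][-1]["content"])


if __name__ == "__main__":
    main()
```

On machines without a GPU, `device_map="auto"` falls back to CPU; generation will be slower but still works for a model of this size.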

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Phi-3.5-mini-instruct capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.
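
Once an endpoint is available, inference can be tested over plain HTTP. The sketch below assumes the endpoint exposes an OpenAI-compatible chat-completions API; the base URL and token are placeholders to be replaced with your endpoint address and the API token from the token management page.

```python
"""Sketch of querying a hosted endpoint, assuming an OpenAI-compatible API.

The base URL and token are hypothetical placeholders, not real values.
"""
import json
import urllib.request


def build_payload(prompt: str, model: str = "Phi-3.5-mini-instruct",
                  max_tokens: int = 256) -> dict:
    """Build a chat-completions request body for a single-turn prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(base_url: str, token: str, prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the payload follows the widely used chat-completions shape, the same request also works through OpenAI-compatible client libraries by pointing them at the endpoint's base URL.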

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.
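
For the custom-weights scenario above, a fine-tuned LoRA adapter can be applied on top of the base checkpoint. The sketch assumes the `peft` library; the adapter path `./my-phi35-adapter` is a hypothetical placeholder for your own fine-tuned weights.

```python
"""Sketch: applying a LoRA adapter to the Phi-3.5-mini-instruct base model.

Assumes `peft` and `transformers` are installed; the adapter directory path
is a placeholder.
"""
import os


def adapter_files_present(adapter_dir: str) -> bool:
    """A PEFT adapter directory normally contains adapter_config.json."""
    return os.path.isfile(os.path.join(adapter_dir, "adapter_config.json"))


def main() -> None:
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3.5-mini-instruct", torch_dtype="auto", device_map="auto"
    )
    # Attach the fine-tuned LoRA weights on top of the frozen base model.
    model = PeftModel.from_pretrained(base, "./my-phi35-adapter")
    # Optionally fold the adapter into the base weights for faster serving.
    model = model.merge_and_unload()


if __name__ == "__main__":
    main()
```

Merging the adapter into the base weights avoids the small per-token overhead of applying LoRA layers at inference time, at the cost of losing the ability to swap adapters on the fly.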

Recommended server configurations for hosting Phi-3.5-mini-instruct

Prices:

| Name | Context (tokens) | Parallelism | GPUs | Price, hour | TPS |
|------|------------------|-------------|------|-------------|-----|
| teslaa10-1.16.32.160 | 44,000 | | 1 | $0.53 | 1.045 |
| teslat4-2.16.32.160 | 44,000 | tensor | 2 | $0.54 | 1.337 |
| teslaa2-2.16.32.160 | 44,000 | tensor | 2 | $0.57 | 1.337 |
| rtx3090-1.16.24.160 | 44,000 | | 1 | $0.83 | 1.045 |
| rtx2080ti-3.12.24.120 | 44,000 | pipeline | 3 | $0.84 | 1.237 |
| rtx4090-1.16.32.160 | 44,000 | | 1 | $1.02 | 1.045 |
| rtx2080ti-4.16.32.160 | 44,000 | tensor | 4 | $1.12 | 1.697 |
| teslav100-1.12.64.160 | 44,000 | | 1 | $1.20 | 1.492 |
| rtxa5000-2.16.64.160.nvlink | 44,000 | tensor | 2 | $1.23 | 2.230 |
| teslaa10-3.16.96.160 | 131,072 | pipeline | 3 | $1.34 | 1.147 |
| rtx3080-3.16.64.160 | 44,000 | pipeline | 3 | $1.43 | 1.070 |
| rtx5090-1.16.64.160 | 44,000 | | 1 | $1.59 | 1.492 |
| teslaa10-4.16.64.160 | 131,072 | tensor | 4 | $1.62 | 1.545 |
| teslaa2-6.32.128.160 | 131,072 | pipeline | 6 | $1.65 | 1.440 |
| rtx3080-4.16.64.160 | 44,000 | tensor | 4 | $1.82 | 1.473 |
| teslav100-2.16.64.240 | 131,072 | tensor | 2 | $2.22 | 1.049 |
| rtx3090-3.16.96.160 | 131,072 | pipeline | 3 | $2.29 | 1.147 |
| rtxa5000-4.16.128.160.nvlink | 131,072 | tensor | 4 | $2.34 | 1.545 |
| teslaa100-1.16.64.160 | 131,072 | | 1 | $2.37 | 1.401 |
| rtx4090-3.16.96.160 | 131,072 | pipeline | 3 | $2.83 | 1.147 |
| rtx3090-4.16.64.160 | 131,072 | tensor | 4 | $2.89 | 1.545 |
| rtx5090-2.16.64.160 | 131,072 | tensor | 2 | $2.93 | 1.049 |
| rtx4090-4.16.64.160 | 131,072 | tensor | 4 | $3.60 | 1.545 |
| h100-1.16.64.160 | 131,072 | | 1 | $3.83 | 1.401 |
| h100nvl-1.16.96.160 | 131,072 | | 1 | $4.11 | 1.663 |
| h200-1.16.128.160 | 131,072 | | 1 | $4.74 | 2.545 |
Prices:

| Name | Context (tokens) | Parallelism | GPUs | Price, hour | TPS |
|------|------------------|-------------|------|-------------|-----|
| teslat4-2.16.32.160 | 44,000 | tensor | 2 | $0.54 | 1.256 |
| teslaa2-2.16.32.160 | 44,000 | tensor | 2 | $0.57 | 1.256 |
| rtx2080ti-3.12.24.120 | 44,000 | pipeline | 3 | $0.84 | 1.157 |
| teslaa10-2.16.64.160 | 44,000 | tensor | 2 | $0.93 | 2.150 |
| rtx2080ti-4.16.32.160 | 44,000 | tensor | 4 | $1.12 | 1.616 |
| teslav100-1.12.64.160 | 44,000 | | 1 | $1.20 | 1.411 |
| rtxa5000-2.16.64.160.nvlink | 44,000 | tensor | 2 | $1.23 | 2.150 |
| teslaa10-3.16.96.160 | 131,072 | pipeline | 3 | $1.34 | 1.120 |
| rtx3080-3.16.64.160 | 44,000 | pipeline | 3 | $1.43 | 0.989 |
| rtx3090-2.16.64.160 | 44,000 | tensor | 2 | $1.56 | 2.150 |
| rtx5090-1.16.64.160 | 44,000 | | 1 | $1.59 | 1.411 |
| teslaa10-4.16.64.160 | 131,072 | tensor | 4 | $1.62 | 1.518 |
| teslaa2-6.32.128.160 | 131,072 | pipeline | 6 | $1.65 | 1.413 |
| rtx3080-4.16.64.160 | 44,000 | tensor | 4 | $1.82 | 1.393 |
| rtx4090-2.16.64.160 | 44,000 | tensor | 2 | $1.92 | 2.150 |
| teslav100-2.16.64.240 | 131,072 | tensor | 2 | $2.22 | 1.022 |
| rtx3090-3.16.96.160 | 131,072 | pipeline | 3 | $2.29 | 1.120 |
| rtxa5000-4.16.128.160.nvlink | 131,072 | tensor | 4 | $2.34 | 1.518 |
| teslaa100-1.16.64.160 | 131,072 | | 1 | $2.37 | 1.374 |
| rtx4090-3.16.96.160 | 131,072 | pipeline | 3 | $2.83 | 1.120 |
| rtx3090-4.16.64.160 | 131,072 | tensor | 4 | $2.89 | 1.518 |
| rtx5090-2.16.64.160 | 131,072 | tensor | 2 | $2.93 | 1.022 |
| rtx4090-4.16.64.160 | 131,072 | tensor | 4 | $3.60 | 1.518 |
| h100-1.16.64.160 | 131,072 | | 1 | $3.83 | 1.374 |
| h100nvl-1.16.96.160 | 131,072 | | 1 | $4.11 | 1.636 |
| h200-1.16.128.160 | 131,072 | | 1 | $4.74 | 2.518 |
Prices:

| Name | Context (tokens) | Parallelism | GPUs | Price, hour | TPS |
|------|------------------|-------------|------|-------------|-----|
| teslat4-2.16.32.160 | 44,000 | tensor | 2 | $0.54 | 1.003 |
| teslaa2-2.16.32.160 | 44,000 | tensor | 2 | $0.57 | 1.003 |
| teslaa10-2.16.64.160 | 44,000 | tensor | 2 | $0.93 | 1.897 |
| rtx2080ti-4.16.32.160 | 44,000 | tensor | 4 | $1.12 | 1.363 |
| teslav100-1.12.64.160 | 44,000 | | 1 | $1.20 | 1.158 |
| rtxa5000-2.16.64.160.nvlink | 44,000 | tensor | 2 | $1.23 | 1.897 |
| teslaa10-3.16.96.160 | 131,072 | pipeline | 3 | $1.34 | 1.035 |
| rtx3090-2.16.64.160 | 44,000 | tensor | 2 | $1.56 | 1.897 |
| rtx5090-1.16.64.160 | 44,000 | | 1 | $1.59 | 1.158 |
| teslaa10-4.16.64.160 | 131,072 | tensor | 4 | $1.62 | 1.433 |
| teslaa2-6.32.128.160 | 131,072 | pipeline | 6 | $1.65 | 1.328 |
| rtx3080-4.16.64.160 | 44,000 | tensor | 4 | $1.82 | 1.139 |
| rtx4090-2.16.64.160 | 44,000 | tensor | 2 | $1.92 | 1.897 |
| rtx3090-3.16.96.160 | 131,072 | pipeline | 3 | $2.29 | 1.035 |
| rtxa5000-4.16.128.160.nvlink | 131,072 | tensor | 4 | $2.34 | 1.433 |
| teslaa100-1.16.64.160 | 131,072 | | 1 | $2.37 | 1.289 |
| rtx4090-3.16.96.160 | 131,072 | pipeline | 3 | $2.83 | 1.035 |
| rtx3090-4.16.64.160 | 131,072 | tensor | 4 | $2.89 | 1.433 |
| rtx4090-4.16.64.160 | 131,072 | tensor | 4 | $3.60 | 1.433 |
| h100-1.16.64.160 | 131,072 | | 1 | $3.83 | 1.289 |
| teslav100-3.64.256.320 | 131,072 | pipeline | 3 | $3.89 | 1.485 |
| h100nvl-1.16.96.160 | 131,072 | | 1 | $4.11 | 1.551 |
| teslav100-4.32.64.160 | 131,072 | tensor | 4 | $4.28 | 2.033 |
| rtx5090-3.16.96.160 | 131,072 | pipeline | 3 | $4.34 | 1.485 |
| h200-1.16.128.160 | 131,072 | | 1 | $4.74 | 2.433 |
| rtx5090-4.16.128.160 | 131,072 | tensor | 4 | $5.74 | 2.033 |

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.