Microsoft Phi
Microsoft Phi is a family of small, open-weight language models built to stay capable at sizes that run on-device and cut inference cost.

Microsoft Phi is a family of small language models (SLMs) released as open weights under the MIT license. The models solve a specific problem: most capable large language models are big, slow, and expensive to run, which puts them out of reach for phones, laptops, and cost-sensitive workloads. Phi trades raw scale for carefully curated training data, aiming to keep quality high while the parameter count stays small enough to run on modest hardware.
A small language model is a foundation model with far fewer parameters than a frontier system. Parameters are the learned weights a model uses to generate output. Fewer parameters mean smaller memory footprint, faster inference , and lower cost per request. Microsoft’s bet with Phi is that data quality, not sheer size, drives much of a model’s usefulness. Phi models are trained on heavily filtered and synthetic “textbook-quality” data rather than the whole web.
Where Phi sits
Phi occupies the small end of the model-size spectrum. You reach for it when a frontier model is more than the task needs, or when the deployment target cannot host one.
The Phi family
Microsoft has shipped several generations. The current Phi-4 line covers a text model, a compact model, a multimodal model, and reasoning-tuned variants.
- Phi-4 is a 14 billion parameter text model, first presented in December 2024. It is built on a decoder-only Transformer, was pretrained on roughly 10 trillion tokens of curated and synthetic data, and supports a 16k-token context length. Microsoft targeted mathematics and multi-step reasoning with this release.
- Phi-4-mini is a 3.8 billion parameter model aimed at even lighter deployment.
- Phi-4-multimodal is a 5.6 billion parameter model that handles speech, vision, and text in one model using a mixture-of-LoRAs design, with a 128k-token context length. Microsoft reports it ranked first on the Hugging Face OpenASR leaderboard with a 6.14% word error rate at the time of release.
- Phi-4-reasoning (14B) and Phi-4-reasoning-plus (14B) are reasoning-tuned variants. Phi-4-reasoning-plus is further trained with reinforcement learning to spend more inference-time compute. Phi-4-reasoning-plus supports a 32k-token context by default.
- Phi-4-mini-reasoning (3.8B) targets multi-step mathematical problem solving at small size.
Earlier generations remain available too. The Phi-3.5 line, released in August 2024, includes Phi-3.5-mini (3.82B), Phi-3.5-vision (4.15B), and Phi-3.5-MoE, a mixture-of-experts model with 41.9 billion total parameters that activates about 6.6 billion per token. All three support a 128k-token context.
How to access it
Phi models are open weights. You do not need a Microsoft account to download and run them.
The MIT license allows free use, modification, and distribution, including for commercial products. Phi-4 and the reasoning variants are published on Hugging Face and in the Azure AI Foundry catalog. If you already run other Microsoft-hosted models through Azure OpenAI Service , Foundry gives you Phi alongside them without changing clouds.
How it compares
Phi competes with other small and open model families. The comparison below is about positioning, not a benchmark ranking.
| Phi-4 | Mistral small models | DeepSeek distills | |
|---|---|---|---|
| Maker | Microsoft | Mistral AI | DeepSeek |
| Size focus | 3.8B to 14B | Small to mid | Distilled small variants |
| License | MIT (open weights) | Open weights on many models | Open weights on many models |
| Strength | Reasoning at small size | General European multilingual | Distilled reasoning |
| Best for | On-device, cost-sensitive apps | Broad general use | Reasoning on a budget |
For the mid-size and multilingual end, see Mistral AI . For distilled reasoning models released as open weights, see DeepSeek .
When not to use it
Small models trade capability for size. Phi is the wrong choice when:
- You need frontier-level breadth. For the hardest open-ended reasoning, broad world knowledge, or long complex documents, a large model still leads. Phi-4’s base text context is 16k tokens, smaller than many hosted frontier models.
- You need the widest tool and ecosystem support. Frontier hosted APIs ship mature tool-calling, function-calling, and safety tooling. Verify Phi’s support for your exact features before committing.
- Accuracy on rare edge cases is safety-critical. A smaller parameter count means less capacity to memorise long-tail facts. Add retrieval or human review for high-stakes output.
- You have no capacity to self-host and want a fully managed frontier experience. In that case a hosted API may be less operational work, even at higher cost per call.
Match the model to the job. Phi shines when latency, cost, or on-device privacy matter more than absolute peak capability.
Further reading
- What is a large language model? : how model size and parameters shape capability and cost.
- Foundation models : the broad category Phi belongs to.
- Inference : why running a model is where the cost and latency of small models pays off.
- Mixture of experts : the architecture behind Phi-3.5-MoE.
- Azure OpenAI Service : Microsoft’s hosted model platform, where Phi is also available.
- Phi open models on Azure : Microsoft’s official product page for the family.
Sources
- Phi Open Models, Microsoft Azure : official product page for the Phi family.
- Empowering innovation: the next generation of the Phi family, Microsoft Azure Blog : Phi-4-mini and Phi-4-multimodal announcement.
- Microsoft launches Phi-4-reasoning-plus, VentureBeat : reasoning variant sizes and context length.
- Microsoft AI released Phi-4 under the MIT license, MarkTechPost : open weights and MIT licensing.
- Microsoft AI releases Phi-3.5 mini, MoE and Vision, MarkTechPost : Phi-3.5 family sizes and context.
- microsoft/phi-4, Hugging Face : model card for the 14B text model.