Three small glowing spheres converging, representing a family of small, efficient language models.
Phi is a family of small models tuned so that quality does not have to scale with size.

Microsoft Phi is a family of small language models (SLMs) released as open weights under the MIT license. The models solve a specific problem: most capable large language models are big, slow, and expensive to run, which puts them out of reach for phones, laptops, and cost-sensitive workloads. Phi trades raw scale for carefully curated training data, aiming to keep quality high while the parameter count stays small enough to run on modest hardware.

A small language model is a foundation model with far fewer parameters than a frontier system. Parameters are the learned weights a model uses to generate output. Fewer parameters mean smaller memory footprint, faster inference , and lower cost per request. Microsoft’s bet with Phi is that data quality, not sheer size, drives much of a model’s usefulness. Phi models are trained on heavily filtered and synthetic “textbook-quality” data rather than the whole web.

Where Phi sits

Phi occupies the small end of the model-size spectrum. You reach for it when a frontier model is more than the task needs, or when the deployment target cannot host one.

Frontier models
GPT class Claude Highest capability, hosted, higher cost and latency
Mid-size open models
Llama Mistral Strong general models, still need server-class GPUs
Small language models
Phi-4 Phi-4-mini Phi-4-multimodal Runs on-device or on cheap GPUs, low latency, open weights
Deployment target
Laptop Phone Edge device Small cloud instance

The Phi family

Microsoft has shipped several generations. The current Phi-4 line covers a text model, a compact model, a multimodal model, and reasoning-tuned variants.

  • Phi-4 is a 14 billion parameter text model, first presented in December 2024. It is built on a decoder-only Transformer, was pretrained on roughly 10 trillion tokens of curated and synthetic data, and supports a 16k-token context length. Microsoft targeted mathematics and multi-step reasoning with this release.
  • Phi-4-mini is a 3.8 billion parameter model aimed at even lighter deployment.
  • Phi-4-multimodal is a 5.6 billion parameter model that handles speech, vision, and text in one model using a mixture-of-LoRAs design, with a 128k-token context length. Microsoft reports it ranked first on the Hugging Face OpenASR leaderboard with a 6.14% word error rate at the time of release.
  • Phi-4-reasoning (14B) and Phi-4-reasoning-plus (14B) are reasoning-tuned variants. Phi-4-reasoning-plus is further trained with reinforcement learning to spend more inference-time compute. Phi-4-reasoning-plus supports a 32k-token context by default.
  • Phi-4-mini-reasoning (3.8B) targets multi-step mathematical problem solving at small size.

Earlier generations remain available too. The Phi-3.5 line, released in August 2024, includes Phi-3.5-mini (3.82B), Phi-3.5-vision (4.15B), and Phi-3.5-MoE, a mixture-of-experts model with 41.9 billion total parameters that activates about 6.6 billion per token. All three support a 128k-token context.

How to access it

Phi models are open weights. You do not need a Microsoft account to download and run them.

Step 1 Pick a variant Match model size to hardware and task. Use mini for edge, Phi-4 for general text, multimodal for speech and vision.
Step 2 Get the weights Download from Hugging Face under the MIT license, or select the model in Azure AI Foundry.
Step 3 Run or host Run locally with common inference runtimes, or serve it as a managed endpoint through Azure.

The MIT license allows free use, modification, and distribution, including for commercial products. Phi-4 and the reasoning variants are published on Hugging Face and in the Azure AI Foundry catalog. If you already run other Microsoft-hosted models through Azure OpenAI Service , Foundry gives you Phi alongside them without changing clouds.

How it compares

Phi competes with other small and open model families. The comparison below is about positioning, not a benchmark ranking.

Phi-4Mistral small modelsDeepSeek distills
MakerMicrosoftMistral AIDeepSeek
Size focus3.8B to 14BSmall to midDistilled small variants
LicenseMIT (open weights)Open weights on many modelsOpen weights on many models
StrengthReasoning at small sizeGeneral European multilingualDistilled reasoning
Best forOn-device, cost-sensitive appsBroad general useReasoning on a budget

For the mid-size and multilingual end, see Mistral AI . For distilled reasoning models released as open weights, see DeepSeek .

When not to use it

Small models trade capability for size. Phi is the wrong choice when:

  • You need frontier-level breadth. For the hardest open-ended reasoning, broad world knowledge, or long complex documents, a large model still leads. Phi-4’s base text context is 16k tokens, smaller than many hosted frontier models.
  • You need the widest tool and ecosystem support. Frontier hosted APIs ship mature tool-calling, function-calling, and safety tooling. Verify Phi’s support for your exact features before committing.
  • Accuracy on rare edge cases is safety-critical. A smaller parameter count means less capacity to memorise long-tail facts. Add retrieval or human review for high-stakes output.
  • You have no capacity to self-host and want a fully managed frontier experience. In that case a hosted API may be less operational work, even at higher cost per call.

Match the model to the job. Phi shines when latency, cost, or on-device privacy matter more than absolute peak capability.

Further reading

Sources