Meta Llama
Meta's family of open-weight large language models, downloadable for self-hosting and served across many cloud platforms.

Meta Llama is Meta’s family of open-weight large language models . The weights are published for download, so you can run the model on your own hardware, fine-tune it, and serve it through the platform of your choice. This solves a problem that closed model APIs cannot: full control over where the model runs, what data it sees, and how it is customised, without sending every request to a third-party endpoint. Llama became one of the most widely deployed open-weight model ecosystems since its first release on 24 February 2023.
The current generation, Llama 4, is Meta’s first family built on a mixture-of-experts architecture and its first that is natively multimodal, meaning a single model handles text and images together.
Where Llama sits
Llama is a set of downloadable foundation models . It occupies the model layer of a stack: you supply the serving infrastructure and application code around it.
The Llama 4 family
Meta released two open-weight Llama 4 models and announced a third, larger model that was still in training at the last public update.
- Llama 4 Scout: a 17 billion active parameter model with 16 experts, 109 billion total parameters, and a stated context window of 10 million tokens. Natively multimodal.
- Llama 4 Maverick: a 17 billion active parameter model with 128 experts and 400 billion total parameters. Natively multimodal.
- Llama 4 Behemoth: a 288 billion active parameter model with 16 experts and nearly two trillion total parameters. Announced as still in training and not released as of Meta’s April 2026 update.
Scout and Maverick use a mixture-of-experts design. Only a fraction of the total parameters activate for any given token, which lowers the compute cost of running a large model.
How to access it
There are two main paths, and you can mix them.
Self-hosting. Download Scout or Maverick from llama.com or Hugging Face, then serve the weights on your own GPU servers or rented cloud GPUs. This gives you data residency, offline operation, and the ability to fine-tune freely.
Hosted APIs. Many providers serve Llama behind an API so you skip infrastructure work. That includes cloud model catalogues and dedicated inference vendors. The trade-off is that you no longer control where the model runs.
Both paths are governed by the Llama 4 Community License Agreement and the Llama 4 Acceptable Use Policy. This license permits commercial use but is not an OSI-approved open-source license. It carries conditions, including a threshold clause that has historically required a separate license for the largest deployers, and use restrictions defined by the acceptable use policy. Read the license before shipping to production.
How it compares
Llama competes with other open-weight families and with closed model APIs. The main axis is control versus convenience.
| Meta Llama | Alibaba Qwen | Mistral | Closed API (Claude, Gemini) | |
|---|---|---|---|---|
| Weights | Downloadable | Downloadable | Downloadable | Not released |
| Self-host | Yes | Yes | Yes | No |
| License type | Community license, use limits | Apache 2.0 on many models | Apache 2.0 on open models | Proprietary API only |
| Multimodal | Yes (Llama 4) | Yes (several models) | Yes (several models) | Yes |
| Best for | Control and fine-tuning | Multilingual, permissive terms | European stack, efficiency | No infra, fastest to start |
See Alibaba Qwen , Mistral AI , and DeepSeek for the other major open-weight options, and the LLM landscape 2026 for the full picture including closed providers.
When not to use it
- You want zero infrastructure. If you have no wish to manage GPUs or a serving stack, a closed API removes that burden. Hosted Llama providers narrow the gap but you still pick and manage a vendor.
- The license clashes with your case. The Llama Community License is not a permissive open-source license. If you need Apache 2.0 style freedom, a Qwen or Mistral open model may fit better.
- You need the single strongest general model regardless of openness. Frontier closed models may lead on specific tasks. Benchmark against your own workload before committing.
- Your deployment crosses the license thresholds. Very large-scale deployers face extra conditions. Confirm your obligations with legal counsel first.
Further reading
- What are foundation models? : the model category Llama belongs to.
- What is a large language model? : the core concept behind Llama.
- What is mixture of experts? : the architecture Llama 4 uses.
- Alibaba Qwen : a permissively licensed open-weight alternative.
- Mistral AI : open models from a European provider.
- LLM landscape 2026 : how open and closed models compare.
- Llama official site : downloads and documentation from Meta.
Sources
- Llama official site : Meta’s Llama home, download and license entry point.
- The Llama 4 herd, Meta AI blog : model specifications for Scout, Maverick, and Behemoth, and the mixture-of-experts and multimodal claims.
- Llama (language model), Wikipedia : release history, license name, and documented use restrictions.