Tool

Added 29 Jun 2026 Last updated 29 Jun 2026 Read time 5 min

Meta Llama

Meta's family of open-weight large language models, downloadable for self-hosting and served across many cloud platforms.

open-weightllmfoundation-modelsmetaself-hosting

Connected Foundation Models LLM - Large Language Model Mixture of Experts (MoE)Alibaba Qwen Mistral AI

Learn this your way

Read Guided course

A multi-layer server block with red strips, representing a widely deployed open-weight model family. — Llama weights are downloadable, so the model runs on your own servers rather than only behind a vendor API.

Meta Llama is Meta’s family of open-weight large language models . The weights are published for download, so you can run the model on your own hardware, fine-tune it, and serve it through the platform of your choice. This solves a problem that closed model APIs cannot: full control over where the model runs, what data it sees, and how it is customised, without sending every request to a third-party endpoint. Llama became one of the most widely deployed open-weight model ecosystems since its first release on 24 February 2023.

The current generation, Llama 4, is Meta’s first family built on a mixture-of-experts architecture and its first that is natively multimodal, meaning a single model handles text and images together.

Where Llama sits

Llama is a set of downloadable foundation models . It occupies the model layer of a stack: you supply the serving infrastructure and application code around it.

Application

Chat and agents RAG pipelines Your product logic and prompts

Serving

vLLM Ollama Hosted inference APIs Self-hosted or via a provider

Model weights

Llama 4 Scout Llama 4 Maverick Downloadable, open-weight

Hardware

GPU servers Cloud GPU rental You own or rent the compute

The Llama 4 family

Meta released two open-weight Llama 4 models and announced a third, larger model that was still in training at the last public update.

Llama 4 Scout: a 17 billion active parameter model with 16 experts, 109 billion total parameters, and a stated context window of 10 million tokens. Natively multimodal.
Llama 4 Maverick: a 17 billion active parameter model with 128 experts and 400 billion total parameters. Natively multimodal.
Llama 4 Behemoth: a 288 billion active parameter model with 16 experts and nearly two trillion total parameters. Announced as still in training and not released as of Meta’s April 2026 update.

Scout and Maverick use a mixture-of-experts design. Only a fraction of the total parameters activate for any given token, which lowers the compute cost of running a large model.

How to access it

There are two main paths, and you can mix them.

Step 1 Get the weights Download from llama.com or Hugging Face after accepting the license.

→

Step 2 Choose serving Self-host with vLLM or Ollama, or use a hosted inference provider.

→

Step 3 Customise Fine-tune on your data or wire the model into a RAG or agent pipeline.

Self-hosting. Download Scout or Maverick from llama.com or Hugging Face, then serve the weights on your own GPU servers or rented cloud GPUs. This gives you data residency, offline operation, and the ability to fine-tune freely.

Hosted APIs. Many providers serve Llama behind an API so you skip infrastructure work. That includes cloud model catalogues and dedicated inference vendors. The trade-off is that you no longer control where the model runs.

Both paths are governed by the Llama 4 Community License Agreement and the Llama 4 Acceptable Use Policy. This license permits commercial use but is not an OSI-approved open-source license. It carries conditions, including a threshold clause that has historically required a separate license for the largest deployers, and use restrictions defined by the acceptable use policy. Read the license before shipping to production.

How it compares

Llama competes with other open-weight families and with closed model APIs. The main axis is control versus convenience.

	Meta Llama	Alibaba Qwen	Mistral	Closed API (Claude, Gemini)
Weights	Downloadable	Downloadable	Downloadable	Not released
Self-host	Yes	Yes	Yes	No
License type	Community license, use limits	Apache 2.0 on many models	Apache 2.0 on open models	Proprietary API only
Multimodal	Yes (Llama 4)	Yes (several models)	Yes (several models)	Yes
Best for	Control and fine-tuning	Multilingual, permissive terms	European stack, efficiency	No infra, fastest to start

See Alibaba Qwen , Mistral AI , and DeepSeek for the other major open-weight options, and the LLM landscape 2026 for the full picture including closed providers.

When not to use it

You want zero infrastructure. If you have no wish to manage GPUs or a serving stack, a closed API removes that burden. Hosted Llama providers narrow the gap but you still pick and manage a vendor.
The license clashes with your case. The Llama Community License is not a permissive open-source license. If you need Apache 2.0 style freedom, a Qwen or Mistral open model may fit better.
You need the single strongest general model regardless of openness. Frontier closed models may lead on specific tasks. Benchmark against your own workload before committing.
Your deployment crosses the license thresholds. Very large-scale deployers face extra conditions. Confirm your obligations with legal counsel first.

Sources

Llama official site : Meta’s Llama home, download and license entry point.
The Llama 4 herd, Meta AI blog : model specifications for Scout, Maverick, and Behemoth, and the mixture-of-experts and multimodal claims.
Llama (language model), Wikipedia : release history, license name, and documented use restrictions.

Open source projects

Freelancer Templates Contracts, proposals, SOWs

Freelancer Automation Workflow recipes, AI playbooks

Work with Linda

Workshop Series €2,000/mo x 3

1:1 Consulting 60 min session