Quick Answer
A large language model (LLM) is an AI system trained on enormous amounts of text that can read, write, summarise, and reason about language. “Large” refers to billions of internal parameters learned during training. LLMs are the technology behind ChatGPT, Claude, Gemini, Copilot, and most modern AI assistants. They work by predicting the most likely next word given everything before it, repeated until a full response is produced.
Hundreds of books arranged radially around a glowing red central sphere in a dark space: a large language model compresses vast text data into a single model.
An LLM is trained on billions of words from books, websites, and code, then distils all of that into a single model that can engage with any text you give it.

Why “large”?

“Large” refers to the number of parameters in the model: billions of numerical values that encode everything the model has learned. GPT-3 (2020) had 175 billion parameters. Modern frontier models are estimated at hundreds of billions to over a trillion parameters.

Parameters are not directly interpretable (you cannot look at parameter number 4.2 billion and see “this is the concept of democracy”). They are numerical weights that collectively produce language behaviour when combined. The “large” matters because more parameters generally means the model can represent more complex patterns, though this relationship is not linear.

How an LLM generates text

An LLM generates text one token at a time. A token is roughly a word or word fragment:

  • “unbelievable” might become un, believ, able
  • Most common words are single tokens
  • Rare words and most non-English words use multiple tokens

When you send a message, the model processes all the text in the conversation and predicts the most likely next token. Then it appends that token and predicts the next one. This continues until the model generates a stop signal or reaches a length limit.

Prompt: "The capital of Austria is"
Token 1: "Vienna"         ← predicted with ~95% probability
Token 2: "."              ← predicted with high probability
Token 3: [STOP]

The key insight: an LLM does not retrieve facts from a database. It generates text from learned statistical patterns. This is why it can write fluently about any topic, and also why it can be wrong.

The components of a modern LLM system

User interface
Chat interface (ChatGPT, Claude.ai) API (developer access) SDK (Python, Node.js)
System prompt
Persona and role definition Behaviour constraints Context injection Set by the application builder, invisible to the end user in most products
The LLM
Transformer architecture Billions of parameters Context window Pre-trained on large text corpus, then fine-tuned on instructions and human preferences
Optional: tools
Web search Code execution Database retrieval (RAG) Modern LLMs can call external tools to augment their knowledge and capabilities

What LLMs are good at

  • Writing: Drafts, emails, reports, summaries, translations
  • Code: Writing, explaining, debugging, reviewing, and refactoring code
  • Analysis: Extracting structure from unstructured text, classifying content, identifying patterns
  • Question answering: Answering questions about documents you provide (the model reads what you paste in)
  • Reasoning: Multi-step logical problems, comparing options, planning
  • Conversation: Maintaining context across a long dialogue

What LLMs are not good at

  • Real-time information: Training data has a cut-off date. The model does not know what happened yesterday unless you tell it or connect it to a search tool.
  • Precise arithmetic: LLMs are not calculators. They can do basic maths but make errors on multi-step calculations. Use code execution for precise numbers.
  • Remembering between sessions: Each conversation is independent. The model does not remember your name from last week’s chat unless explicitly told.
  • Guaranteed accuracy: LLMs hallucinate. They generate plausible-sounding text that may be factually wrong. Never rely on LLM output for medical, legal, or financial decisions without expert verification.

Major LLMs compared

GPT-4o (OpenAI)Claude Sonnet 4.6 (Anthropic)Gemini 2.0 Flash (Google)Mistral Large
Context window128K tokens200K tokens1M tokens128K tokens
Multimodal (images)YesYesYesNo
Best atGeneral tasks, GPT ecosystemLong documents, codingSpeed, large contextEU residency, cost
API pricing (input/1M)~€4.50~€3.00~€0.10€2.00
Data residencyUSUSUS or EUEU (Paris)
Pre-training Learn language Trained on hundreds of billions of words to predict the next token. Learns grammar, facts, reasoning, and style from the data distribution.
Fine-tuning Learn to follow instructions Further trained on human-written instruction-response pairs. Teaches the model to be helpful rather than just predicting text.
RLHF Learn human preferences Human raters compare responses. A reward model learns preferences. The LLM is tuned to produce responses that score higher with human raters.
Deployment Serve via API The model is quantised and served at scale. Developers access it via API with per-token pricing. Consumers access via chat interfaces.

What’s next

Further reading