Quick Answer
AI (Artificial Intelligence) is a broad term for software that performs tasks normally requiring human intelligence. Modern AI works by learning statistical patterns from enormous amounts of data, rather than following explicit rules written by a programmer. Large language models like Claude, GPT, and Gemini are the dominant form of AI in 2026.

The old way vs the new way

Traditional software follows explicit rules written by programmers:

python
if age < 18:
    deny_purchase()
if temperature > 100:
    send_alert()

Every situation must be anticipated and coded manually. This works for precise, rule-governed tasks. It breaks down for fuzzy ones such as recognising a cat in a photo, understanding sarcasm, or generating a coherent essay.

Machine learning takes a different approach: instead of writing rules, you give the system thousands or millions of labelled examples, and it figures out the patterns itself. Show it enough photos labelled “cat” and “not cat” and it learns what makes a cat a cat, better than you could describe in rules.

The result is systems that can do things programmers could never explicitly code. More: Machine learning, Google Developers
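To make the contrast concrete, here is a minimal sketch of learning a rule from labelled examples instead of hard-coding it. The temperature readings, the labels, and the `learn_threshold` helper are all invented for illustration; real machine learning systems learn far richer patterns than a single cut-off.

```python
# Toy supervised learning: learn the alert threshold from labelled examples
# instead of hard-coding `temperature > 100`.
# The readings and labels below are made up for illustration.

def learn_threshold(examples):
    """Return the cut-off that misclassifies the fewest training examples."""
    values = sorted(value for value, _ in examples)
    # Candidate thresholds: midpoints between neighbouring readings.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((value > threshold) != label for value, label in examples)

    return min(candidates, key=errors)


# (temperature, should_alert) pairs
training_data = [
    (72, False), (85, False), (96, False), (99, False),
    (103, True), (110, True), (121, True), (135, True),
]

threshold = learn_threshold(training_data)


def should_alert(temperature):
    return temperature > threshold


print(threshold)          # the learned cut-off
print(should_alert(90))   # a reading the system never saw during training
print(should_alert(128))
```

Nobody wrote the number into the code. It came from the data, and different data would produce a different rule.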

Traditional software
  • Developer writes explicit rules
  • Every case must be anticipated in code
  • Deterministic: same input always gives same output
  • Easy to explain: "if X then Y"
  • Breaks on edge cases not in the rules
  • Good for: accounting, form validation, business rules

Machine learning
  • System learns rules from labelled examples
  • Generalises to cases it has never seen
  • Probabilistic: outputs have confidence scores
  • Hard to explain: parameters are opaque weights
  • Degrades gracefully on novel inputs
  • Good for: image recognition, language, recommendations
[Image: a black geometric prism with a red laser entering one face and refracting through the structure, representing raw input transformed into structured, weighted output.]
Traditional code is explicit rules written by hand. Machine learning is a system that learns the rules from data. The same input goes in. The transformation is different. The output is calibrated, not declared.

Types of machine learning

Supervised learning: the most common type. You provide labelled training examples (inputs paired with correct outputs). The model learns to predict the output for new inputs. Examples: spam detection, image classification, sentiment analysis, house price prediction.

Unsupervised learning: you provide inputs without labels. The model finds structure on its own. Examples: customer segmentation, anomaly detection, topic modelling.
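As a rough illustration of finding structure without labels, the sketch below groups customers by monthly spend using a bare-bones one-dimensional k-means. The spend figures and the `kmeans_1d` helper are assumptions made for this example, and the helper expects at least two clusters.

```python
# Toy unsupervised learning: split unlabelled numbers into k groups.
# A simplified 1-D k-means; production code would use a tested library.

def kmeans_1d(values, k=2, iterations=10):
    ordered = sorted(values)
    # Start the centres spread evenly across the sorted values (needs k >= 2).
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # Move each centre to the mean of the values assigned to it.
        centres = [
            sum(cluster) / len(cluster) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres, clusters


# Made-up monthly spend per customer; note there are no labels anywhere.
monthly_spend = [12, 15, 18, 20, 95, 102, 110, 120]
centres, clusters = kmeans_1d(monthly_spend)
print(centres)
print(clusters)
```

The two groups (low spenders and high spenders) were never named in the data. The algorithm only noticed that the numbers fall into two clumps.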

Reinforcement learning: the model learns by taking actions in an environment and receiving rewards or penalties. Examples: game-playing AI (AlphaGo, AlphaStar), robotic control, recommendation systems. RLHF (Reinforcement Learning from Human Feedback) is used to align language models with human preferences.

Self-supervised learning: the model creates its own labels from the structure of the data. This is how language models are trained: predict the next word given all previous words. Doing this on billions of documents develops rich internal representations of language and knowledge.
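The "creates its own labels" idea can be sketched in a few lines. The `next_word_examples` helper below is hypothetical and splits on whitespace, whereas real models work on tokens, but the principle is the same: every position in the text supplies its own training example.

```python
# Self-supervised labels: the text itself provides the "correct answer"
# for every position, so no human labelling is needed.

def next_word_examples(text):
    """Turn raw text into (context, next_word) training pairs."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]


pairs = next_word_examples("the cat sat on the mat")
for context, target in pairs:
    print(" ".join(context), "->", target)
```

A six-word sentence yields five training examples for free, which is why this approach scales to billions of documents.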

What is a large language model (LLM)?

An LLM (Large Language Model) is the type of AI behind Claude, ChatGPT, Gemini, and similar tools. It is a neural network trained with self-supervised learning on a vast amount of text: books, websites, code, conversations, and scientific papers.

The training objective is simple: given a sequence of tokens, predict the next token. Doing this well at scale (with up to hundreds of billions of parameters, trained on trillions of tokens) requires the model to develop deep representations of language, knowledge, and reasoning.

When you send a message to Claude, the model does not “look up” the answer. It generates a response one token at a time, each token being the model’s prediction of what should come next given all the preceding context.
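Token-by-token generation can be imitated with a toy model. The sketch below is an assumption-laden stand-in: it "trains" by counting which word follows which in a tiny made-up corpus, and it always picks the most frequent follower. A real LLM conditions on the whole context with a neural network and usually samples from a probability distribution, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, already split into word-level "tokens".
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# "Training": count which token follows which.
follow_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    follow_counts[current][following] += 1


def generate(prompt, max_new_tokens=5):
    """Extend the prompt one token at a time, always taking the likeliest next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = follow_counts.get(tokens[-1])
        if not candidates:
            break  # the model has never seen this token, so it has nothing to predict
        next_token, _count = candidates.most_common(1)[0]
        tokens.append(next_token)
        if next_token == ".":
            break  # treat the full stop as an end-of-text marker
    return " ".join(tokens)


print(generate("on", max_new_tokens=2))
print(generate("fish"))
```

Nothing is looked up. Each new token is a prediction based on what came before, appended to the context and fed back in.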

Key LLMs and who builds them:

  • Claude (Anthropic). Model docs
  • GPT-4o (OpenAI)
  • Gemini (Google DeepMind)
  • Llama 3 (Meta): open weights, downloadable
  • Mistral (Mistral AI): European, open source
  • Command R (Cohere): enterprise-focused

Neural networks: the architecture

An LLM is built on a neural network: loosely inspired by the brain, but in practice a large mathematical function. 3Blue1Brown’s video series on neural networks is the clearest visual explanation.

The core components:

  • Parameters (weights): numerical values that the model learns during training. GPT-3 had 175 billion parameters. Parameter counts for many frontier models are no longer publicly disclosed, and since 2023 the industry has moved toward smaller, more efficient models (7B-70B range) for most applications.
  • Layers: stacked transformations applied to the input. Each layer learns to recognise increasingly abstract patterns.
  • Attention mechanism (Transformer): the architectural innovation from 2017 that enabled modern LLMs. Attention lets the model consider relationships between distant parts of the input, for example working out that “it” in “The cat sat on the mat. It was comfortable.” refers to the cat. More: Attention is All You Need, original paper (readable abstract)
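The core arithmetic of attention is small enough to sketch in plain Python. What follows is scaled dot-product attention only, under stated assumptions: the 2-D vectors are hand-picked so that “it” matches “cat”, whereas a real Transformer learns its vectors and adds projection matrices, multiple heads, and masking.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for query in queries:
        # How well does this query match each key?
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        output = [
            sum(weight * value[d] for weight, value in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


tokens = ["cat", "mat", "it"]
# Hand-picked vectors for illustration; a trained model would learn these.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = keys
query_for_it = [[4.0, 0.0]]

outputs, weights = attention(query_for_it, keys, values)
for token, weight in zip(tokens, weights[0]):
    print(f"it -> {token}: {weight:.2f}")
```

The largest weight lands on “cat”, which is the numerical version of the model deciding what “it” refers to.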

Training vs inference

Training builds the model. Gradient descent repeatedly adjusts the parameters across billions of text examples, running for weeks on thousands of GPUs. This is enormously expensive: training a frontier model costs tens to hundreds of millions of dollars.
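Gradient descent itself is a simple loop, shown here on a model with exactly one parameter. The data points, learning rate, and `train` function are illustrative assumptions. Frontier training runs the same idea over billions of parameters.

```python
# Gradient descent on the smallest possible model: prediction = weight * x.
# The data follows y = 2x, so training should move `weight` towards 2.

def train(data, learning_rate=0.05, steps=200):
    weight = 0.0  # the single parameter, starting from an uninformed guess
    for _ in range(steps):
        # Gradient of the mean squared error with respect to `weight`.
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        # Nudge the parameter in the direction that reduces the error.
        weight -= learning_rate * gradient
    return weight


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned = train(data)
print(round(learned, 4))
```

Each step measures how wrong the current parameter is and moves it slightly in the direction that reduces the error.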

Inference runs the model. Given your prompt, generate a response. This is much cheaper and is what happens every time you send a message.

Most people and applications use inference via APIs. Only a handful of labs train frontier models from scratch.

Fine-tuning is a middle ground: take a pre-trained model and train it further on a smaller specialised dataset. This adapts a general model for a specific domain (medical, legal, coding) at a fraction of the cost of training from scratch.

AI lifecycle: where you fit in
Pre-training
  • Trillions of tokens
  • Thousands of GPUs
  • Weeks → months
  • Done by AI labs (Anthropic, OpenAI, Google). Cost: $50M–$500M+
Fine-tuning
  • Domain-specific data
  • Hours → days
  • Done by companies adapting a base model to their use case
Inference
  • Your prompt
  • API call
  • Milliseconds
  • What happens every time you use Claude, ChatGPT, or any LLM product

What AI is good at

Task | Examples
Text generation | Writing, summarising, translating, explaining, coding
Reasoning and analysis | Structuring problems, drafting plans, reviewing documents
Code generation | Writing, debugging, refactoring, explaining code
Image understanding | Describing images, answering questions about visual content
Image generation | DALL-E, Midjourney, Stable Diffusion, Adobe Firefly
Speech recognition | OpenAI Whisper, Google Speech-to-Text
Structured output | Extracting data from documents, classifying text

What AI is bad at

  • Hallucination: confidently generating false information. Always verify factual claims.
  • Real-time knowledge: models have a training cutoff date and do not know what happened since (without retrieval tools).
  • Precise arithmetic: LLMs are surprisingly poor at multi-step calculation. Use a calculator via tool use.
  • Counting and spatial reasoning: models process text as tokens rather than as individual characters or discrete objects, so counting letters or items is unreliable.
  • Consistent long-form reasoning without tools: complex multi-step reasoning degrades over many steps without scaffolding.
  • Knowing what they don’t know: models often cannot accurately report their own uncertainty.
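The arithmetic weakness is usually handled by giving the model a tool. Below is a minimal sketch of the calculator side of that arrangement. The `calculate` function is hypothetical and not part of any provider's SDK: in a real application the model would request the tool, and your code would run something like this and send the result back.

```python
import ast
import operator

# Only these operations are allowed; anything else is rejected.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculate(expression):
    """Evaluate +, -, *, / on numbers without calling eval()."""

    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)


# The model decides *what* to compute; the tool does the exact arithmetic.
print(calculate("1234 * 5678 + 91"))
```

Parsing the expression instead of passing it to `eval()` matters here, because text produced by a model should be treated as untrusted input.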

Context windows and RAG

The context window is how much text a model can consider at once. Everything outside the window is invisible to the model. For long documents, codebases, or multi-session conversations, this matters.
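A rough sketch of what “invisible” means in practice: when a conversation outgrows the window, something has to be dropped. The helper names are invented, and counting one token per word is a deliberate simplification, since real tokenizers split text differently.

```python
def count_tokens(text):
    # Assumption: one token per whitespace-separated word.
    # Real tokenizers produce different counts.
    return len(text.split())


def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the budget."""
    kept, used = [], 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # this message and everything older falls outside the window
        kept.append(message)
        used += cost
    return list(reversed(kept))


conversation = [
    "my name is Ada",
    "I like chess",
    "what is my name",
]
visible = fit_to_window(conversation, max_tokens=8)
print(visible)
```

With a budget of 8 tokens the oldest message is dropped, so a model answering “what is my name” would no longer have the name anywhere in its input.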

RAG (Retrieval-Augmented Generation) is the standard solution: embed and index your documents, retrieve relevant chunks at query time, and inject them into the context alongside the question. This lets a model answer questions about documents far larger than its context window, and ground answers in up-to-date sources. More: RAG, Anthropic docs
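The embed, retrieve, inject pipeline can be sketched without any external service. Everything here is a stand-in: `embed` counts words instead of calling a real embedding model, the documents are made up, and the final prompt would be sent to an LLM rather than printed.

```python
import math
import re
from collections import Counter


def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, index, top_k=1):
    """Return the top_k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _vector in ranked[:top_k]]


def build_prompt(question, chunks):
    """Inject the retrieved chunks into the context alongside the question."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Index step: embed every document once, ahead of time.
documents = [
    "Refunds are available within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Support is open Monday to Friday.",
]
index = [(doc, embed(doc)) for doc in documents]

# Query step: retrieve the relevant chunk and build the prompt.
question = "How many days does shipping take?"
prompt = build_prompt(question, retrieve(question, index))
print(prompt)
```

Only the retrieved chunk reaches the model, which is how the approach stays within the context window no matter how large the document collection grows.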

AI in production vs AI as a tool

There is a difference between using an AI tool (ChatGPT, Midjourney) and building something with AI APIs. When you build:

  1. Your code calls an AI API (Anthropic, OpenAI, Google) with a prompt
  2. The model generates a response
  3. Your code uses that response: displays it, extracts data from it, triggers another action based on it
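The three steps above can be sketched as follows. To keep the example runnable offline, `fake_model` is a hypothetical stand-in that returns a canned reply. In a real product you would replace it with a call to your provider's SDK. The ticket-triage scenario and the action names are also invented.

```python
import json


def fake_model(prompt):
    # Hypothetical stand-in for a real API call; always returns the same JSON.
    return '{"sentiment": "negative", "urgent": true}'


def triage_ticket(ticket_text, call_model=fake_model):
    # 1. Your code calls the model with a prompt.
    prompt = (
        "Classify this support ticket. Reply with JSON containing "
        '"sentiment" and "urgent".\n\n' + ticket_text
    )
    # 2. The model generates a response.
    raw = call_model(prompt)
    # 3. Your code uses that response: parse it, then trigger an action.
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return "needs_human_review"  # the model did not return valid JSON
    if not isinstance(result, dict):
        return "needs_human_review"
    return "escalate" if result.get("urgent") else "queue"


print(triage_ticket("My site has been down for two hours!"))
```

Most of the function is ordinary code. The fallback for unparseable output matters because model responses are not guaranteed to follow the requested format.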

This is how AI-powered products work. The AI is one component inside a larger system, alongside a database, a server, APIs, and a user interface. The earlier articles in this series describe all those other components.

[Image: a human silhouette facing a vast red-lit industrial system, representing one person and many components integrated into a single production system.]
A production AI system is not just an LLM. The model is one component alongside retrieval, databases, business logic, and a user interface. The architect sees the whole. Understanding all the other components in this series is what lets you build and evaluate these systems.

What’s next

Next: What is Vibe Coding? How to use AI to build software without needing to code yourself.