What is AI?
AI is software that learns patterns from data instead of following hand-written rules. Here is what that actually means, without the hype.
The old way vs the new way
Traditional software follows explicit rules written by programmers:
```python
# Every rule is written by hand. deny_purchase() and send_alert() stand in
# for whatever action the application takes.
if age < 18:
    deny_purchase()
if temperature > 100:
    send_alert()
```

Every situation must be anticipated and coded manually. This works for precise, rule-governed tasks. It breaks down for complex ones: recognising a cat in a photo, understanding sarcasm, or generating a coherent essay.
Machine learning takes a different approach. Instead of writing rules, you give the system thousands or millions of labelled examples, and it works out the patterns itself. Show it enough photos labelled “cat” and “not cat” and it learns which visual patterns distinguish cats, more reliably than any rules you could write by hand.
The result is systems that can do things programmers could never explicitly code. More: Machine learning, Google Developers
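Here is a toy version of "learn from examples" that makes the contrast concrete. Instead of hard-coding `if temperature > 100`, the program picks the threshold that best separates readings labelled as alerts from normal ones. The dataset and the names `learn_threshold` and `predict` are hypothetical. Real machine learning uses far richer models, but the principle is the same: fit to labelled examples, then predict on new inputs.

```python
# Hypothetical labelled examples: (temperature reading, should_alert)
examples = [
    (72, False), (85, False), (93, False), (98, False),
    (104, True), (110, True), (121, True), (135, True),
]

def learn_threshold(data):
    """Try each observed value as a cut-off and keep the one that
    classifies the most training examples correctly."""
    best_threshold, best_correct = None, -1
    for candidate, _ in data:
        correct = sum((value >= candidate) == label for value, label in data)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

threshold = learn_threshold(examples)

def predict(temperature):
    # The rule was learned from the data, not written by a programmer
    return temperature >= threshold

print(threshold)                    # 104
print(predict(107), predict(90))    # True False
```

The learned cut-off is not the 100 a programmer might have guessed. It is simply whatever value fits the labelled data best, which is exactly what "the system figures out the patterns itself" means.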

Types of machine learning
Supervised learning: the most common type. You provide labelled training examples (inputs paired with correct outputs), and the model learns to predict the output for new inputs. Examples: spam detection, image classification, sentiment analysis, house price prediction.
Unsupervised learning: you provide inputs without labels, and the model finds structure on its own. Examples: customer segmentation, anomaly detection, topic modelling.
Reinforcement learning: the model learns by taking actions in an environment and receiving rewards or penalties. Examples: game-playing AI (AlphaGo, AlphaStar), robotic control, recommendation systems. RLHF (Reinforcement Learning from Human Feedback) is used to align language models with human preferences.
Self-supervised learning: the model creates its own labels from the structure of the data. This is how language models are trained: predict the next word given all the previous words. Doing this across billions of documents leads the model to develop rich internal representations of language and knowledge.
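Self-supervised learning needs no human labelling because the training targets come from the text itself. The sketch below shows how one sentence turns into several (context, next word) training pairs. It makes one simplification: it splits on spaces, whereas real models use subword tokens. The function name `next_word_pairs` is hypothetical.

```python
def next_word_pairs(text):
    """Turn raw text into (context, target) pairs: each word becomes the
    label for the words that precede it. No human labelling is needed."""
    words = text.split()  # simplification: real LLMs use subword tokens
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

for context, target in next_word_pairs("the cat sat on the mat"):
    print(f"{context!r:>22} -> {target!r}")
```

A six-word sentence yields five training examples for free. Scaled up to trillions of tokens, this is where the training signal for an LLM comes from.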
What is a large language model (LLM)?
An LLM (Large Language Model) is the type of AI behind Claude, ChatGPT, Gemini, and similar tools. It is a neural network trained with self-supervised learning on vast amounts of text: books, websites, code, conversations, and scientific papers.
The training objective is simple: given a sequence of tokens (words or word fragments), predict the next token. Doing this well at scale requires the model to develop deep representations of language, knowledge, and reasoning. At that scale, models have billions to hundreds of billions of parameters and are trained on trillions of tokens.
When you send a message to Claude, the model does not “look up” the answer. It generates a response one token at a time, each token being the model’s prediction of what should come next given all the preceding context.
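The one-token-at-a-time loop can be illustrated with a toy "model" that is just a hand-made table of next-word probabilities. The words and probabilities below are invented for illustration. A real LLM computes these probabilities with a neural network that looks at the entire context, not just the previous word. The outer loop is the same, though: predict, append, repeat.

```python
import random

# Hypothetical next-word probabilities, conditioned only on the previous word.
# A real LLM conditions on all preceding tokens using billions of parameters.
NEXT_WORD_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 0.7, "slept": 0.3},
    "sat": {"on": 1.0},
    "on": {"the": 1.0},
    "slept": {"<end>": 1.0},
    "mat": {"<end>": 1.0},
}

def generate(max_tokens=10, seed=0):
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_WORD_PROBS[tokens[-1]]
        # Sample the next token in proportion to its predicted probability
        next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])

print(generate())
```

Nothing is looked up. Each word is sampled from a predicted distribution, which is why the same prompt can produce different responses.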
Key LLMs and who builds them:
- Claude, Anthropic. Model docs
- GPT-4o, OpenAI
- Gemini, Google DeepMind
- Llama 3, Meta (open weights, downloadable)
- Mistral, Mistral AI (European; releases both open-weight and commercial models)
- Command R, Cohere (enterprise-focused)
Neural networks: the architecture
An LLM is built on a neural network. Neural networks are loosely inspired by the brain, but in practice each one is a large mathematical function that turns input numbers into output numbers. 3Blue1Brown’s video series on neural networks is the clearest visual explanation.
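To see what "a mathematical function" means here, this sketch implements a tiny two-layer network in plain Python. The weights are made-up numbers that a trained network would instead learn, and the layer sizes are arbitrary. The point is that the network is only multiplications, additions, and a simple non-linearity, and that every weight and bias counts as a parameter. GPT-3's 175 billion parameters are the same kind of numbers, just vastly more of them. The helper names are hypothetical.

```python
def relu(x):
    # Non-linearity: negative values become 0
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each output is a weighted sum of all
    inputs plus a bias, optionally passed through an activation."""
    outputs = []
    for w_row, b in zip(weights, biases):
        total = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(total) if activation else total)
    return outputs

# Made-up parameters for a 3-input -> 2-hidden -> 1-output network
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.2]

def network(x):
    hidden = dense(x, W1, b1, activation=relu)  # layer 1
    return dense(hidden, W2, b2)[0]             # layer 2

def count_parameters(*layers):
    # Each layer contributes (outputs x inputs) weights plus one bias per output
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

print(network([1.0, 2.0, 3.0]))
print(count_parameters((W1, b1), (W2, b2)))  # 6 + 2 + 2 + 1 = 11
```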
The core components:
- Parameters (weights): numerical values that the model learns during training. GPT-3 had 175 billion parameters. Most labs no longer disclose parameter counts for their frontier models. Since 2023 there has also been a strong trend toward smaller, more efficient models (roughly 7B-70B parameters) for many applications.
- Layers: stacked transformations applied to the input. Each layer learns to recognise increasingly abstract patterns.
- Attention mechanism (Transformer): attention is the core of the Transformer, the 2017 architecture that enabled modern LLMs. It lets the model weigh relationships between distant parts of the input. For example, it can work out that “it” in “The cat sat on the mat. It was comfortable.” refers to the cat. More: Attention Is All You Need, original paper (readable abstract)
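The core calculation inside attention is scaled dot-product attention, as described in the 2017 paper, and it can be sketched in a few lines. Each position has a query vector, a key vector, and a value vector. The query is compared with every key, softmax turns the scores into weights that sum to 1, and the output is the weighted average of the values. The two-dimensional vectors below are invented for illustration. In a real model they are computed from the input by learned weight matrices and have hundreds or thousands of dimensions. This sketch also omits the multiple heads and masking that real Transformers use.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; the weights sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output: weighted average of the value vectors
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, output

# Hypothetical vectors: the query for "It" happens to align most with the key for "cat"
tokens = ["The", "cat", "mat"]
keys = [[0.1, 0.0], [2.0, 1.0], [0.5, 1.5]]
values = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
query_it = [2.0, 0.5]

weights, output = attention(query_it, keys, values)
for token, w in zip(tokens, weights):
    print(f"{token}: {w:.2f}")
```

Most of the weight lands on "cat", so the output for "It" is dominated by the cat's value vector. That is how information flows between distant positions.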
Training vs inference
Training builds the model. Gradient descent repeatedly adjusts the parameters to improve next-token prediction across billions of text examples, running for weeks on thousands of GPUs. This is enormously expensive: training a frontier model costs tens to hundreds of millions of dollars.
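Gradient descent can be shown at toy scale. This sketch fits a single parameter `w` so that `w * x` matches some made-up data generated with w = 3. Each step computes the gradient of the mean squared error and nudges `w` in the opposite direction. Frontier-model training applies the same idea to billions of parameters, with the gradient computed by backpropagation rather than by hand.

```python
# Made-up training data following y = 3x; the model has to discover the 3
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

w = 0.0              # the model's single parameter, starting from a guess
learning_rate = 0.01

for _ in range(200):
    # Gradient of the mean squared error (w*x - y)^2 with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # step downhill

print(round(w, 4))  # 3.0
```

The learning rate controls the step size. Too large, and `w` overshoots and diverges. Too small, and training crawls.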
Inference runs the model. Given your prompt, generate a response. This is much cheaper and is what happens every time you send a message.
Most people and applications use inference via APIs. Only a handful of labs train frontier models from scratch.
Fine-tuning is a middle ground: take a pre-trained model and train it further on a smaller specialised dataset. This adapts a general model for a specific domain (medical, legal, coding) at a fraction of the cost of training from scratch.
What AI is good at
| Task | Examples |
|---|---|
| Text generation | Writing, summarising, translating, explaining, coding |
| Reasoning and analysis | Structuring problems, drafting plans, reviewing documents |
| Code generation | Writing, debugging, refactoring, explaining code |
| Image understanding | Describing images, answering questions about visual content |
| Image generation | DALL-E, Midjourney, Stable Diffusion, Adobe Firefly |
| Speech recognition | OpenAI Whisper, Google Speech-to-Text |
| Structured output | Extracting data from documents, classifying text |
What AI is bad at
- Hallucination: confidently generating false information. Always verify factual claims.
- Real-time knowledge: models have a training cutoff date and do not know what has happened since, unless they are given retrieval tools.
- Precise arithmetic: LLMs are surprisingly poor at multi-step calculation. Give them a calculator via tool use.
- Counting and spatial reasoning: models process text as tokens (word fragments), not as individual characters or objects. Tasks like counting the letters in a word or tracking positions often go wrong as a result.
- Long chains of reasoning without tools: accuracy on complex multi-step problems tends to degrade as the number of steps grows, unless the model has scaffolding such as tools or step-by-step checks.
- Knowing what they don’t know: models often cannot accurately report their own uncertainty.
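Here is one way to follow the "use a calculator via tool use" advice. Instead of trusting the model's own arithmetic, your code lets the model request a calculation and then computes it exactly. The request format (a dict with `tool` and `expression` keys) and the function names are hypothetical, not any provider's actual tool-use schema; real APIs define their own structured formats for tool calls. The evaluator walks Python's `ast` tree so that it accepts only basic arithmetic and never calls `eval` on arbitrary text.

```python
import ast
import operator

# Only these arithmetic operations are allowed
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression):
    """Safely evaluate a basic arithmetic expression such as '1234 * 5678'."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

def handle_tool_request(request):
    # Hypothetical request shape, standing in for a model's tool call
    if request.get("tool") == "calculator":
        return calculate(request["expression"])
    raise ValueError(f"unknown tool: {request.get('tool')}")

print(handle_tool_request({"tool": "calculator", "expression": "1234 * 5678"}))
```

The model only has to decide that a calculation is needed and write down the expression. Your code then produces the exact answer, which goes back to the model as the tool result.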
Context windows and RAG
The context window is the maximum amount of text, measured in tokens, that a model can consider at once. Everything outside the window is invisible to the model. For long documents, codebases, or multi-session conversations, this limit quickly becomes a real constraint.
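One simple way applications cope with a fixed window is to keep only the most recent conversation turns that fit a token budget. Two things in this sketch are purely for illustration: it approximates token counts by splitting on spaces, and it uses a tiny budget. Real models use their own tokenizers, and the provider's documentation gives the actual window size. The function name `fit_to_window` is hypothetical.

```python
def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined (approximate) token
    count fits in max_tokens; older messages are dropped."""
    kept, used = [], 0
    for message in reversed(messages):
        cost = len(message.split())  # rough stand-in for a real tokenizer
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [
    "My name is Ada.",
    "I work on compilers.",
    "What languages do you like?",
    "Can you remind me what my name is?",
]
print(fit_to_window(history, max_tokens=12))
```

With a 12-token budget only the last message survives. The model can no longer see the turn that stated the name, and this kind of loss is one motivation for RAG.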
RAG (Retrieval-Augmented Generation) is the standard solution: embed and index your documents, retrieve relevant chunks at query time, and inject them into the context alongside the question. This lets a model answer questions about documents far larger than its context window, and ground answers in up-to-date sources. More: RAG, Anthropic docs
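The embed, retrieve, and inject steps can be sketched without any external services. Word-count vectors and cosine similarity stand in for learned embeddings and a vector database here. This is a deliberate simplification, since production RAG uses an embedding model. The prompt template is also a hypothetical example, not any provider's required format.

```python
import math
import re
from collections import Counter

documents = [
    "Refunds are available within 30 days of purchase with a receipt.",
    "Our office is open Monday to Friday, 9am to 5pm.",
    "Shipping to Europe usually takes 5 to 7 business days.",
]

def embed(text):
    # Stand-in "embedding": lowercase word counts (real RAG uses a learned model)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

index = [(doc, embed(doc)) for doc in documents]  # embed and index once

def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question):
    # Inject the retrieved chunks into the context alongside the question
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Can I get a refund after 30 days?"))
```

The model never sees the whole document collection, only the chunks most relevant to the question. That is why RAG scales to collections far larger than the context window.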
AI in production vs AI as a tool
There is a difference between using an AI tool (ChatGPT, Midjourney) and building something with AI APIs. When you build:
- Your code calls an AI API (Anthropic, OpenAI, Google) with a prompt
- The model generates a response
- Your code uses that response: displays it, extracts data from it, triggers another action based on it
This is how AI-powered products work. The AI is one component inside a larger system, alongside a database, a server, other APIs, and a user interface. The earlier articles in this series describe all those other components.
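The three steps listed above can be sketched in Python. To keep the example self-contained and runnable offline, `call_model` is a hypothetical stub that returns a canned reply. In a real application it would call a provider's SDK or HTTP API, and that provider's documentation gives the exact call. The part worth studying is step 3: treat the model's output as untrusted text that your code must parse and validate before acting on it.

```python
import json

def call_model(prompt):
    # Hypothetical stub standing in for a real API call (steps 1 and 2).
    # It ignores the prompt and returns a canned JSON reply.
    return '{"name": "Ada Lovelace", "email": "ada@example.com", "urgent": true}'

def extract_contact(email_text):
    prompt = (
        "Extract the sender's name, email address, and whether the message is urgent. "
        "Reply with JSON only, using the keys name, email, urgent.\n\n" + email_text
    )
    reply = call_model(prompt)
    # Step 3: parse and validate, since models can return malformed or incomplete output
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not {"name", "email", "urgent"}.issubset(data):
        return None
    return data

contact = extract_contact("Hi, it's Ada (ada@example.com). Please call me back today!")
if contact and contact["urgent"]:
    # Trigger another action based on the response
    print(f"Flagging urgent message from {contact['name']}")
```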

Further reading
- Anthropic documentation: Claude API reference, prompt engineering guide, model capabilities
- But what is a neural network? (3Blue1Brown): multi-part visual series, the best introduction to how neural networks learn
- Google Machine Learning Crash Course: free, practical, interactive
- Practical Deep Learning (fast.ai): free course, code-first, runs real models from the start
- Neural Networks: Zero to Hero (Andrej Karpathy): builds GPT from scratch, exceptional depth
- The Illustrated Transformer (Jay Alammar): visual explanation of the transformer architecture
- AI Safety Fundamentals: free course on AI alignment and safety
What’s next
Next: What is Vibe Coding? How to use AI to build software without needing to write the code yourself.
Frequently asked questions
