Quick Answer
A neural network is a mathematical system of connected nodes, organised in layers, that learns to map inputs to outputs by adjusting its internal numerical weights: thousands of them in a small network, billions in a large language model. It is the core architecture behind most modern AI: image recognition, language models, voice synthesis, and recommendation systems typically run on neural networks. The “neural” comes from a loose analogy to neurons in the brain, but modern neural networks are best understood as very large, deeply layered mathematical functions.
Figure: layers of artificial neurons passing signals between them. Each circle is a node (neuron) and each line is a connection with a learned weight. Signals flow left to right through the layers; during training, error corrections flow right to left.

The building block: a single neuron

A single artificial neuron does one simple thing:

  1. Receives multiple numerical inputs (from the previous layer or from raw data)
  2. Multiplies each input by a learned weight (how important this input is)
  3. Adds the results together, plus a learned bias (a constant offset)
  4. Applies an activation function (to introduce non-linearity)
  5. Outputs a number to the next layer
python
# One artificial neuron
def neuron(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(weighted_sum)  # activation function: return 0 if negative, value if positive

def relu(x):
    return max(0, x)

One neuron is trivial. A network of millions of neurons, organised in many layers, learns to represent extremely complex patterns.
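Going from one neuron to a network is mechanical: a layer is several neurons that read the same inputs, each with its own weights and bias, and each layer's outputs become the next layer's inputs. The sketch below is plain Python with hypothetical, hand-picked weights chosen only for illustration; real frameworks perform the same arithmetic as matrix multiplications.

```python
# Minimal sketch of a two-layer network in plain Python.
# All weights and biases below are hypothetical values for illustration only.

def relu(x):
    # Activation function: 0 for negative inputs, the input itself otherwise
    return max(0.0, x)

def identity(x):
    # No activation: lets the output layer emit any real number
    return x

def dense_layer(inputs, weights, biases, activation=relu):
    # One row of `weights` per neuron; every neuron sees all of the inputs
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
        outputs.append(activation(weighted_sum))
    return outputs

# Hidden layer: 3 inputs -> 2 neurons
hidden_w = [[0.5, -0.5, 1.0],
            [1.0, 1.0, -1.0]]
hidden_b = [0.0, -1.0]
# Output layer: 2 hidden neurons -> 1 output
output_w = [[2.0, -1.0]]
output_b = [0.25]

x = [1.0, 2.0, 3.0]
h = dense_layer(x, hidden_w, hidden_b)            # hidden activations
y = dense_layer(h, output_w, output_b, identity)  # network output
print(h, y)
```

With these made-up numbers the second hidden neuron's weighted sum is negative (-1.0), so ReLU switches it off and it contributes nothing to the output.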

Layers: how networks get their power

Input layer
Receives the raw data (pixels, tokens, numbers), with one node per input feature. A 224x224 colour image has 224 x 224 x 3 = 150,528 input nodes: one per pixel per colour channel.

Hidden layers (1 to 100+)
Handle feature detection, pattern abstraction, and hierarchical representation. Each layer learns more abstract features than the one before it; "deep" learning simply means many hidden layers.

Output layer
Produces the result: one score per class (classification), a probability for each possible next token (language models), or a single number (regression).
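For classification and next-token prediction, the output layer's raw scores are commonly converted into probabilities with the softmax function, which makes every value positive and makes them sum to 1. A minimal sketch, assuming three hypothetical classes with made-up scores:

```python
import math

def softmax(scores):
    # Subtract the largest score first for numerical stability;
    # softmax is unchanged by adding a constant to every score.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output-layer scores for three classes
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted_class = probs.index(max(probs))  # class with the highest probability
print([round(p, 3) for p in probs], predicted_class)
```

The highest score always gets the highest probability, so the predicted class is unchanged; softmax only adds a calibrated-looking scale that a loss function can compare against the correct answer.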

How learning works: backpropagation

Training a neural network is the process of finding the right values for all the weights, starting from random values.

  1. Forward pass (make a prediction): input data flows through all layers from left to right, and the network produces an output: a predicted class, a next word, or a number.
  2. Loss calculation (measure the error): the prediction is compared with the correct answer, and a loss function quantifies how wrong it was. Lower is better.
  3. Backward pass (assign blame to each weight): the chain rule of calculus gives the gradient of the loss with respect to every weight, that is, how much the loss would change if that weight changed slightly. This gradient information flows right to left, from the output layer back towards the input.
  4. Weight update (adjust and repeat): each weight is nudged slightly in the direction that reduces the loss. The loop repeats over many examples, often millions, until the loss stops improving.
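The four steps can be seen end to end on the smallest possible case: a single neuron with no activation function (y = w*x + b) learning a straight line. This is a hedged sketch under assumed settings: toy data generated from y = 2x + 1, mean squared error as the loss, plain gradient descent with a hand-picked learning rate of 0.05, and gradients derived by hand rather than by an automatic-differentiation library.

```python
import random

def train(xs, ys, lr=0.05, epochs=2000, seed=0):
    # Assumed hyperparameters: lr and epochs are illustrative, not tuned values
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # start from random weights
    n = len(xs)
    loss = float("inf")
    for _ in range(epochs):
        # 1. Forward pass: a prediction for every example
        preds = [w * x + b for x in xs]
        # 2. Loss: mean squared error between predictions and targets
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n
        # 3. Backward pass: gradients of the loss with respect to w and b
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        # 4. Update: step each parameter against its gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss

# Toy data (an assumption): points on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b, loss = train(xs, ys)
print(f"w={w:.3f} b={b:.3f} loss={loss:.2e}")
```

Starting from random values, the weight and bias settle near 2 and 1. Frameworks such as PyTorch compute the same kind of gradients automatically (via `loss.backward()`) for networks with millions of weights.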

Types of neural networks

| Type | Input | Used for |
| --- | --- | --- |
| Feedforward (dense) | Tabular data | Classification, regression on structured data |
| Convolutional (CNN) | Images, video | Image recognition, object detection, medical imaging |
| Recurrent (RNN/LSTM) | Sequences | Time series, older language models (pre-transformer) |
| Transformer | Text, images, audio | Language models (GPT, Claude, Llama), vision models, audio |
| Diffusion model | Noise | Image generation (Stable Diffusion, DALL-E 3) |
| Graph neural network | Graphs (molecules, networks) | Drug discovery, social network analysis, fraud detection |

A concrete example: image classification

Defining a small neural network to classify 28x28 greyscale images of handwritten digits (0-9):

python
import torch
import torch.nn as nn

class DigitClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 256),   # 28x28 pixel image → 256 hidden neurons
            nn.ReLU(),
            nn.Linear(256, 128),   # 256 → 128
            nn.ReLU(),
            nn.Linear(128, 10),    # 128 → 10 output classes (digits 0-9)
        )

    def forward(self, x):
        return self.layers(x)

model = DigitClassifier()
# Total parameters: 784*256 + 256 + 256*128 + 128 + 128*10 + 10 = 235,146
# Each parameter (a weight or a bias) is learned during training

After training on the 60,000 images of the MNIST training set, this small network classifies unseen handwritten digits with 98%+ accuracy. A transformer with billions of parameters applies the same core principle to vastly more complex tasks.
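The parameter total in the model's comment can be checked with simple arithmetic: a dense layer with n_in inputs and n_out neurons has n_in x n_out weights plus n_out biases. The helper below is a hypothetical illustration, not part of PyTorch:

```python
def count_dense_params(layer_sizes):
    # Hypothetical helper: layer_sizes lists the width of each layer,
    # e.g. [784, 256, 128, 10]. Each consecutive pair (n_in, n_out) is one
    # dense layer with n_in * n_out weights and n_out biases.
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

print(count_dense_params([784, 256, 128, 10]))
```

Flatten and ReLU layers have no learned parameters, so only the three Linear layers count. With PyTorch installed, the same figure can be read from the model itself with `sum(p.numel() for p in model.parameters())`.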

Why this matters for understanding AI

Once you understand neural networks, the capabilities and limitations of modern AI make sense:

  • Why does it get better with more data? More training examples cover more of the patterns the network has to learn, so the tuned weights generalise better to inputs it has never seen.
  • Why is it hard to explain? There is no single “rule” to read out. Knowledge is distributed across millions or billions of weights.
  • Why does it hallucinate? The network learned to predict plausible output from training patterns. It has no external fact-checking mechanism.
  • Why is training expensive? For large models, millions of forward and backward passes through billions of parameters require thousands of specialised chips (GPUs/TPUs) running for weeks.
