What is a Neural Network?

A neural network is the core architecture behind modern AI. This is a plain-English explanation of how layers, weights, and backpropagation work, with no maths required.

Quick Answer

A neural network is a mathematical system of connected nodes organised in layers that learns to map inputs to outputs by adjusting its internal numerical weights, which number from a few thousand in small networks to billions in the largest. It is the core architecture behind most modern AI: image recognition, language models, voice synthesis, and recommendation systems typically run on neural networks. The “neural” comes from a loose analogy to neurons in the brain, but modern neural networks are best understood as very large, deeply layered mathematical functions.

Figure: interconnected circular nodes arranged in layers, forming a neural network of artificial neurons that pass signals between them. Each circle is a node (neuron). Each line is a connection with a learned weight. Signal flows left to right through the layers; learning flows right to left as errors are corrected.

The building block: a single neuron

A single artificial neuron performs one simple computation, in five steps:

Receives multiple numerical inputs (from the previous layer or from raw data)
Multiplies each input by a learned weight (how important this input is)
Adds all the results together
Applies an activation function (to introduce non-linearity)
Outputs a number to the next layer

python

# One artificial neuron
def relu(x):
    # Activation function: 0 for negative inputs, the input itself otherwise
    return max(0, x)

def neuron(inputs, weights, bias):
    # Multiply each input by its weight, add the results, then add the bias
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(weighted_sum)

One neuron is trivial. A network of millions of neurons, organised in many layers, learns to represent extremely complex patterns.
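As a worked example of the neuron described above, the sketch below feeds three made-up inputs through it twice, once with a bias that leaves the sum positive and once with a bias that pushes it negative. The input values, weights, and biases are illustrative assumptions, not values from a trained network.

```python
def relu(x):
    # Activation function: 0 for negative inputs, the input itself otherwise
    return max(0.0, x)

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, passed through the activation
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(weighted_sum)

# Hypothetical values chosen so the arithmetic is easy to follow by hand
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.5]

# 1.0*0.5 + 2.0*(-1.0) + 3.0*0.5 = 0.0, plus bias 0.25 -> 0.25 (positive, passes through)
output_positive = neuron(inputs, weights, bias=0.25)

# Same weighted sum of 0.0, plus bias -1.0 -> -1.0 (negative, ReLU clamps it to 0.0)
output_clamped = neuron(inputs, weights, bias=-1.0)

print(output_positive, output_clamped)
```

The second call shows why the activation function matters: without it, the neuron would pass the negative value straight through and the whole network would collapse into a single linear function.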

Layers: how networks get their power

Input layer

Raw data (pixels, tokens, numbers). One node per input feature. An image of 224x224 pixels = 150,528 input nodes (three colour channels).

Hidden layers (1 to 100+)

Feature detection, pattern abstraction, hierarchical representation. Each layer learns more abstract features than the one before it. "Deep" learning = many hidden layers.

Output layer

Classification (one score per class), next token probability (language models), or a regression value.
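To make the three kinds of layer concrete, here is a minimal pure-Python sketch of data flowing from an input layer through one hidden layer to an output layer. The helper names (`dense_layer`, `softmax`), the tiny layer sizes, and every weight value are assumptions chosen for illustration; a real network would learn its weights during training.

```python
import math

def relu(x):
    return max(0.0, x)

def dense_layer(inputs, weights, biases, activation=None):
    """One fully connected layer: each row of `weights` belongs to one neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Input layer size for the 224x224 colour image mentioned above
input_nodes = 224 * 224 * 3

# Hypothetical tiny network: 2 inputs -> 3 hidden neurons -> 2 output classes
x = [1.0, 2.0]
hidden_w = [[0.5, 0.5], [-1.0, 0.5], [1.0, -1.0]]
hidden_b = [0.0, 0.0, 0.0]
output_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
output_b = [0.0, 0.0]

hidden = dense_layer(x, hidden_w, hidden_b, activation=relu)  # hidden layer
scores = dense_layer(hidden, output_w, output_b)              # one score per class
probs = softmax(scores)                                       # scores -> probabilities

print(input_nodes, hidden, scores, probs)
```

The output layer here plays the classification role: it produces one score per class, and the softmax step converts those scores into probabilities.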

How learning works: backpropagation

Training a neural network is the process of finding the right values for all the weights, starting from random values.

Forward pass: make a prediction. Input data flows through all layers from left to right. The network produces an output: a predicted class, a next word, or a number.

Loss calculation: measure the error. Compare the prediction to the correct answer. The loss function quantifies how wrong the prediction was. Lower is better.

Backward pass: assign blame to each weight. Calculus tells us which weights contributed most to the error. This gradient information flows right to left through the network.

Weight update: adjust and repeat. Each weight is nudged slightly in the direction that reduces the loss. Repeat on millions of examples until the network predicts correctly.
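The four steps above can be sketched on the smallest possible "network": a single linear neuron with one weight and one bias. This is a toy illustration under stated assumptions, not how a production framework implements training. The data (points on the line y = 2x + 1), the learning rate, and the step count are all made up for the example, and the gradients are written out by hand instead of being computed automatically.

```python
# Toy data following y = 2x + 1 (an assumption made for illustration)
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Real networks start from random values; zeros are fine for a single neuron
weight, bias = 0.0, 0.0
learning_rate = 0.05

def mean_squared_error(w, b):
    # Loss calculation: average squared difference between prediction and target
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_loss = mean_squared_error(weight, bias)

for step in range(2000):
    grad_w = 0.0
    grad_b = 0.0
    for x, y in data:
        prediction = weight * x + bias        # forward pass
        error = prediction - y                # how wrong this prediction was
        grad_w += 2 * error * x / len(data)   # backward pass: d(loss)/d(weight)
        grad_b += 2 * error / len(data)       # backward pass: d(loss)/d(bias)
    weight -= learning_rate * grad_w          # weight update: step against the gradient
    bias -= learning_rate * grad_b

final_loss = mean_squared_error(weight, bias)
print(weight, bias, initial_loss, final_loss)
```

After enough steps the weight approaches 2 and the bias approaches 1, the values that generated the data. A deep network does the same thing, except the backward pass has to apply the chain rule through every layer to reach each weight.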

Types of neural networks

Type	Input	Used for
Feedforward (dense)	Tabular data	Classification, regression on structured data
Convolutional (CNN)	Images, video	Image recognition, object detection, medical imaging
Recurrent (RNN/LSTM)	Sequences	Time series, older language models (pre-transformer)
Transformer	Text, images, audio	Language models (GPT, Claude, Llama), vision models, audio
Diffusion model	Random noise (plus an optional prompt)	Image generation (Stable Diffusion, DALL-E 3)
Graph neural network	Graphs (molecules, networks)	Drug discovery, social network analysis, fraud detection

A concrete example: image classification

Training a neural network to classify handwritten digits (0-9):

python

import torch
import torch.nn as nn

class DigitClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 256),   # 28x28 pixel image → 256 hidden neurons
            nn.ReLU(),
            nn.Linear(256, 128),   # 256 → 128
            nn.ReLU(),
            nn.Linear(128, 10),    # 128 → 10 output classes (digits 0-9)
        )

    def forward(self, x):
        return self.layers(x)

model = DigitClassifier()
# Total parameters: 784*256 + 256 + 256*128 + 128 + 128*10 + 10 = 235,146
# (weights plus one bias per neuron; every one of them is learned during training)

After training on 60,000 examples, this small network classifies handwritten digits with 98%+ accuracy. A transformer with billions of parameters applies the same core principle to vastly more complex tasks.
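The parameter total quoted in the code comment can be cross-checked without PyTorch. The sketch below assumes only the layer sizes from the model above (784, 256, 128, 10); the helper name `count_parameters` is hypothetical.

```python
# Layer sizes taken from the DigitClassifier above: input, two hidden layers, output
layer_sizes = [784, 256, 128, 10]

def count_parameters(sizes):
    """Count the weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out  # one weight per connection between the two layers
        total += n_out         # one bias per neuron in the receiving layer
    return total

total_parameters = count_parameters(layer_sizes)
print(total_parameters)
```

With PyTorch installed, `sum(p.numel() for p in model.parameters())` on the model above should report the same figure.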

Why this matters for understanding AI

Once you understand neural networks, the capabilities and limitations of modern AI make sense:

Why does it get better with more data? More training examples expose the network to more varied patterns and give the weight-adjustment loop more signal, producing weights that generalise better to inputs it has not seen.
Why is it hard to explain? There is no single “rule” to read out. Knowledge is distributed across millions or billions of weights.
Why does it hallucinate? The network learned to predict plausible output from training patterns. It has no external fact-checking mechanism.
Why is training expensive? Millions of forward and backward passes through billions of parameters require thousands of specialised chips (GPUs/TPUs) running for weeks.

What’s next

What is Machine Learning? : The broader context for how neural networks are trained
What is a Large Language Model? : How transformer neural networks power text AI
What is Generative AI? : How neural networks are used to create content

TensorFlow Neural Network Playground : Interactive browser demo where you can watch a neural network learn in real time
But what is a neural network? (3Blue1Brown) : The clearest visual explanation available; highly recommended starting point
Neural Networks and Deep Learning (Nielsen) : Free online book, mathematically precise but accessible

Frequently asked questions

Why is it called a neural network?

The name comes from biological neurons in the brain. Biological neurons receive signals from other neurons, and if the combined signal is strong enough, they fire and pass a signal forward. Artificial neural networks loosely mimic this: each node (artificial neuron) receives numeric inputs, multiplies them by learned weights, adds them together, and produces an output. The similarity is at the conceptual level; artificial neural networks do not actually work like the brain in any biologically accurate sense.

What is the difference between a neural network and deep learning?

Deep learning means training neural networks with many layers. There is no strict threshold, but more than two hidden layers is a common rule of thumb. Early neural networks had one or two layers and struggled with complex tasks. Deep networks with many layers can learn hierarchical features: in an image model, low layers detect edges, middle layers detect shapes, and high layers detect objects. ‘Deep’ specifically refers to the depth (number of layers). ChatGPT runs on a very deep neural network with billions of parameters.

How does a neural network learn?

Through backpropagation combined with gradient descent. The network makes a prediction, compares it to the correct answer, calculates the error, and propagates that error backwards through the layers to work out how each weight should change. Applying that change is called a gradient descent step. Repeat this millions of times on millions of examples, and the weights gradually settle on values that produce mostly correct predictions. The learning is entirely in the weight adjustments.

What is a transformer and how does it relate to neural networks?

A transformer is a specific neural network architecture introduced in 2017 that underlies nearly all modern large language models. It uses a mechanism called self-attention to process all parts of a sequence simultaneously rather than step by step. GPT-4, Claude, Llama, and Gemini are all transformer neural networks. The transformer is the dominant architecture for language tasks and is widely used for image (Vision Transformer) and audio tasks.
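For readers who want to see what self-attention computes, here is a minimal pure-Python sketch of scaled dot-product attention for a single sequence and a single attention head. It is a simplification under stated assumptions: a real transformer derives the queries, keys, and values from learned projection matrices, whereas this example reuses the raw token vectors for all three, and the token values themselves are made up.

```python
import math

def softmax(scores):
    largest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention for one sequence and one head."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this position against every position in the sequence
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)  # attention weights for this position
        # The output is a weighted average of all value vectors
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical 2-dimensional token vectors
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attention = self_attention(tokens, tokens, tokens)

for row in attention:
    print([round(w, 3) for w in row])
```

Every position computes its scores against all positions independently, so nothing in the calculation has to wait for an earlier step in the sequence. That is the sense in which a transformer processes a sequence "simultaneously" rather than step by step.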

How many neurons does a modern neural network have?

GPT-3 has 175 billion parameters (weights). Each parameter is roughly one connection weight in the network, so parameters are closer to synapses than to neurons. Even so, 175 billion ‘connections’ is far fewer than the estimated 100-500 trillion synapses in the human brain, and the two are utterly different in architecture and function.