In 1937 Claude Shannon showed that electrical switching circuits can carry out Boolean logic, the on-or-off algebra of true and false. In 1948 his paper, A Mathematical Theory of Communication, defined information as a measurable quantity, introduced the bit, and set hard limits on reliable communication. Together, these two works form the mathematical bedrock of the digital age.

Photograph of Claude Shannon. CC BY 2.0, unknown author.

What it was

Shannon’s contribution comes in two parts, separated by eleven years.

The first is his 1937 master’s thesis at MIT, titled A Symbolic Analysis of Relay and Switching Circuits. At the time, engineers built telephone exchanges from relays, electromechanical switches that click open or closed, and designed these circuits by intuition and trial. Shannon noticed that a switch has two states, open or closed, which match the two values of Boolean algebra, true or false. He showed that a switching network can be written as a logic equation, and that simplifying the equation yields a smaller, cheaper circuit that behaves identically.
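Shannon’s mapping can be sketched in a few lines of Python, assuming the usual convention that contacts wired in series behave like AND and contacts wired in parallel behave like OR. The function names below are hypothetical: they model two relay networks and check, by trying every input, that the smaller one behaves identically to the larger.

```python
from itertools import product

# Convention assumed here: True means a relay contact is closed (current flows).
# Contacts in series conduct only if both are closed: logical AND.
# Contacts in parallel conduct if either is closed: logical OR.

def original_circuit(a: bool, b: bool, c: bool) -> bool:
    # Two parallel branches, each with two contacts in series: four contacts.
    return (a and b) or (a and c)

def reduced_circuit(a: bool, b: bool, c: bool) -> bool:
    # Factoring out 'a' (the distributive law) needs only three contacts.
    return a and (b or c)

def equivalent(f, g, n_inputs: int) -> bool:
    """Return True if f and g agree on every combination of inputs."""
    return all(f(*bits) == g(*bits) for bits in product([False, True], repeat=n_inputs))

print(equivalent(original_circuit, reduced_circuit, 3))  # True
```

The exhaustive check is practical only for a handful of inputs; the point is that the algebra, not testing, guarantees the saving.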

The second part is the 1948 paper. Shannon asked a basic question: what is information, measured precisely? His answer was to count surprise. A message you could easily predict carries little information. A message that resolves real uncertainty carries a lot. He measured this in bits, where one bit is a single yes or no.
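Surprise has a precise form: an outcome with probability p carries −log2 p bits. A minimal sketch, with an illustrative function name:

```python
import math

def surprisal_bits(p: float) -> float:
    """Information content of an outcome with probability p, in bits: -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

print(surprisal_bits(0.5))    # 1.0: one fair yes-or-no answer
print(surprisal_bits(1 / 8))  # 3.0: one of eight equally likely outcomes
print(surprisal_bits(0.99))   # about 0.0145: almost no surprise
```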

Think of a coin toss. Before the toss you do not know the result. The outcome, heads or tails, resolves exactly one bit of uncertainty. A weather forecast for a place where it always rains carries almost no information. A forecast for a place that swings between sun and storm carries much more.
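Averaging that surprise over all outcomes gives Shannon’s entropy, H = −Σ p log2 p. The sketch below computes it for the coin and for two weather examples; the weather probabilities are invented for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits; zero-probability outcomes contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
always_rain = [0.99, 0.01]              # hypothetical: rain on 99% of days
sun_cloud_storm = [1 / 3, 1 / 3, 1 / 3]  # hypothetical: three equally likely outcomes

print(entropy_bits(fair_coin))        # 1.0
print(entropy_bits(always_rain))      # about 0.08
print(entropy_bits(sun_cloud_storm))  # about 1.58
```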

Step 1. Source: A message is produced, carrying some amount of information measured in bits.
Step 2. Encode: A transmitter turns the message into a signal, often compressed and protected against errors.
Step 3. Channel: The signal crosses a noisy medium with a fixed maximum capacity.
Step 4. Decode: A receiver reconstructs the original message, correcting errors where it can.
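The four steps can be simulated end to end. This sketch assumes a binary symmetric channel, where each bit flips independently with a fixed probability, and protects the message with a three-fold repetition code, the simplest error-correcting code rather than one of Shannon’s constructions. All function names are illustrative.

```python
import random

def encode(bits, r=3):
    """Repetition code: send every bit r times (r should be odd)."""
    return [b for b in bits for _ in range(r)]

def channel(signal, flip_prob, rng):
    """Binary symmetric channel: each bit flips independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in signal]

def decode(received, r=3):
    """Majority vote over each group of r received bits."""
    return [int(sum(received[i:i + r]) > r // 2) for i in range(0, len(received), r)]

rng = random.Random(42)                               # fixed seed for repeatable runs
message = [rng.randint(0, 1) for _ in range(10_000)]  # Step 1: source
sent = encode(message)                                # Step 2: encode
received = channel(sent, 0.05, rng)                   # Step 3: noisy channel
decoded = decode(received)                            # Step 4: decode

uncoded = channel(message, 0.05, rng)                 # same channel, no protection
print("errors without coding:", sum(m != u for m, u in zip(message, uncoded)))
print("errors with coding:   ", sum(m != d for m, d in zip(message, decoded)))
```

With a 5% flip probability, majority voting should leave roughly 0.7% of bits wrong (3p²(1 − p) + p³ with p = 0.05), but it triples the number of bits sent. Shannon’s theorem says far more efficient codes exist.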

Why it mattered

The 1937 thesis is often called the most important master’s thesis of the twentieth century. It gave engineers a way to design digital logic with mathematics rather than guesswork. Every logic gate, every processor, and every digital chip traces its design discipline back to this idea. Boolean algebra, an abstract system from the 1850s, suddenly had a physical home in switches and wires.

The 1948 paper founded an entire field, information theory. Before it, communication was an engineering craft full of rules of thumb. After it, communication had a science with exact quantities and provable limits.

Shannon’s deepest result is the channel capacity theorem, also called the noisy-channel coding theorem. He proved that every channel, however noisy, has a fixed capacity: a maximum rate, in bits per use of the channel. Send information below that rate and, with long enough and clever enough codes, the error rate can be pushed as close to zero as you wish. Send above it and errors become unavoidable. This told engineers that near-perfect communication over an imperfect line is possible, and exactly how fast they could push.
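For one simple model, the binary symmetric channel that flips each bit with probability p, the capacity has a closed form, C = 1 − H2(p), where H2 is the binary entropy. A sketch assuming that channel model, with illustrative function names:

```python
import math

def binary_entropy(p: float) -> float:
    """H2(p) = -p log2 p - (1 - p) log2 (1 - p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip_prob: float) -> float:
    """Capacity of a binary symmetric channel, in bits per channel use: C = 1 - H2(p)."""
    return 1.0 - binary_entropy(flip_prob)

for p in (0.0, 0.01, 0.05, 0.11, 0.5):
    print(f"flip prob {p:.2f}: capacity {bsc_capacity(p):.3f} bits per use")
```

A channel that flips 5% of bits still has a capacity of about 0.71 bits per bit sent; at p = 0.5 the output is pure noise and the capacity falls to zero.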

He also separated two jobs that had been tangled together: compression, removing the predictable parts of a message, and error correction, adding controlled redundancy to survive noise. Treating these as distinct problems shaped decades of progress.
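The compression half of that split is easy to demonstrate. This sketch uses Python’s standard zlib module (DEFLATE) as a stand-in for any classical compressor: it removes repetition, but has nothing to remove from data without predictable structure. It assumes Python 3.9 or later for `randbytes`.

```python
import random
import zlib

rng = random.Random(0)
predictable = b"the rain in spain " * 500        # 9,000 bytes of pure repetition
unpredictable = rng.randbytes(len(predictable))  # 9,000 pseudo-random bytes

for name, data in (("predictable", predictable), ("unpredictable", unpredictable)):
    packed = zlib.compress(data, level=9)        # DEFLATE at maximum compression
    print(f"{name}: {len(data)} bytes -> {len(packed)} bytes")
```

The repetitive input shrinks to a small fraction of its size, while the random input comes out slightly larger than it went in.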

How it connects to AI today

Shannon’s work sits underneath almost everything in modern computing and AI, often invisibly.

The switching idea became the digital logic that runs every machine. A transistor is a tiny switch, on or off. Billions of them form the logic gates inside a CPU or a GPU. When a graphics card trains a neural network, it executes the same Boolean operations Shannon mapped onto relays in 1937, now at rates of trillions of operations per second.
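A NAND gate can be built from a few transistors, and NAND alone is enough to build every other gate. The sketch below models gates as Python functions on 0 and 1 and composes them into a half adder, the circuit that adds two bits. It is a logical model with hypothetical function names, not a description of any particular chip.

```python
def nand(a: int, b: int) -> int:
    """NAND gate: output is 0 only when both inputs are 1."""
    return 1 - (a & b)

# Every other gate can be wired from NAND alone.
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    # The classic four-NAND construction of exclusive OR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two one-bit numbers: returns (sum bit, carry bit)."""
    return xor(a, b), and_(a, b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```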

The bit is the universal unit. Model weights, training data, and prompts are all bits. When you read that a large language model has 70 billion parameters stored at 16 bits each, you are quoting Shannon’s measure directly. Quantisation, the trick of shrinking a model by storing weights in fewer bits, is an exercise in trading precision against information content.
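The arithmetic behind those figures, plus a minimal symmetric int8 quantiser, can be sketched as follows. Real quantisation schemes (per-channel scales, 4-bit formats, outlier handling) are more involved; the function names and weights here are hypothetical.

```python
def weights_memory_gb(n_params: int, bits_per_weight: int) -> float:
    """Storage for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B params at {bits}-bit: {weights_memory_gb(70_000_000_000, bits):.0f} GB")

def quantize_int8(weights):
    """Symmetric int8 quantisation: map floats to integers in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 for all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; each is off by at most half a scale step."""
    return [v * scale for v in q]

w = [0.42, -1.27, 0.003, 0.88]  # hypothetical weights
q, s = quantize_int8(w)
print(q, [round(x, 3) for x in dequantize(q, s)])
```

Halving the bits per weight halves the memory, from 140 GB at 16 bits to 70 GB at 8 and 35 GB at 4, at the cost of the rounding error the quantiser introduces.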

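Information theory shapes how models learn. The standard training objective for language models is cross-entropy loss: the average negative log probability the model assigned to the tokens that actually came next, a measure of how surprised the model is. With base-2 logarithms it is measured in bits; most training code uses the natural logarithm, which gives nats. A model that predicts well has low surprise and low loss. This is Shannon’s measure of information applied directly. Perplexity, the headline metric for language models, is the exponential of that cross-entropy: two raised to it when measured in bits, or e raised to it when measured in nats. It is a Shannon quantity by another name.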
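Given the probabilities a model assigned to the tokens that actually appeared, both quantities take a couple of lines. The probability lists below are invented for illustration, and the function names are hypothetical.

```python
import math

def cross_entropy_bits(target_probs):
    """Average surprise in bits, from the probability the model gave each actual next token."""
    return -sum(math.log2(p) for p in target_probs) / len(target_probs)

def perplexity(target_probs):
    """2 ** (cross-entropy in bits), which equals e ** (cross-entropy in nats)."""
    return 2 ** cross_entropy_bits(target_probs)

# Hypothetical probabilities assigned to the tokens that actually came next.
confident = [0.9, 0.8, 0.95, 0.7]
guessing = [0.1, 0.05, 0.2, 0.1]
print(cross_entropy_bits(confident), perplexity(confident))
print(cross_entropy_bits(guessing), perplexity(guessing))
```

A model that always gives the right token probability 0.25 has a cross-entropy of exactly 2 bits and a perplexity of 4, as if it were choosing among four equally likely options.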

Compression and intelligence are now seen as close cousins. Predicting the next word well means modelling the structure of language, which means compressing it. Shannon’s source coding theorem set the floor for lossless compression: on average, no compressor, neural or classical, can encode a source in fewer bits per symbol than its entropy.
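The link between prediction and compression can be made concrete. An ideal entropy coder, such as arithmetic coding, spends about −log2 p bits on a symbol its model predicted with probability p, so a better predictor means a shorter message. This sketch computes those ideal code lengths for two hypothetical models rather than running an actual coder.

```python
import math
from collections import Counter

def ideal_code_length_bits(text: str, model: dict[str, float]) -> float:
    """Bits an ideal coder would need under the given model: sum of -log2 p(symbol)."""
    return sum(-math.log2(model[ch]) for ch in text)

text = "abracadabra " * 50

# Model 1: knows nothing; all 256 byte values equally likely, so 8 bits per character.
uniform = {chr(i): 1 / 256 for i in range(256)}

# Model 2: predicts each character by how often it appears in the text.
counts = Counter(text)
frequency = {ch: n / len(text) for ch, n in counts.items()}

for name, model in (("uniform", uniform), ("frequency", frequency)):
    bits = ideal_code_length_bits(text, model)
    print(f"{name}: {bits / len(text):.2f} bits per character")
```

The frequency model needs about 2.28 bits per character against 8 for the uniform one. It is fitted to the very text it encodes, which a real compressor would have to pay for, and a model that also noticed the repetition could do far better still.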

A builder meets these ideas constantly. Every file you store is sized in bits, and every token counted against an API limit travels as bits. Error-correcting codes grounded in Shannon’s framework protect data in SSDs, mobile signals, QR codes, and deep-space links. The Wi-Fi and 5G links that carry your model’s responses run as close to channel capacity as engineers can manage.
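The Hamming(7,4) code, a classic single-error-correcting code, shows controlled redundancy at small scale: three parity bits protect four data bits, and the pattern of failed parity checks points straight at the flipped bit. Production systems use much stronger codes (QR codes, for example, use Reed-Solomon codes); this is a teaching sketch with illustrative function names.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as 7 bits laid out [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                   # noise flips one bit in transit
print(hamming74_decode(word))  # [1, 0, 1, 1]
```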

Still in use today

Shannon’s two contributions are foundational milestones that remain fully active, not legacy curiosities.

Boolean switching logic was never replaced. It was scaled. Relays gave way to vacuum tubes, then transistors, then integrated circuits, but the underlying logic is identical. The on-or-off switch is still the atom of digital hardware.

Information theory is a living field. Its core theorems are settled and taught in electrical engineering and computer science programmes worldwide. They guide the design of modems, storage media, and the codes that make streaming and video calls possible. Modern standards lean on coding schemes that approach the Shannon limit he proved in 1948: turbo codes in 3G and 4G mobile networks, LDPC codes in Wi-Fi 6 and 5G data channels, and polar codes in 5G control channels.
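The limit these codes chase is easiest to see in the Shannon-Hartley formula for a bandwidth-limited channel with additive white Gaussian noise: C = B log2(1 + S/N). The sketch computes it for a hypothetical link. The result is a ceiling on error-free throughput, not a figure any real Wi-Fi or 5G connection reports.

```python
import math

def shannon_hartley_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Capacity C = B * log2(1 + S/N) in bits per second, for a Gaussian-noise channel."""
    snr_linear = 10 ** (snr_db / 10)  # convert decibels to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

# Hypothetical link: a 20 MHz channel at a signal-to-noise ratio of 30 dB.
c = shannon_hartley_capacity(20e6, 30.0)
print(f"{c / 1e6:.0f} Mbit/s")  # 199 Mbit/s
```

Doubling the bandwidth doubles the capacity, while doubling the signal power adds only about one bit per second per hertz, which is why engineers fight so hard for spectrum.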

The concepts persist because they describe limits set by mathematics, not by any one technology. A channel capacity does not become outdated when hardware improves. The hardware only gets closer to the limit Shannon already named. That permanence ranks his work among the deepest results in engineering.

Shannon is widely called the father of information theory, and the unit of information, the shannon, carries his name alongside the more common bit.

Further reading