Relational Algebra

The formal mathematical foundation for querying relational databases, defining operations like select, project, and join that underpin SQL and query optimization.

Added 28 Mar 2026 3 min read Updated 30 May 2026

#databases #relational-algebra #query-optimization #sql #relational-model

Learn this your way

Read Guided course

Relational algebra is a procedural query language that operates on relations (tables) and produces relations as output. It provides the theoretical foundation for SQL and serves as the internal representation that database query optimizers use to evaluate and transform queries before execution.

Fundamental Operations

Selection (sigma) filters rows from a relation based on a predicate. It takes a relation and a condition and returns only the rows that satisfy that condition. In SQL terms, selection corresponds to the WHERE clause.

Projection (pi) filters columns, returning a relation with only the specified attributes. Duplicate rows are eliminated in the formal definition. This corresponds to the SELECT column list in SQL.

Union combines two union-compatible relations (same number and types of columns) and returns all rows from both, removing duplicates.

Set Difference returns rows that appear in the first relation but not in the second. Both relations must be union-compatible.

Cartesian Product (cross product) combines every row of one relation with every row of another, producing a relation with all columns from both. This is rarely used alone but serves as the building block for joins.

Rename (rho) changes the name of a relation or its attributes, enabling self-joins and disambiguation.

Derived Operations

Join is the most heavily used derived operation. A natural join combines a Cartesian product with a selection on matching column names. Theta joins allow arbitrary comparison predicates. Equi-joins restrict the predicate to equality. Left, right, and full outer joins preserve unmatched rows from one or both relations.

Intersection returns rows common to two union-compatible relations and can be expressed as a combination of set difference operations.

Division answers queries of the form “find all X that are related to every Y” and is useful for universal quantification problems.

Role in Query Optimization

Database engines parse SQL into a relational algebra expression tree, then apply equivalence rules to transform it into more efficient forms. Pushing selections down the tree to reduce intermediate result sizes, reordering joins to minimize cost, and combining projections are all algebraic transformations that improve query performance without changing the result.

Origins and History

E.F. Codd defined relational algebra alongside the relational model in his 1970 paper [1]. He established that relational algebra and relational calculus (a nonprocedural alternative) are equivalent in expressive power, a property known as relational completeness. This work provided the mathematical rigor that distinguished relational databases from earlier hierarchical and network models.

Sources

Codd, E.F. (1970). “A Relational Model of Data for Large Shared Data Banks.” Communications of the ACM, 13(6), 377-387.
Codd, E.F. (1972). “Relational Completeness of Data Base Sublanguages.” IBM Research Report RJ987.

Open source projects

Freelancer Templates Contracts, proposals, SOWs

Freelancer Automation Workflow recipes, AI playbooks

Work with Linda

Workshop Series €2,000/mo x 3

1:1 Consulting 60 min session