Short definition (interview-ready)
Join algorithms are the physical strategies a database engine uses to execute a JOIN efficiently, depending on data size, indexes, memory, and join condition.
Logical JOIN (INNER / LEFT / …) ≠ how it is executed.
The Big Three (must-know)
1️⃣ Nested Loop Join (NLJ)
How it works
for each row in A:
for each row in B:
if join_condition matches:
emit row
Variants
- Simple nested loop
- Index nested loop (very important)
When used
- Small tables
- One side is tiny
- Index exists on join column
Cost
- O(N × M) (bad without index)
- With index: O(N × log M)
Typical usage
- OLTP queries
- PK–FK joins with index
Interview insight
Nested loop is slow only if the inner side is not indexed.
2️⃣ Hash Join (workhorse of analytics)
How it works
- Build hash table from smaller input
- Probe with rows from larger input
build hash(B.key → row)
for each row in A:
lookup hash(A.key)
When used
- Large tables
- Equality joins (
=only) - No useful indexes
Cost
- O(N + M)
- Needs memory (or spills to disk)
Cannot be used for
<,>, range joins- Non-equality predicates
Typical usage
- Data warehouses
- Big scans
- Batch analytics
Interview insight
Hash join is fast but memory-hungry.
3️⃣ Merge Join (sorted join)
How it works
- Both inputs are sorted by join key
- Walk through them like merge-sort
A: 1 3 5 7
B: 3 4 5 8
→ walk in order
When used
- Inputs already sorted
- Index scan provides sorted order
- Range joins supported
Cost
- O(N + M)
- Sorting cost if not already sorted
Typical usage
- When both sides have B-tree indexes
- Large ordered scans
Interview insight
Merge join is great when data is already ordered — otherwise sorting kills it.
Summary table
| Algorithm | Needs Index | Needs Memory | Supports = | Supports <,> | Typical Use |
|---|---|---|---|---|---|
| Nested Loop | Optional | Low | ✅ | ✅ | OLTP, small data |
| Hash Join | ❌ | High | ✅ | ❌ | Large analytics |
| Merge Join | Optional | Medium | ✅ | ✅ | Ordered data |
How the optimizer chooses (important)
The query planner considers:
- Table sizes (statistics)
- Index availability
- Join condition (
=vs range) - Memory (
work_mem,join_buffer) - Cost estimates
👉 You don’t choose the algorithm directly (usually).
You influence it via:
- Indexes
- Query shape
- Hints (some DBs)
- Statistics freshness
Real backend implications (this is where seniors shine)
🔴 Bad schema → bad join algorithm
- Missing index → hash join instead of nested loop
- Nullable FK → broken plans
- Wrong statistics → terrible join order
🔴 ORM-generated SQL
- Too many joins → hash join explosion
- Accidental cartesian joins
- Unpredictable execution plans
PostgreSQL-specific notes (bonus)
Postgres supports:
- Nested Loop
- Hash Join
- Merge Join
You can see which one is used via:
EXPLAIN ANALYZE
And even disable one for testing:
SET enable_hashjoin = off;
Interview-ready final answer (say this smoothly)
“JOIN algorithms are the physical execution strategies used by the database engine.
The main ones are nested loop, hash join, and merge join.
Nested loop is good for small or indexed joins, hash join is efficient for large equality joins but uses memory, and merge join works well when inputs are already sorted.
The optimizer chooses based on statistics, indexes, and join conditions.”