SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id;
What the optimizer considers
1) Sizes and selectivity
- If
ordersis huge andusersis smaller → hash join is often cheapest (build hash onusers, probe withorders). - If
ordersis filtered down to a few rows (e.g.,WHERE o.id = ?) → nested loop is usually best.
2) Indexes
- If there’s an index on
orders.user_idandusers.idand the planner expects few matching rows → nested loop + index lookup can win. - If both sides can be read in sorted order (via indexes) and planner thinks it’s cheaper than hashing → merge join can win.
3) Memory (work_mem) and spill risk
- Hash join needs memory. If it expects spill to disk (big hash) it may prefer merge join (sort + merge) or nested loop.
4) Existing order
- If one or both inputs are already coming out sorted for other reasons (e.g., index scan for a filter), merge join becomes attractive.
5) Statistics
If stats are stale/wrong, planner may pick a “weird” plan.
Typical outcomes in real life
✅ Most common for orders JOIN users in OLTP
Nested loop (especially if query is selective):
SELECT * FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.id = 123;
Plan idea:
- Index lookup on
orders.id→ 1 row - For that row, index lookup
users.id→ 1 row
Nested loop is perfect here.
✅ Common for big scans / analytics
Hash join:
Seq Scan ordersHash usersHash Join
✅ Sometimes happens
Merge join:
- if both sides can be streamed sorted (or cheaply sorted)
- or hash join is estimated expensive / memory constrained
How to prove it on your DB (Postgres)
Run:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id;
You’ll see one of:
Hash JoinNested LoopMerge Join
To test alternatives:
SET enable_mergejoin = off;
SET enable_hashjoin = off;
SET enable_nestloop = off;
(only for experiments, not production)
Interview-ready phrasing
“Equality join doesn’t imply hash join. Hash join is one possible physical algorithm. Postgres may choose nested loop if it expects few rows and has indexes, or merge join if inputs are sorted or hashing is costly. The choice depends on statistics, indexes, and memory.”