SQL.How DBMS chooses the plan ?

SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id;

What the optimizer considers

1) Sizes and selectivity

  • If orders is huge and users is smaller → hash join is often cheapest (build hash on users, probe with orders).
  • If orders is filtered down to a few rows (e.g., WHERE o.id = ?) → nested loop is usually best.

2) Indexes

  • If there’s an index on orders.user_id and users.id and the planner expects few matching rows → nested loop + index lookup can win.
  • If both sides can be read in sorted order (via indexes) and planner thinks it’s cheaper than hashing → merge join can win.

3) Memory (work_mem) and spill risk

  • Hash join needs memory. If it expects spill to disk (big hash) it may prefer merge join (sort + merge) or nested loop.

4) Existing order

  • If one or both inputs are already coming out sorted for other reasons (e.g., index scan for a filter), merge join becomes attractive.

5) Statistics

If stats are stale/wrong, planner may pick a “weird” plan.


Typical outcomes in real life

✅ Most common for orders JOIN users in OLTP

Nested loop (especially if query is selective):

SELECT * FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.id = 123;

Plan idea:

  • Index lookup on orders.id → 1 row
  • For that row, index lookup users.id → 1 row
    Nested loop is perfect here.

✅ Common for big scans / analytics

Hash join:

  • Seq Scan orders
  • Hash users
  • Hash Join

✅ Sometimes happens

Merge join:

  • if both sides can be streamed sorted (or cheaply sorted)
  • or hash join is estimated expensive / memory constrained

How to prove it on your DB (Postgres)

Run:

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id;

You’ll see one of:

  • Hash Join
  • Nested Loop
  • Merge Join

To test alternatives:

SET enable_mergejoin = off;
SET enable_hashjoin  = off;
SET enable_nestloop  = off;

(only for experiments, not production)


Interview-ready phrasing

“Equality join doesn’t imply hash join. Hash join is one possible physical algorithm. Postgres may choose nested loop if it expects few rows and has indexes, or merge join if inputs are sorted or hashing is costly. The choice depends on statistics, indexes, and memory.”

This entry was posted in Без рубрики. Bookmark the permalink.