Database.Middle.How does a B-tree index work?

A B-tree index (short for Balanced Tree) is the most common type of index used in relational databases (e.g., PostgreSQL, MySQL InnoDB, Oracle). It organizes data in a way that allows fast lookup, insertion, deletion, and range queries, all in logarithmic time.

Let’s break down how it works in simple terms:


🌳 1. Structure of a B-tree

A B-tree is a balanced multi-level tree with these key properties:

  • Sorted: Keys in each node are ordered.
  • Balanced: All leaf nodes are at the same level.
  • Multi-way: Each node can have many children, not just 2 (unlike binary trees).
  • Disk-optimized: Designed to reduce disk reads/writes by grouping many keys in each node.

📚 Example (Simplified)

Suppose you have a B-tree index on column email:

         [anna, dave, kate]
        /       |       |     \
    <anna   anna–dave dave–kate >kate

Each node has:

  • Keys (sorted)
  • Pointers to children (where to go based on key comparison)

🔍 2. How Lookup Works

SELECT * FROM users WHERE email = 'dave@example.com';

  1. Start at root node
  2. Find the range where 'dave@example.com' fits
  3. Follow pointer to the correct child node
  4. Repeat until leaf node
  5. Get pointer to actual row in the table (if non-clustered)

This takes O(log n) time.


📈 3. How Insert Works

  1. Navigate to the correct leaf node
  2. Insert key in sorted order
  3. If node is full, split and promote middle key to parent
  4. Tree remains balanced

🔁 4. How Range Queries Work

For:

SELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';
  1. B-tree quickly locates the starting key
  2. Then performs a sequential scan across leaf nodes (they’re linked like a doubly linked list)
  3. Stops when upper bound is reached

This is why B-tree indexes are great for BETWEEN, >, <, >=, <=, and ORDER BY`.


🔧 Summary of Operations

OperationTime Complexity
Lookup (point)O(log n)
Insert/DeleteO(log n)
Range scanO(log n + k) (where k = rows returned)

🧠 Why B-trees Work So Well

  • Nodes are designed to match disk block sizes, minimizing disk I/O
  • Balance ensures predictable performance even with millions of rows
  • Self-adjusting on insert/delete
This entry was posted in Без рубрики. Bookmark the permalink.

Leave a Reply

Your email address will not be published.