Z-ordering (also called Z-indexing, Z-order curves, or Morton ordering) is a multi-dimensional indexing and data-layout technique used in databases to speed up queries that filter on multiple columns, especially for big data and time-series workloads.
🧠 What is Z-ordering?
Z-ordering maps values from multiple dimensions (e.g., columns) to a single one-dimensional value while preserving most of their spatial locality.
Imagine plotting points on a 2D grid. Z-ordering arranges those points in a specific “zigzag” or “Z-pattern” to help store and query them more efficiently in one dimension (disk or memory).
The Z-order curve is itself a space-filling curve: it flattens multi-column data in a way that keeps most nearby points close together in storage.
💡 Why Use It?
Z-ordering helps when:
- You often query on multiple columns together (e.g., `WHERE user_id = ? AND timestamp BETWEEN ...`)
- You want to minimize disk I/O or data scanned
- Data is stored in columnar file formats like Parquet or ORC, or in table formats built on top of them such as Delta Lake
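To make the saving in data scanned concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: the 16 by 16 grid of rows, the 16-rows-per-file layout, the query box, and the helper names `z_value` and `files_scanned` are all made up for illustration. Real engines keep min/max statistics per file or row group in their own formats. The bit interleaving inside `z_value` is the step described under How It Works.

```python
from itertools import product


def z_value(x: int, y: int, bits: int = 4) -> int:
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def files_scanned(rows, rows_per_file, x_range, y_range):
    """Count files whose min/max statistics overlap the query box.

    A file can be skipped only if its [min, max] range on x or on y
    lies entirely outside the requested range.
    """
    scanned = 0
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        overlaps_x = min(xs) <= x_range[1] and max(xs) >= x_range[0]
        overlaps_y = min(ys) <= y_range[1] and max(ys) >= y_range[0]
        if overlaps_x and overlaps_y:
            scanned += 1
    return scanned


# Hypothetical table: one row for every (x, y) pair on a 16 x 16 grid.
points = list(product(range(16), repeat=2))

linear = sorted(points)                                # sort by x, then y
zordered = sorted(points, key=lambda p: z_value(*p))   # sort by Z-value

# Hypothetical query: 4 <= x <= 7 AND 4 <= y <= 7, with 16 rows per file.
query = dict(x_range=(4, 7), y_range=(4, 7))
linear_scanned = files_scanned(linear, 16, **query)
z_scanned = files_scanned(zordered, 16, **query)

print(f"sorted by (x, y): {linear_scanned} files scanned")
print(f"sorted by Z-value: {z_scanned} files scanned")
```

With this layout, sorting by (x, y) leaves four files that overlap the query box, while the Z-ordered layout leaves one, because each Z-ordered file covers a compact 4 by 4 block of the grid instead of a full-height strip.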
🔍 How It Works (Simplified)
- Choose multiple columns (e.g., `device_id`, `timestamp`)
- Each value is binary encoded
- Bits from each column are interleaved (like shuffling a deck)
- The resulting binary is used as the Z-value (Z-index)
- Data is sorted by Z-value and written to disk
This increases the chance that rows with similar values across multiple dimensions will be physically stored together.
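The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine implements it: the function name `interleave_bits`, the two-column limit, the bit width, and the sample rows are assumptions. The values are presumed to be already mapped to small non-negative integers; real systems typically need to normalize strings, timestamps, and signed numbers first.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-value.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    if not (0 <= x < (1 << bits) and 0 <= y < (1 << bits)):
        raise ValueError("inputs must be non-negative and fit in `bits` bits")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of the first column
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of the second column
    return z


# Hypothetical rows of (device_id, timestamp_bucket), already encoded as small ints.
rows = [(3, 1), (0, 0), (1, 1), (2, 0), (0, 1), (1, 0)]

# Sort by Z-value; this is the order in which rows would be written out.
rows_sorted = sorted(rows, key=lambda r: interleave_bits(r[0], r[1], bits=2))

for device_id, ts_bucket in rows_sorted:
    z = interleave_bits(device_id, ts_bucket, bits=2)
    print(f"device_id={device_id} ts_bucket={ts_bucket} z={z:04b}")
```

Here `rows_sorted` comes out as (0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 1). The first four rows trace the "Z" over a 2 by 2 block of the grid before the curve moves on to the next block.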
📦 Used In:
| System/Engine | Z-ordering Role |
|---|---|
| Apache Iceberg | Data skipping using Z-order clustering |
| Delta Lake (Databricks) | `OPTIMIZE ... ZORDER BY` to improve query performance |
| ClickHouse | Uses similar multi-column sorting techniques |
| Druid / Presto / Trino | Support advanced block pruning |