Database.Advanced.What is Z-ordering?

Z-ordering (also called Z-indexing, Z-order curves, or Morton order) is a multi-dimensional clustering technique that databases and data lake engines use to speed up queries that filter on several columns at once, especially in big data and time-series workloads.


🧠 What is Z-ordering?

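Z-ordering maps multiple dimensions (e.g., columns) to a single one-dimensional value while largely preserving locality: points that are close together in the original dimensions tend to get Z-values that are close together.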

Imagine plotting points on a 2D grid. Z-ordering arranges those points in a specific “zigzag” or “Z-pattern” to help store and query them more efficiently in one dimension (disk or memory).

The Z-order curve is a kind of space-filling curve: it flattens multi-column data in a way that keeps most nearby points close in storage.
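
To make the “Z-pattern” concrete, here is a minimal sketch that numbers the cells of a 4x4 grid in Z-order. The function name `z_value` and the convention of putting x bits in the even positions are assumptions made for illustration; engines may order the bits differently.

```python
def z_value(x: int, y: int, bits: int = 2) -> int:
    """Interleave the low `bits` bits of x and y into one integer.

    Bits of x go to even positions and bits of y to odd positions.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Z-value of every cell in a 4x4 grid (row index = y, column index = x).
grid = [[z_value(x, y) for x in range(4)] for y in range(4)]
for row in grid:
    print(" ".join(f"{z:2d}" for z in row))

# Prints:
#  0  1  4  5
#  2  3  6  7
#  8  9 12 13
# 10 11 14 15
```

Following the numbers 0, 1, 2, 3 traces one small “Z”, and the four 2x2 blocks are themselves visited in the same Z shape.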


💡 Why Use It?

Z-ordering helps when:

  • You often query on multiple columns together (e.g., WHERE user_id = ? AND timestamp BETWEEN ...)
  • You want to minimize disk I/O or data scanned
  • Data is stored in columnar file formats like Parquet or ORC, or in table formats built on top of them such as Delta Lake

🔍 How It Works (Simplified)

  1. Choose multiple columns (e.g., device_id, timestamp)
  2. Each value is encoded as a fixed-width binary number
  3. Bits from each column are interleaved (like shuffling a deck)
  4. The resulting binary is used as the Z-value (Z-index)
  5. Data is sorted by Z-value and written to disk
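
The steps above can be sketched in a few lines of Python. This is an illustration, not any engine's actual implementation: it assumes both columns have already been mapped to small non-negative integers, and the helper name `interleave_bits` and the sample rows are made up for the example.

```python
from typing import List, Tuple


def interleave_bits(values: Tuple[int, ...], bits: int = 16) -> int:
    """Build a Z-value by taking one bit from each column in turn.

    Bit `b` of column number `pos` lands at position b * len(values) + pos.
    """
    z = 0
    n = len(values)
    for bit in range(bits):
        for pos, value in enumerate(values):
            z |= ((value >> bit) & 1) << (bit * n + pos)
    return z


# Step 1: hypothetical rows of (device_id, timestamp), already integer-encoded.
rows: List[Tuple[int, int]] = [(3, 5), (0, 1), (2, 2), (1, 0), (3, 4), (0, 0)]

# Steps 2-4: compute the Z-value of each row.
for row in rows:
    print(row, "->", interleave_bits(row))

# Step 5: sort by Z-value; a real engine would then write the rows to files.
sorted_rows = sorted(rows, key=interleave_bits)
print(sorted_rows)
# [(0, 0), (1, 0), (0, 1), (2, 2), (3, 4), (3, 5)]
```

Real columns such as strings or wide timestamps would first need an order-preserving mapping to fixed-width integers, which this sketch leaves out.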

Sorting by Z-value increases the chance that rows with similar values across multiple dimensions are physically stored together, so a query that filters on those columns has fewer files or blocks to read.
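
The effect on data skipping can be simulated with a toy model. The sketch below assumes that an engine keeps min/max statistics per file and skips any file whose ranges cannot match the filter; the 16x16 dataset, the file size of 16 rows, and the function names are all hypothetical.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]


def z_value(row: Row, bits: int = 4) -> int:
    """Interleave the bits of the two columns (x in even, y in odd positions)."""
    x, y = row
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def files_scanned(rows: List[Row], key: Callable[[Row], object],
                  x_range: Tuple[int, int], y_range: Tuple[int, int],
                  rows_per_file: int = 16) -> int:
    """Count files whose min/max statistics overlap both query ranges."""
    ordered = sorted(rows, key=key)
    scanned = 0
    for start in range(0, len(ordered), rows_per_file):
        chunk = ordered[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        x_overlap = min(xs) <= x_range[1] and max(xs) >= x_range[0]
        y_overlap = min(ys) <= y_range[1] and max(ys) >= y_range[0]
        if x_overlap and y_overlap:
            scanned += 1
    return scanned


# One row for every (x, y) pair on a 16x16 grid, split into 16 files.
rows = [(x, y) for x in range(16) for y in range(16)]
query = ((4, 7), (4, 7))  # x BETWEEN 4 AND 7 AND y BETWEEN 4 AND 7

linear = files_scanned(rows, lambda r: r, *query)  # plain sort by (x, y)
zorder = files_scanned(rows, z_value, *query)      # sort by Z-value
print(linear, zorder)  # 4 1
```

With a plain sort on (x, y), every file spans the full range of y, so only the x filter can prune files. With Z-ordering, each file covers a 4x4 block of the grid, and both filters contribute to skipping.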


📦 Used In:

System/Engine            Z-ordering Role
Apache Iceberg           Data skipping using Z-order clustering
Delta Lake (Databricks)  OPTIMIZE ... ZORDER BY to improve query performance
ClickHouse               Uses similar multi-column sorting techniques
Druid / Presto / Trino   Support advanced block pruning

🔬 Real Example: Delta Lake
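
As a rough illustration of the Delta Lake command listed in the table above, the sketch below builds an OPTIMIZE ... ZORDER BY statement as a string. The table name `events`, the columns `device_id` and `timestamp`, and the helper `zorder_statement` are hypothetical; actually running the statement assumes a SparkSession with Delta Lake configured, which is not set up here.

```python
from typing import Sequence


def zorder_statement(table: str, columns: Sequence[str]) -> str:
    """Build a Delta Lake OPTIMIZE ... ZORDER BY statement (illustrative helper)."""
    if not columns:
        raise ValueError("ZORDER BY needs at least one column")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


statement = zorder_statement("events", ["device_id", "timestamp"])
print(statement)
# OPTIMIZE events ZORDER BY (device_id, timestamp)

# With a Delta-enabled SparkSession available as `spark` (an assumption here):
# spark.sql(statement)
```

Columns that appear together in WHERE clauses are the natural candidates to list in ZORDER BY.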
