Database.Does it make sense to index data that has a small number of possible values?

Great question — the answer is:

Usually, no, it does not make sense to index columns with a small number of distinct values (i.e., low cardinality), unless there’s a very specific reason.


📉 Why Not?

Columns like:

gender = 'M' or 'F'
status = 'active' or 'inactive'
is_deleted = true or false

Have very few possible values, so:

  • Each index entry matches many rows
  • The database will likely do a full table scan anyway, because the index isn’t selective
  • Using the index can be slower than just scanning the table

🔍 When Might It Still Make Sense?

✅ 1. In combination with other columns (composite index)

CREATE INDEX idx_status_created ON Users(status, created_at);

Useful if you often query:

SELECT * FROM Users
WHERE status = 'active' AND created_at > '2024-01-01';

Here, status helps narrow the data before applying the range on created_at.

✅ 2. In large, filtered queries or partitions

If:

  • The table is huge, and
  • The low-cardinality column is highly skewed (e.g., 90% 'active', 10% 'inactive'), and
  • You often query the rare value:
SELECT * FROM Users WHERE status = 'inactive';

Then an index might help — but it depends on the query planner and row distribution.

✅ 3. Bitmap Indexes (in data warehouses)

In OLAP databases (e.g., Oracle, Redshift), low-cardinality columns are perfect for bitmap indexes, which compress well and are efficient for complex filtering.

🧠 Rule of Thumb:

CardinalityShould You Index?
🔴 Very low (e.g., true/false)❌ No — often worse performance
🟡 Medium⚠️ Depends on query patterns and size
🟢 High (e.g., email, ID)✅ Yes — ideal for indexing
This entry was posted in Без рубрики. Bookmark the permalink.