Database.What is Low Selectivity problem, when we speaking about indexes ?

Selectivity = the percentage of rows that match a given condition.

  • High selectivity: few rows match
    → e.g. WHERE email = 'alice@example.com'
    → ✅ index is efficient
  • Low selectivity: many rows match
    → e.g. WHERE gender = 'M' in a table where 60% are 'M'
    → ❌ index might be inefficient

🧠 Why Is Low Selectivity Bad for Index Use?

When many rows match, using an index means:

  1. The database finds matching row IDs in the index
  2. Then it has to jump to each row individually (called random I/O)
  3. This is slower than just reading the whole table sequentially in memory or disk

✅ Example:

Assume this table has 1 million rows:

SELECT * FROM Users WHERE is_active = true;

If 950,000 rows have is_active = true, that’s very low selectivity

The index on is_active matches most of the table

The database says:
❌ “Too many matches — don’t waste time jumping around.”
✅ “Just scan the whole table — it’s faster.”

Visual Analogy:

  • 📇 Index access is like flipping through a card catalog to find books, but running back and forth in the library for each one.
  • 📚 Full table scan is like walking down every shelf and grabbing all relevant books in order — more efficient if you’re taking most of them.

🧪 Rule of Thumb:

  • If a condition matches >10–20% of the table, many databases prefer a full scan over index access.
This entry was posted in Без рубрики. Bookmark the permalink.