🧠 Key Concept
A Bloom filter is a bit array of fixed size, with multiple hash functions. You use it to:
- Insert: Map an item to a few bit positions and set those to
1
. - Query: Check if all the bit positions for an item are
1
.
This means:
- If even one bit is
0
, the item is definitely not in the set. - If all bits are
1
, the item might be in the set (→ false positive possible).
⚙️ Structure:
- Bit array
bits[]
of sizem
(e.g., 10,000 bits, all initialized to 0). k
independent hash functions (e.g., 3 different hashes per item).
✅ Example — Step-by-Step
Let’s say:
- Bit array size
m = 10
- Number of hash functions
k = 3
- Hash functions output indexes from 0 to 9
1. Insert item “apple”
Suppose the 3 hash functions give:
hash1("apple") % 10 = 1
hash2("apple") % 10 = 4
hash3("apple") % 10 = 7
Now set bits at positions 1, 4, and 7:
Index: 0 1 2 3 4 5 6 7 8 9
Bits: 0 1 0 0 1 0 0 1 0 0
2. Insert item “grape”
Hash outputs:
hash1("grape") % 10 = 2
hash2("grape") % 10 = 4
hash3("grape") % 10 = 5
Set bits at positions 2, 4, and 5:
Index: 0 1 2 3 4 5 6 7 8 9
Bits: 0 1 1 0 1 1 0 1 0 0
3. Check if “apple” is in the set
Compute same hashes:
- 1, 4, 7 → bits are 1, 1, 1 → ✅ Might be in set (probably is)
4. Check if “peach” is in the set
Hash outputs:
hash1("peach") % 10 = 3
→ bit 3 is 0
So: ❌ “peach” is definitely not in the set
🔄 Why False Positives?
Now suppose you check “lemon” and the hash values happen to hit 1, 2, 4 — all of which are 1 due to other elements, not because of “lemon”. So:
→ Bloom filter returns “maybe in the set”, even though “lemon” was never added.
This is called a false positive.
🧮 Formula for False Positive Probability
The false positive rate can be approximated by:

Where:
n
= number of inserted elementsk
= number of hash functionsm
= number of bits
💡 Real World Usage:
- Web browsers: “Have I seen this URL before?”
- Databases: “Should I even try looking on disk?”
- Distributed systems: Avoid network requests for absent keys
- Blockchain: Bitcoin uses it to filter transactions
A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It is fast and memory-efficient, but allows false positives (it may say “yes” even if the element isn’t in the set) — however, false negatives are impossible (if it says “no”, the element is definitely not in the set).
🔧 How it works:
- A Bloom filter uses:
- A bit array of size
m
(initially all 0s). - k independent hash functions, each maps an input to one of the
m
bits.
- A bit array of size
- To add an item:
- Hash the item with each of the
k
hash functions. - Set each resulting bit position in the array to
1
.
- Hash the item with each of the
- To check if an item is in the set:
- Hash the item with the same
k
functions. - If all
k
corresponding bits are 1, the item may be in the set. - If any bit is 0, the item is definitely not in the set.
- Hash the item with the same
✅ Example:
- Add “apple” →
hash1
,hash2
, …, set bits 10, 23, 45 to 1. - Add “banana” → set bits 7, 23, 90.
- Check “apple” → bits 10, 23, 45 → all 1 → “maybe in set”
- Check “grape” → bit 18 is 0 → “definitely not in set”
⚖️ Pros and Cons:
Pros | Cons |
---|---|
Very space-efficient | Allows false positives |
Fast insert and lookup | Cannot delete elements (basic version) |
No need to store actual data | Needs tuning for optimal m and k |
🧠 Applications:
- Database systems (e.g., avoid unnecessary disk reads)
- Web caching
- Spell checkers
- Network intrusion detection
- Big data systems (e.g., Apache HBase, Apache Cassandra, Google Bigtable)