Writing a good hash function is a crucial skill, especially if you’re designing custom hash-based data structures like hash tables. Here’s a breakdown of what makes a good hash function and how to write one:
🔑 What is a “good” hash function?
A good hash function should:
✅ Distribute keys evenly across the hash table (avoid clustering).
✅ Be fast to compute — ideally O(1)
time.
✅ Minimize collisions (different inputs should ideally map to different hash values).
✅ Deterministic — the same input must always produce the same hash.
✅ Work well for the expected set of keys (strings, integers, objects, etc.).
🛠️ Basic Recipe for a Good Hash Function
1. Start with a prime number seed.
Primes work better in modular arithmetic (common in hash tables). For example:
int hash = 17;
2. Combine the fields of your object into the hash.
For each field (like an integer, string, or nested object), update the hash using multiplication and addition:
hash = 31 * hash + field.hashCode();
🔷 31
is a common multiplier because it’s prime and works well in practice (other primes like 37, 53, 97 also work).
3. Handle null fields if necessary.
hash = 31 * hash + (field == null ? 0 : field.hashCode());
4. Use built-in hashes for basic types.
- For
int
,double
,boolean
— Java hashashCode()
implementations. - For arrays — loop through and combine hashes of each element.
- For strings —
String.hashCode()
is usually pretty good.
Example for a simple Person
class
class Person {
String name;
int age;
double height;
@Override
public int hashCode() {
int hash = 17; // seed
hash = 31 * hash + (name == null ? 0 : name.hashCode());
hash = 31 * hash + age;
long bits = Double.doubleToLongBits(height);
hash = 31 * hash + (int)(bits ^ (bits >>> 32)); // for double
return hash;
}
}
🔬 Key Principles Behind This
- Prime multipliers reduce patterns.
- Combining fields captures all data (each field contributes to uniqueness).
- Avoid simple addition or XOR only. Adding fields directly or using XOR can lead to more collisions if fields have low entropy (e.g., many zeros).
💡 Pro Tips
✅ For strings, use all characters (don’t just hash the first/last).
✅ If you know your data has certain patterns (like sequential IDs), use a hash function that breaks that pattern (e.g., bit-shifting or multiplying by primes).
✅ Benchmark and test your hash function on real data — if you see too many collisions, rethink the design.
🚨 What to avoid
❌ Simple modulo (key % tableSize
) without mixing.
❌ Using just one field if your object has several important fields.
❌ Overcomplicated hashing — keep it fast.
⚡ Bonus: When to Use Existing Hash Functions
For common needs (like strings, UUIDs, etc.), just use:
String.hashCode()
Objects.hash()
(for combining fields in modern Java)
Example:
@Override
public int hashCode() {
return Objects.hash(name, age, height);
}