Java.Core.How to write a good hash function ?

Writing a good hash function is a crucial skill, especially if you’re designing custom hash-based data structures like hash tables. Here’s a breakdown of what makes a good hash function and how to write one:


🔑 What is a “good” hash function?

A good hash function should:

Distribute keys evenly across the hash table (avoid clustering).
Be fast to compute — ideally O(1) time.
Minimize collisions (different inputs should ideally map to different hash values).
Deterministic — the same input must always produce the same hash.
✅ Work well for the expected set of keys (strings, integers, objects, etc.).


🛠️ Basic Recipe for a Good Hash Function

1. Start with a prime number seed.

Primes work better in modular arithmetic (common in hash tables). For example:

int hash = 17;

2. Combine the fields of your object into the hash.

For each field (like an integer, string, or nested object), update the hash using multiplication and addition:

hash = 31 * hash + field.hashCode();

🔷 31 is a common multiplier because it’s prime and works well in practice (other primes like 37, 53, 97 also work).


3. Handle null fields if necessary.

hash = 31 * hash + (field == null ? 0 : field.hashCode());

4. Use built-in hashes for basic types.

  • For int, double, boolean — Java has hashCode() implementations.
  • For arrays — loop through and combine hashes of each element.
  • For strings — String.hashCode() is usually pretty good.

Example for a simple Person class





class Person {
    String name;
    int age;
    double height;

    @Override
    public int hashCode() {
        int hash = 17;  // seed
        hash = 31 * hash + (name == null ? 0 : name.hashCode());
        hash = 31 * hash + age;
        long bits = Double.doubleToLongBits(height);
        hash = 31 * hash + (int)(bits ^ (bits >>> 32));  // for double
        return hash;
    }
}

🔬 Key Principles Behind This

  • Prime multipliers reduce patterns.
  • Combining fields captures all data (each field contributes to uniqueness).
  • Avoid simple addition or XOR only. Adding fields directly or using XOR can lead to more collisions if fields have low entropy (e.g., many zeros).

💡 Pro Tips

✅ For strings, use all characters (don’t just hash the first/last).
✅ If you know your data has certain patterns (like sequential IDs), use a hash function that breaks that pattern (e.g., bit-shifting or multiplying by primes).
✅ Benchmark and test your hash function on real data — if you see too many collisions, rethink the design.


🚨 What to avoid

❌ Simple modulo (key % tableSize) without mixing.
❌ Using just one field if your object has several important fields.
❌ Overcomplicated hashing — keep it fast.


⚡ Bonus: When to Use Existing Hash Functions

For common needs (like strings, UUIDs, etc.), just use:

  • String.hashCode()
  • Objects.hash() (for combining fields in modern Java)

Example:

@Override
public int hashCode() {
    return Objects.hash(name, age, height);
}
This entry was posted in Без рубрики. Bookmark the permalink.