[ACCEPTED]-What is a sensible prime for hashcode calculation?-primes

Accepted answer
Score: 84

I recommend using 92821. Here's why.

To give a 35 meaningful answer to this you have to know 34 something about the possible values of i and 33 j. The only thing I can think of in general 32 is, that in many cases small values will 31 be more common than large values. (The odds 30 of 15 appearing as a value in your program 29 are much better than, say, 438281923.) So 28 it seems a good idea to make the smallest 27 hashcode collision as large as possible 26 by choosing an appropriate prime. For 31 25 this rather bad - already for i=-1 and j=31 you 24 have the same hash value as for i=0 and j=0.

Since 23 this is interesting, I've written a little 22 program that searched the whole int range 21 for the best prime in this sense. That is, for 20 each prime I searched for the minimum value 19 of Math.abs(i) + Math.abs(j) over all values of i,j that have the same 18 hashcode as 0,0, and then took the prime where 17 this minimum value is as large as possible.

Drumroll: the 16 best prime in this sense is 486187739 (with 15 the smallest collision being i=-25486, j=67194). Nearly as 14 good and much easier to remember is 92821 13 with the smallest collision being i=-46272 and j=46016.

If you 12 give "small" another meaning and want to 11 be the minimum of Math.sqrt(i*i+j*j) for the collision as 10 large as possible, the results are a little 9 different: the best would be 1322837333 8 with i=-6815 and j=70091, but my favourite 92821 (smallest 7 collision -46272,46016) is again almost as good as the 6 best value.

I do acknowledge that it is quite 5 debatable whether these calculation make 4 much sense in practice. But I do think that 3 taking 92821 as prime makes much more sense 2 than 31, unless you have good reasons not 1 to.

Score: 6

Actually, if you take a prime so large that 7 it comes close to INT_MAX, you have the same problem 6 because of modulo arithmetic. If you expect 5 to hash mostly strings of length 2, perhaps 4 a prime near the square root of INT_MAX would be 3 best, if the strings you hash are longer 2 it doesn't matter so much and collisions 1 are unavoidable anyway...

Score: 5

Collisions may not be such a big issue... The 10 primary goal of the hash is to avoid using 9 equals for 1:1 comparisons. If you have 8 an implementation where equals is "generally" extremely 7 cheap for objects that have collided hashs, then 6 this is not an issue (at all).

In the end, what 5 is the best way of hashing depends on what 4 you are comparing. In the case of an int 3 pair (as in your example), using basic bitwise 2 operators could be sufficient (as using 1 & or ^).

Score: 4

You need to define your range for i and 1 j. You could use a prime number for both.

public int hashCode() {
   http://primes.utm.edu/curios/ ;)
   return 97654321 * i ^ 12356789 * j;
}
Score: 4

I'd choose 7243. Large enough to avoid collissions 2 with small numbers. Doesn't overflow to 1 small numbers quickly.

Score: 1

I just want to point out that hashcode has 3 nothing to do with prime. In JDK implementation 2

for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }

I found if you replace 31 with 27, the result 1 are very similar.

More Related questions