Monday, May 6, 2013

Cuckoo Hashing

I was taking a look at a library called Kryo that is being used on a project I'm working on. It promises to be a much faster way of de/serializing objects.

Trying to discover it's secrets, I came across a class called com.esotericsoftware.kryo.util.ObjectMap. It's an implementation of a map that claims to be very fast for get, containsKey and remove although it maybe slower for put.

The way it does this is by using a technique called Cuckoo hashing. Here, we have an array of keys and an array of values. Upon insertion, we calculate a hash for our key and see if the value slot is free. If it is, we insert the value and we're done. If not, we may repeat this process for N more times using a different hash. Finally, failing this, we check a fixed size of memory called the stash. I'll return to this in a moment.

If we have failed to find any space for our mapping to go, an old key/value mapping is evicted to make way for our new key/value pair. This is where there reference to cuckoos comes from (cuckoo chicks eject their foster siblings from the nest).

The old key/value pair is put into the stash I mentioned earlier. This is a fixed-size area of memory. Only when this is exhausted do the arrays resize.

Incidentally, because the stash is fixed-size, this algorithm qualifies as an in-place algorithm.

Running tests using both Kryo's ObjectMap and Java's HashMap each initialized with space for 10 000 slots and running the experiment 10 000 times gave these results (all times in milliseconds).

Kryo ObjectMap Java HashMap
Number of Elements Put Get Remove Put Get Remove
2000 636 96 126 379 295 353
4000 1372 401 704 716 569 709
8000 3384 1763 1628 2184 1269 1585

True to their word, the author appears to have implemented a very fast map when it come to get and remove. The only caveat is that as the data structure approaches any where near capacity, it slows down.

As an aside, the stash is searched using a technique called "linear probing" or "open addressing". Here "we just add the key to an [array] index but in linear probing, we put it in position i if that's free and if not look at position i+1, i+2... It works well if the size of the array is significantly bigger than the number of keys" [1].

[1] Prof Bob Sedgewick, Coursera.

1 comment:

  1. Thanks for taking a look. One of the advantages of cuckoo hashing is that it doesn't allocate, so it doesn't doesn't cause GC. There are no entry objects like in HashMap. This can be useful on Android and/or when writing games, where GC can cause hiccups.

    Here are a couple threads about maps:
    http://www.java-gaming.org/topics/map-performance/21395/view.html
    http://www.java-gaming.org/index.php/topic,23102

    ReplyDelete