вторник, 22 юни 2010 г.

Open Hashtable Java Implementation

The third and last(for now) data structure I would like to present is hashtable with open addressing.Again I won't bother to explain in detail what is hashing, hashtable, chaining, open addressing. If these terms aren't familiar to you please take a look at these articles:

Hash Function

Hash table

When I read these articles I wondered how "good" can a hashtable implementing open addressing be?
Since Java provides hashtable with chaining I had the opportunity to compare the two types of tables. The implementation I am offering uses quadratic probing for collision resolution. It also stores its elements in nodes of a double linked list which provides faster iteration through keys/values.

Performance:
Again there is a set of tests comparing both hashtables.
Well the results aren't as good as I wanted them to be.
The chaining implementation is more consistent in its performance.

-Insert is almost equal for both implemetations
(Data type: integer)

table/num elements 10,000 50,000 250,000 500,000
Chaining 0ms 60ms 290ms 1122ms
Open Addressing 10ms 80ms 340ms 881ms


Testing with 1000000 elements and different load factors:
0.750.50.40.3
Chaining1221119211322093
Open Addressing1882167215431552


As you can see in order to perform better than a chained hashtable open addressing needs bigger table.This can be done by setting smaller load factor or bigger size of the table.

Searching:
For up to 50,000 elements both tables perform equally.For 500, 000 elements chaining has slight advantage.

Delete:
Again the same situation but for 500, 000 this time the advantage is bigger for chaining - about 2 times faster.

Get all keys/values:
This time Open Addressing is better due to the fact that each element is a node in double linked list so if you have n elements you'll need exactly n steps to get all keys/values.
For getting all values open addressing runs about two times faster.

Search for a value:
Although I don't think this is a typical operation for a hashtable both implementations offer it. In both implementations it is slow and you should use it with caution.
However in open addressing it runs much faster - 14.5 sec. vs 46sec in chaining for searching 20,000 values.

Conclusion:
In order to be as fast as chaining implementation open addressing needs much more space than the actual number of elements.Thus a lot of memory is wasted.
Open addressing cannot store more elements than its table size, while chaining has no such limitation.
If you check the wikipedia article some advantages of open addressing are mentioned there but for general purpose usage chaining is more robust and consistent.
Not to mention what a pain in the ass is to implement open addressing based hashtable :P
Of course there are cases where this implementation could be useful too.

Here is the source + tests

Няма коментари:

Публикуване на коментар