вторник, 22 юни 2010 г.

Open Hashtable Java Implementation

The third and last(for now) data structure I would like to present is hashtable with open addressing.Again I won't bother to explain in detail what is hashing, hashtable, chaining, open addressing. If these terms aren't familiar to you please take a look at these articles:

Hash Function

Hash table

When I read these articles I wondered how "good" can a hashtable implementing open addressing be?
Since Java provides hashtable with chaining I had the opportunity to compare the two types of tables. The implementation I am offering uses quadratic probing for collision resolution. It also stores its elements in nodes of a double linked list which provides faster iteration through keys/values.

Performance:
Again there is a set of tests comparing both hashtables.
Well the results aren't as good as I wanted them to be.
The chaining implementation is more consistent in its performance.

-Insert is almost equal for both implemetations
(Data type: integer)

table/num elements 10,000 50,000 250,000 500,000
Chaining 0ms 60ms 290ms 1122ms
Open Addressing 10ms 80ms 340ms 881ms


Testing with 1000000 elements and different load factors:
0.750.50.40.3
Chaining1221119211322093
Open Addressing1882167215431552


As you can see in order to perform better than a chained hashtable open addressing needs bigger table.This can be done by setting smaller load factor or bigger size of the table.

Searching:
For up to 50,000 elements both tables perform equally.For 500, 000 elements chaining has slight advantage.

Delete:
Again the same situation but for 500, 000 this time the advantage is bigger for chaining - about 2 times faster.

Get all keys/values:
This time Open Addressing is better due to the fact that each element is a node in double linked list so if you have n elements you'll need exactly n steps to get all keys/values.
For getting all values open addressing runs about two times faster.

Search for a value:
Although I don't think this is a typical operation for a hashtable both implementations offer it. In both implementations it is slow and you should use it with caution.
However in open addressing it runs much faster - 14.5 sec. vs 46sec in chaining for searching 20,000 values.

Conclusion:
In order to be as fast as chaining implementation open addressing needs much more space than the actual number of elements.Thus a lot of memory is wasted.
Open addressing cannot store more elements than its table size, while chaining has no such limitation.
If you check the wikipedia article some advantages of open addressing are mentioned there but for general purpose usage chaining is more robust and consistent.
Not to mention what a pain in the ass is to implement open addressing based hashtable :P
Of course there are cases where this implementation could be useful too.

Here is the source + tests

понеделник, 21 юни 2010 г.

AA-Tree Java Implementation

The next data structure presented is called AA tree.
Briefly this is a self balancing binary tree.Its performance is comparable with the performance of red-black trees though it is slightly slower during tests.


The wikipedia article explains this type of trees well enough:
AA-Tree

AA Tree demo


The Java implementation I offer follows the algorithms explained in the article above so I didn't prepare any additional documentation. The standard search/insert/delete functionality is provided. The tests provided in the package compare this implementation with Java's TreeSet which keeps elements in red-black tree. The performance tests show that both implementations have similar running times:
- searching is similar(barely noticeable advantage for aa-tree)
- insert is slightly faster for red-black trees
- delete is faster for red-black trees(my tests show about two times advantage for TreeSet)

So as you can see Red-Black Tree is clearly the winner here.
This is not surprising - after all this is one of the most popular trees in many libraries in different languages.

So what is the advantage of AA tree over Red-Black tree?
Besides the slight advantage in searching maybe the biggest advantage is that AA trees are simple to implement.The implementation I offer is about 170 lines which is less than any red-black tree implementation I've seen so far.

Java source + tests

Fibonacci Heap Java Implementation

Intro to heaps:
Briefly said heap is a data structure like stack, linked list s.o.

Features:
Usually you should be able to to perform the following operations "quickly":
- get min/max element
- add new element
- join two heaps
- decrease a key of element

Disadvantages:
- heaps cannot perform fast search operations. The only way for most classic implementations of a heap to perform a search is to scan all of the elements which takes linear time.

There are many variants of a heaps. Here you can find more about heaps:
Heaps Wikipedia Article

Fibonacci Heap:
Fibonacci Heap Wikipedia Article
As you can see Fibonacci heap offers faster running times compared to many of the other heaps.

So enough said about the structure. Here is a implementation in Java 1.6 of the Fibonacci heap.It supports the following operations - insert, extract min, decrease key. This is all you need if you want to implement Dijkstra's shortest path algorithm for instance. The operations union and delete min are not added but if you need you can add them easily since they are trivial.
A drawback of this implementation(and probably any implementation of Fibonacci heap) is the high constant compared to array based binary heap for example.



API summary:

Maybe you will find the API offered a bit unclear and strange.I agree with that but here is the reason it looks like this - as we mentioned there is no fast way to find element in a heap.So if we want to decrease key of random element effectively we cannot afford to waste time searching for this element. We should have reference to it and pass it directly to the decreaseKey method.
These references to elements in the heap are obtained via the insert method which gives you reference to the newly added element.

So if you do not intend to use the decrease key operation you can write you own wrapper class and provide much more user friendly api design.

API Javadoc


Tests:

There are some JUnit tests provided which test almost all of the functionality.Code coverage is about 95%.


Performance:

As mentioned fibonacci heap implementations have big constant.
This implementation could be slightly improved by modifying it not to use head and tail references for the main list. Only reference to the min element is sufficient.This might save some time.In the test package you can also find performance test comparing java PriorityQueue(which implements binary heap) with this implementation. You can play with different parameters for tests and check out how both heaps behave.
---------
Note: If you want to test for inserting/extracting a few million elements you might have to set this VM arguments:
vm args -Xms128m -Xmx512m (in Eclipse this is done in the running configurations of the test)
---------
Results vary for different parameters so it really depends on the use case to tell which one is better.
PriorityQueue doesn't support decrease key operation so it cannot be compared but if it supported it should be supposedly slower since its running time is O(log n) while this implementation of decrease key has running time O(1) amortized.
Union operation is also much faster for Fibonacci heaps - O(1) while it takes O(n) for binary heaps.So as I mentioned adding it to this implementation is a piece of cake and if you use it you will get much better performance.

So finally here is the full source with the tests:

Fibonacci Heap source + tests

If you find bugs or ways to make it faster feel free to tell me.

петък, 11 юни 2010 г.

Jigsaw Puzzle

Here you can check out a jigsaw puzzle implemented in Flex 3.5.
It generates vector based puzzle pieces on the fly so every time you start it you get a new and unique puzzle pieces.



The main focus is not on menus and gameplay so they could be greatly improved.The goal is to demonstrate the power of flex bitmap handling api.



Of course there is some gameplay :) You can join pieces or groups of pieces. There are different levels of difficulty, different piece sizes, game scores ...



PLAY GAME

The source is also available.I'm not a flex/actionscript developer so some of the good practices may have been violated.This is my first flex project so I will be glad to get some feedback.

Here is the complete source(exported flex builder project)
SOURCE CODE

Any questions and sensible comments will be appreciated.