Sunday, July 7, 2013

AVL Trees

An interesting problem was presented to us of determining the 90th percentile of a moving window of stress test timings. It must do it efficiently and without running out of memory (thousands of samples per second over the course of hours or days).

My solution was to use a tree-like structure that was the underlying container for an ordered bag. A subset of all data can live in memory - say 100 data points. We know the 90th percentile will start at the node where 90 smaller points are to the left and the remainder are to the right.

(We maintain a sliding window by removing the oldest point before we add the next.)

The tree structure I chose was an AVL tree. "An AVL tree is a BST [binary search tree] where the height of every node and that of its sibling differ by at most 1." [1]

First a quick overview of Binary Search Trees.

2-3 Search Tree

"A 2-3 search tree is a tree that is either empty or

  • A 2-node, with one key (and associated value) and two links, a left link to a 2-3 search tree with smaller keys, and a right link to a 2-3 search tree with larger keys.
  • A 3-node, with two keys (and associated values) and three links,  a 2-3 search tree with smaller keys, a middle link to a 2-3 search tree with keys between the node's keys, and a right link to a 2-3 search tree with larger keys." [1]
(See the diagram below).


Red-Black Tree

Perhaps the most famous Binary Search Tree is the Red-Black tree. This is the underlying implementation of Java's TreeMap.

From Sedgewick's Algorithms:

  • Red links lean left
  • No node has two red links connected to it.
  • The tree has perfect black balance: every path from the root to a null link has the same number of black links.
"There is a 1-1 correspondence between red-black BSTs defined in this way and 2-3 trees."

Isomorphic mapping between a Red-Black tree and a 2-3 tree.

AVL Trees

The first configurations to consider are left-left:

Added 50, 25 then 10
25 swaps places with its parent who in turn becomes its right child.

left-left balanced
and right-right:

Added 50, 75 then 90
75 swaps places with its parent who then becomes its left child

Right-right balanced
This considers the case of a left-leaning and a right-leaning configuration. But what if we have a left-right

Added 50, 25 then 40

 and a right-left addition of numbers?

Added 50, 75, then 60
Well, for left-right, 40 comes up and swaps places with its parent:

Partial rebalancing of left-right

and for right-left, 60 comes up and swaps places with its parent:

Partial rebalancing of right-left

After that, they are now back to a left-left and a right-right configuration so we continue as described above.

(Graphics generated by QMatica's excellent interactive AVL-tree demo).

[1] Sedgewick's Algorithms.

No comments:

Post a Comment