An interesting problem was presented to us of determining the 90th percentile of a moving window of stress test timings. It must do it efficiently and without running out of memory (thousands of samples per second over the course of hours or days).
My solution was to use a tree-like structure that was the underlying container for an ordered bag. A subset of all data can live in memory - say 100 data points. We know the 90th percentile will start at the node where 90 smaller points are to the left and the remainder are to the right.
(We maintain a sliding window by removing the oldest point before we add the next.)
The tree structure I chose was an AVL tree. "An AVL tree is a BST [binary search tree] where the height of every node and that of its sibling differ by at most 1." [1]
First a quick overview of Binary Search Trees.
2-3 Search Tree
"A 2-3 search tree is a tree that is either empty or
Red-Black Tree
Perhaps the most famous Binary Search Tree is the Red-Black tree. This is the underlying implementation of Java's TreeMap.
From Sedgewick's Algorithms:
25 swaps places with its parent who in turn becomes its right child.
My solution was to use a tree-like structure that was the underlying container for an ordered bag. A subset of all data can live in memory - say 100 data points. We know the 90th percentile will start at the node where 90 smaller points are to the left and the remainder are to the right.
(We maintain a sliding window by removing the oldest point before we add the next.)
The tree structure I chose was an AVL tree. "An AVL tree is a BST [binary search tree] where the height of every node and that of its sibling differ by at most 1." [1]
First a quick overview of Binary Search Trees.
2-3 Search Tree
"A 2-3 search tree is a tree that is either empty or
- A 2-node, with one key (and associated value) and two links, a left link to a 2-3 search tree with smaller keys, and a right link to a 2-3 search tree with larger keys.
- A 3-node, with two keys (and associated values) and three links, a 2-3 search tree with smaller keys, a middle link to a 2-3 search tree with keys between the node's keys, and a right link to a 2-3 search tree with larger keys." [1]
(See the diagram below).
Red-Black Tree
Perhaps the most famous Binary Search Tree is the Red-Black tree. This is the underlying implementation of Java's TreeMap.
From Sedgewick's Algorithms:
- Red links lean left
- No node has two red links connected to it.
- The tree has perfect black balance: every path from the root to a null link has the same number of black links.
"There is a 1-1 correspondence between red-black BSTs defined in this way and 2-3 trees."
Isomorphic mapping between a Red-Black tree and a 2-3 tree. |
AVL Trees
The first configurations to consider are left-left:
Added 50, 25 then 10 |
left-left balanced |
and right-right:
Added 50, 75 then 90 |
75 swaps places with its parent who then becomes its left child
Right-right balanced |
This considers the case of a left-leaning and a right-leaning configuration. But what if we have a left-right
Added 50, 25 then 40 |
and a right-left addition of numbers?
Added 50, 75, then 60 |
Well, for left-right, 40 comes up and swaps places with its parent:
Partial rebalancing of left-right |
and for right-left, 60 comes up and swaps places with its parent:
Partial rebalancing of right-left |
After that, they are now back to a left-left and a right-right configuration so we continue as described above.
(Graphics generated by QMatica's excellent interactive AVL-tree demo).
(Graphics generated by QMatica's excellent interactive AVL-tree demo).
No comments:
Post a Comment