## Sunday, November 29, 2015

### Algorithm Recipes

These are some miscellaneous heuristics for writing and choosing algorithms that I have found useful.

Writing an Algorithm

1. If you want to make an algorithm tail recursive, add an accumulator to the method signature.

2. When you're thinking about the signature of a recursive algorithm, start with an argument that represents the "works still to be done" and an accumulator. You may need more, but that's a good start.

When the motivation for an algorithm is to find the optimal answer, one approach might be to recurse through the solution space finding the maximum (or minimum) of any two recursive sub-solutions. A good example is here (with a good Wikipedia explanation here). This is a C implementation of the Longest Common Subsequence where all sequences are recursively explored but only the maximum of the two sub-problems is chosen at each recursion.

Choosing an Algorithm

If you're stuck, consider sorting the data first. This may (or may not) help you move forward. For example, in the Closest Pair algorithm (where you want to find the closest pair of points in 2-D without incurring O(n * m) cost) we first sort all the points in the X-axis to help us find the closest pair. They're unlikely to be the closest pair in the whole plane but it's a start.

If the performance is O(n2), consider expressing the problem in terms of divide-and-conquer.

You cannot do better than O(n log(n)) in a comparison based sort. If you need to meet a requirement that at first blush appears to rely on a sorted array of elements but needs to be better than O(n log(n)) then consider a Bucket Sort.

[The example Bucket Sort question tries to find the largest gap between a sequence of integers while avoiding an O(n log(n)) sort. Cunningly, it creates a series of "buckets" that cover the whole range but with each bucket holding a range slightly less than the mean difference. Thus, no 2 integers that differ by the mean or more can fall into the same bucket. You need only then to see what is the max and min in all the buckets, which can be done in  O(n)]

You may be able to do better than an n log(n) sort if you know something about the data and don't need to compare the elements. For example, if you have X elements that you know are in the range of 1 to Y and you know there are no duplicates (so necessarily X < Y), you can instantiate an array of length Y and scan through the X elements putting them in their natural position in the Y-length array. At the end of this scan, scan through the Y-length array discarding the nulls. The result is the ordered elements. This was all done in time O(X + Y).