Wednesday, January 25, 2017

Poisoned Poisson

I needed to code up a simple implementation of the probability mass function (PMF) for the boring old Poisson distribution. Five minutes work, or so I thought.

My first cut of the code looked like this:

  def poissonWithSmallMean(x: Int, mu: Double): Double = Math.pow(mu, x) * Math.exp(-mu) / factorial(x)

where factorial is a very simple implementation of factorial using tail recursion. I could have implemented in a more efficient manner using caching etc but I wasn't expecting too much trouble.

Going to the excellently named StatTrek to get some test values, my test looked like:

    poissonWithSmallMean(10, 12) shouldEqual 0.104837255883659 +- tolerance

and it passed. Great. 

Things started getting strange with slightly larger numbers. StatTrek was telling me that with x=998 and mu=999, we should get the modestly-sized answer of 0.0126209223360251. But my code was blowing up.

It's not hard to see why. That factorial goes crazy pretty quickly. Given that the expected value is not particularly large and not terribly small, how do we compute it simply? 

The answer is here. It depends on logs. If:

n! = n * (n-1) * (n-2) * ... * 1


log n! = log n + log(n-1) + log(n-2) + ... + log(1)

Now, our equation to calculate the PMF looks like this:

  def poissonWithLargeMean(x: Integer, mu: Double): Double = {
    val logged = (x * Math.log(mu)) - mu - logFactorial(x)

where logFactorial is just a simple implementation of the equation for log n! above. This handles x=998, mu=999 beautifully.

We could start implementing logFactorial in a much more efficient manner by using the Stirling approximation (log n!  n log n). This is a handy equation which incidentally also gives us the limit on comparison sorts. Take a look at John D Cooke's blog in the link above to see how to do this. But this simple solution is fine for me for now.

No comments:

Post a Comment