Trying to get my head around how Donald Knuth generates samples from a Poisson distribution. The formula is simple:
Let L = e^{-λ} and p = 1. Count the number of times you multiply p by a uniformly distributed random number in the range [0, 1] until p drops below L; that count, minus one, is your sample.
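In code, that amounts to something like this (a minimal Python sketch; the function name and structure are mine, not Knuth's):

```python
import math
import random

def knuth_poisson(lam: float) -> int:
    """Draw one sample from a Poisson(lam) distribution using Knuth's method."""
    L = math.exp(-lam)        # e^(-lambda), the stopping threshold
    p = 1.0
    k = 0
    while True:
        k += 1
        p *= random.random()  # multiply by a uniform sample from [0, 1)
        if p < L:             # stop once the running product drops below e^(-lambda)
            return k - 1      # the sample is the number of multiplications minus one
```

The mean of many such samples should come out close to λ.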
Simple. But why does it work?
Poisson
The Poisson distribution relies on the Poisson process. That is, the probability that something is about to occur has nothing to do with what has gone on in the past. This is seen throughout nature and makes the distribution very useful in the sciences. For example, the probability of an atom radioactively decaying has nothing to do with how long it has been sitting around before it decays.
"A t each arrival time and at each fixed time, the process must probabilistically restart, independent of the past. The first part of that assumption implies that X is a sequence of independent, identically distributed variables [IIDs] . The second part of the assumption implies that if the first arrival has not occurred by time s, then the time remaining until the arrival occurs must have the same distribution as the first arrival time itself" (from here).
In symbols, if X is the time until the first arrival:

P(X > t + s | X > s) = P(X > t)
Let F(t) = P(X > t) for t > 0. Then:
F(t + s) = F(t) F(s)
since P(X > t + s) = P(X > t + s | X > s) · P(X > s), and the memoryless property lets us replace that conditional probability with P(X > t). (The mathematically inclined may look at this equation and suspect that logarithms are about to make an appearance...)
Now, this is the clever bit. Let a = F(1). Then:
F(n) = F(Σ_{i=1}^{n} 1) = Π_{i=1}^{n} F(1) = a^n
The proof that this also holds for all rational (and, by extension, all real) t is here but, briefly, if we let λ = -ln(a) then:
F(t) = e^{-λt}
which, if you recall, is the probability that an event will happen at t or later. That is, it's a value between 0 and 1.
Now, we use a method called inversion sampling. The basic idea is that, given a plot of y = P(x), taking uniformly random samples on the y-axis and reading off the corresponding values on the x-axis gives you samples distributed according to P(x).
Note that y represents a probability so necessarily 0 ≤ y ≤ 1. Since a uniform random number generator comes with pretty much any programming language, the problem is getting simpler.
The last piece of the jigsaw is documented here and explains why Knuth's algorithm is so simple. Basically, we rearrange the equation for F(t) by taking natural logs:
t = - ln F(t) / λ
and, since we're using inversion sampling, the probability F(t) can be replaced by a uniformly distributed random variable between 0 and 1. We call it U.
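As a quick sanity check, here's what that looks like in code (a Python sketch; the rate λ and sample count are arbitrary choices of mine): each uniform sample U maps to a time t = -ln(U) / λ, and the average of those times should come out near 1/λ, as expected for exponentially distributed inter-arrival times.

```python
import math
import random

lam = 4.0       # an arbitrary rate, purely for illustration
n = 100_000

# Inversion sampling: push uniform samples U through t = -ln(U) / lambda.
# (Using 1.0 - random.random() avoids the vanishingly unlikely log(0).)
samples = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

print(sum(samples) / n)   # should be close to 1/lambda = 0.25
```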
Now, for any random sample U_i there will be a corresponding value t_i. The question Knuth asks is: how many random samples U_i do I need before the t_i add up to one unit of time (ie, 1)?
More mathematically, what is k for:
Σ_{i=1}^{k} t_i ≤ 1 ≤ Σ_{i=1}^{k+1} t_i

-Σ_{i=1}^{k} ln U_i / λ ≤ 1 ≤ -Σ_{i=1}^{k+1} ln U_i / λ
Multiplying through by -λ (which, being negative, flips the inequalities):

Σ_{i=1}^{k} ln U_i ≥ -λ ≥ Σ_{i=1}^{k+1} ln U_i

ln(Π_{i=1}^{k} U_i) ≥ -λ ≥ ln(Π_{i=1}^{k+1} U_i)

Π_{i=1}^{k} U_i ≥ e^{-λ} ≥ Π_{i=1}^{k+1} U_i
which is Knuth's algorithm: keep multiplying uniform samples together until the product drops below e^{-λ}, and the answer is the number of multiplications minus one. The k you end up with is the number of Poisson-process arrivals in one unit of time, which is exactly what the Poisson(λ) distribution describes. Notice that the beauty of this is that it's computationally cheap as long as λ (and hence the typical k) is relatively small. This should be fine for most applications.
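If you want to convince yourself numerically, here's a rough Python sketch (the names and parameters are mine): it draws samples both by accumulating inter-arrival times t_i until they exceed one unit of time and by Knuth's product test, then checks that the mean and variance of each come out near λ, as they should for a Poisson distribution.

```python
import math
import random

def poisson_by_arrival_times(lam: float) -> int:
    """Count how many exponential inter-arrival times fit into one unit of time."""
    total, k = 0.0, 0
    while True:
        total += -math.log(1.0 - random.random()) / lam  # next inter-arrival time
        if total > 1.0:
            return k
        k += 1

def poisson_by_product(lam: float) -> int:
    """Knuth's formulation: multiply uniforms until the product drops below e^(-lambda)."""
    L = math.exp(-lam)
    p, k = 1.0, 0
    while p >= L:
        p *= 1.0 - random.random()
        k += 1
    return k - 1

lam, n = 3.0, 100_000
for sampler in (poisson_by_arrival_times, poisson_by_product):
    xs = [sampler(lam) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    print(sampler.__name__, round(mean, 3), round(var, 3))  # both should be near lambda
```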