Apropos the recent election, I was reading Allen Downey's blog about Nate Silver and Bayesian forecasting. This piqued my interest in probability distributions (that I forgot about a quarter century ago). Here are some notes.

__Binomial Distribution__

This is a distribution that describes a sequence of *n* **independent** yes/no experiments, each with probability of success *p*. It is discrete since you obviously can't have, say, 3.25 successes.
The formula for the binomial probability mass function (PMF) is:

C(n, k) p^k (1 - p)^(n - k)

where

n is the number of trials

k is the number of successes

p is the probability of success.
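As a quick sanity check, the PMF above really is only a few lines of Python (the function name and example values below are my own choices):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p: C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. the chance of exactly 5 heads in 10 fair coin flips
print(binomial_pmf(5, 10, 0.5))  # 252 / 1024 ≈ 0.246
```

Summing the PMF over k = 0 … n gives 1, as a probability distribution must.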

__Beta Distribution__

"The beta distribution is defined on the interval from 0 to 1 (including both), so it is a natural choice for describing proportions and probabilities." [1] It is (obviously) a continuous distribution.

The probability density function (PDF) for the beta distribution is:

x^(α - 1) (1 - x)^(β - 1)

_________________________

B(α, β)

where α and β are hyperparameters and B(α, β) is:

Γ(α) Γ(β)

_________

Γ(α + β)

where Γ(n) is the gamma function which, for positive integers n, is simply:

Γ(n) = (n - 1)!

OK, that sounds a lot more complicated than it is, but any decent coder could write all this up in a few lines of their favourite language.
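For instance, here is a minimal Python sketch — B(α, β) built from the gamma function, then the PDF itself (the function names are mine):

```python
from math import gamma

def beta_func(a, b):
    """B(α, β) = Γ(α) Γ(β) / Γ(α + β)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_pdf(x, a, b):
    """Beta PDF: x^(α - 1) (1 - x)^(β - 1) / B(α, β), for 0 <= x <= 1."""
    return x**(a - 1) * (1 - x)**(b - 1) / beta_func(a, b)

# Beta(1, 1) is the uniform distribution on [0, 1], so its density is 1 everywhere
print(beta_pdf(0.3, 1, 1))  # 1.0
```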

One last thing: the hyperparameters act as counts of successes (α) and failures (β), so their sum plays the role of the total number of trials. That is

α + β = n



__Bayes: Binomial and Beta distribution__

And this is where it gets interesting. Recall that the posterior distribution is proportional to the likelihood function and the prior. Or, in maths speak:

P(A | X) ∝ P(X | A) P(A)

"It turns out that if you do a Bayesian update with a binomial likelihood function ... the beta distribution is a conjugate prior. That means that if the prior distribution for x is a beta distribution, the posterior is also a beta distribution." [1]


How do we know this? Well, take the equation for the binomial's PMF and multiply it by the beta's PDF. But, to make things easier, note that C(n, k) and B(α, β) are constants (they don't depend on x) and can be ignored in a "proportional to" relationship. So, it becomes:

P(a | x) ∝ x^k (1 - x)^(n - k) x^(α - 1) (1 - x)^(β - 1)

which simplifies to:

P(a | x) ∝ x^(α + k - 1) (1 - x)^(β + n - k - 1)

Which is another beta distribution! Only this time

α' = α + k

and

β' = β + n - k

since the beta PDF's exponents are α' - 1 and β' - 1.
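The conjugacy claim is easy to check numerically: the product of the binomial likelihood and a beta prior should be a constant multiple of the Beta(α + k, β + n − k) density at every point. A small sketch (the Beta(2, 2) prior and the data are arbitrary choices of mine):

```python
from math import gamma

def beta_pdf(x, a, b):
    """Beta PDF: x^(a - 1) (1 - x)^(b - 1) * Γ(a + b) / (Γ(a) Γ(b))."""
    return x**(a - 1) * (1 - x)**(b - 1) * gamma(a + b) / (gamma(a) * gamma(b))

a, b = 2.0, 2.0   # prior: Beta(2, 2)
n, k = 10, 7      # data: 7 successes in 10 trials

# unnormalised posterior: likelihood x^k (1 - x)^(n - k) times the prior,
# divided by the candidate posterior Beta(a + k, b + n - k)
ratios = []
for x in (0.2, 0.5, 0.8):
    unnorm = x**k * (1 - x)**(n - k) * beta_pdf(x, a, b)
    ratios.append(unnorm / beta_pdf(x, a + k, b + n - k))

# all three ratios are equal, so the posterior really is Beta(α + k, β + n − k)
print(ratios)
```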

Proofs for other combinations can be found here, more information here, a better explanation by somebody smarter than myself is here, and possibly the most informative answer on StackExchange ever here.

[1] Think Bayes, Downey
