Agile Java Man: Conjugate Priors in Bayesian Probability

Friday, November 11, 2016

Conjugate Priors in Bayesian Probability

Apropos the recent election, I was reading Allen Downey's blog about Nate Silver and Bayesian forecasting. This piqued my interest in probability distributions (that I forgot about a quarter century ago). Here are some notes.

Binomial Distribution

This is a distribution that describes a sequence of n independent yes/no experiments each with a probability of p. It is a discrete since you obviously can't have, say, 3.25 successes.

The formula for the binomial probability mass function (PMF) is:

_nC_k p^k (1 - p)^n-k

where

n is the number of trials
k is the number of successes
p is the probability of success.

Beta Distribution

"The beta distribution is defined on the interval from 0 to 1 (including both), so it is a natural choice for describing proportions and probabilities.". It is (obviously) a continuous function.

The probability density function (PDF) for the beta distribution is:

x^{α - 1}(1 - x)^{β - 1}
________________

B(α, β)

where α and β are hyperparameters and B(α, β) is:

Γ(α) Γ(β)
_________

Γ(α + β)

where Γ(n) is the gamma function and it's simply defined as:

Γ(n) = (n - 1)!

OK, that sounds a lot more complicated than it is but any decent coder could write all this up in a few lines of their favourite language.

One last thing, the sum of the hyperparameters is the total number of trials. That is

α + β = n

Bayes: Binomial and Beta distribution

And this is where it gets interesting. Recall that the posterior distribution is proportional to the likelihood function and the prior. Or, in maths speak:

P(A | X) ∝ P(X | A) P (A)

"It turns out that if you do a Bayesian update with a binomial likelihood function ... the beta distribution is a conjugate prior. That means that if the prior distribution for x is a beta distribution, the posterior is also a beta distribution." [1]

How do we know this? Well take the equation for the binomial's PMF and multiply it by the beta's PDF. But, to make things easier, note that _nC_k and B(α, β) are constants and can be ignored in a "proportional to" relationship. So, it becomes:

P(a | x) ∝ x^k (1 - x)^n-kx^{α - 1}(1 - x)^{β - 1}

which simplifies to:

P(a | x) ∝ x^α+k-1 (1 - x)^n-k+^{β - 1}

Which is another beta distribution! Only this time

α' = α + k - 1

and

β' = n - k + β - 1

Proofs for other combinations can be found here, more information here, a better explanation by somebody smarter than myself is here, and possibly the most informative answer on StackExchange ever here.

[1] Think Bayes, Downey

Agile Java Man

Friday, November 11, 2016

Conjugate Priors in Bayesian Probability

No comments:

Post a Comment

Blog Archive

About Me