## Monday, February 12, 2018

### Great expectations

Since this took me a few hours to sort out in my head, I thought I would blog it.

Bessel's correction is used when we take a sample with mean $\bar{x}$ and variance $s^2$ from a population whose true mean is $\mu$ and true variance is $\sigma^2$.

The sample standard deviation, $s$, is:

$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

but we'd be wrong if we thought we could use $s^2$ as an estimate for $\sigma^2$. (We actually divide by $n-1$.)

Note that sometimes we don't need to estimate the population's parameters at all. In the case of a country conducting a census, we have all the information. However, for most use cases we need to sample the country's population.

[Aside: Interestingly, “such a basic concept as standard deviation, with an apparently impeccable mathematical pedigree, is socially constructed and a product of history” Leeds University].

The reasoning goes like this.

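The expected value of the sample mean $\bar{x}$ is $E[\bar{x}] = \mu$, because each $x_i$ has expectation $\mu$ (the expectation is the mean, by definition) and expectation is linear.

The variance of $\bar{x}$ is:

$$E[(\bar{x} - \mu)^2] - (E[\bar{x} - \mu])^2$$

also by definition.

The second term is clearly zero when we expand it and substitute in the expected value for $\bar{x}$ we've just stated above.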

We then expand the first term and take the expected value of all the resulting terms. However, we have to be careful here, as $E[AB] = E[A]E[B]$ is only guaranteed when A and B are independent.

To illustrate, the expected value of the product of rolling a pair of fair 6-sided dice is $(3.5)^2$, which is 12.25. But the expected value of rolling just one die and squaring the result is $(1^2 + 2^2 + \dots + 6^2)/6$, which is about 15.17. Clearly they are not the same probability distribution as, for example, $p(12)$ is non-zero in the first case but zero in the second (as you can't get a 12 by squaring an integer).
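The dice example is small enough to check by brute force. Here is a minimal sketch that enumerates every outcome with exact fractions; the names `faces`, `e_product` and `e_square` are just mine for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Two independent dice: average the product over all 36 equally likely pairs.
e_product = Fraction(sum(a * b for a, b in product(faces, repeat=2)), 36)

# One die multiplied by itself: average a*a over the 6 equally likely faces.
e_square = Fraction(sum(a * a for a in faces), 6)

print(e_product)  # 49/4, i.e. 3.5 squared
print(e_square)   # 91/6, a little over 15
```

The two numbers differ because a die is certainly not independent of itself, so the product rule for expectations does not apply to the second case.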

So, let's represent the variance of $\bar{x}$ as:

$$E\left[\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)\right)\left(\frac{1}{n}\sum_{j=1}^{n}(x_j-\mu)\right)\right]$$

For all terms where $i \neq j$, the draws are independent, so each cross term becomes:

$$E[(x_i-\mu)(x_j-\mu)] = E[x_i x_j] - \mu E[x_i] - \mu E[x_j] + \mu^2 = E[x_i]E[x_j] - 2\mu^2 + \mu^2 = \mu^2 - 2\mu^2 + \mu^2 = 0$$

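but when $i=j$ and the two factors are the same draw (so not independent), we have

$$E\left[\frac{1}{n^2}\sum_{i=1}^{n}(x_i-\mu)^2\right] = \frac{1}{n^2}\sum_{i=1}^{n}E[(x_i-\mu)^2] = \frac{E[(x_i-\mu)^2]}{n} = \frac{\sigma^2}{n}$$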
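To convince myself that the variance of the sample mean really is the population variance divided by n, here is a sketch that computes it exactly for a fair die by enumerating every possible sample of n rolls. Treating the die as the "population" is my assumption for illustration, and the function name `variance_of_sample_mean` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = [Fraction(f) for f in range(1, 7)]
mu = sum(faces) / 6                              # population mean, 7/2
sigma2 = sum((f - mu) ** 2 for f in faces) / 6   # population variance, 35/12

def variance_of_sample_mean(n):
    """Exact E[(sample mean - mu)^2] over every equally likely sample of n rolls."""
    samples = list(product(faces, repeat=n))
    return sum((sum(s) / n - mu) ** 2 for s in samples) / len(samples)

for n in (1, 2, 3, 4):
    # Each line prints n, the exact variance of the sample mean, and sigma2 / n.
    print(n, variance_of_sample_mean(n), sigma2 / n)
```

Because the arithmetic uses `Fraction`, the two printed values on each line match exactly rather than approximately.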

Now, we can re-express our definition of

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

by noting that:

$$(x_i-\bar{x})^2 = ((x_i-\mu)-(\bar{x}-\mu))^2 = (x_i-\mu)^2 - 2(x_i-\mu)(\bar{x}-\mu) + (\bar{x}-\mu)^2$$

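The expectation of the first term is $E[(x_i-\mu)^2] = \sigma^2 + (E[x_i-\mu])^2 = \sigma^2$ from our definition of the variance of the whole population, and the expectation of the last term is $\sigma^2/n$, as we've just shown. As for the middle term, we do the same trick as before: writing $\bar{x}-\mu = \frac{1}{n}\sum_{j}(x_j-\mu)$, the products where $i \neq j$ have zero expectation, but the remaining $1/n$ of cases, where $i=j$, give $\sigma^2$. So the expectation of this middle term must equal $2\sigma^2/n$, and it enters with a minus sign.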

Therefore:

$$E[s^2] = \sigma^2 - \frac{2\sigma^2}{n} + \frac{\sigma^2}{n} = \left(1-\frac{1}{n}\right)\sigma^2$$

so we can estimate $\sigma^2$ as

$$\sigma^2 \approx \frac{n}{n-1}s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
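A quick simulation makes the bias visible. This is only a sketch under assumptions I've picked for illustration: a normal population with mean 10 and standard deviation 2, samples of size 5, and a fixed seed so the run is repeatable. The helper `biased_variance` is a hypothetical name for the divide-by-n formula.

```python
import random

def biased_variance(xs):
    """Mean squared deviation from the sample mean, dividing by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)  # fixed seed (an arbitrary choice)
mu, sigma, n, trials = 10.0, 2.0, 5, 20_000

# Average the divide-by-n variance over many independent samples of size n.
mean_s2 = sum(
    biased_variance([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(trials)
) / trials

print(mean_s2)                # should land near (1 - 1/n) * sigma**2 = 3.2
print(mean_s2 * n / (n - 1))  # should land near sigma**2 = 4.0
```

With samples this small the uncorrected figure falls short of the true variance by about a fifth, which is exactly the factor the derivation predicts.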

So, don't forget your n-1 if you're taking a sample and not calculating values for the whole population.
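In practice you rarely write the formula by hand. As a small sketch using Python's standard `statistics` module, `pvariance` divides by n (the whole-population formula) while `variance` divides by n-1 (the sample formula). The data below is made up purely for illustration.

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up numbers
n = len(sample)

s2 = statistics.pvariance(sample)       # divides by n
estimate = statistics.variance(sample)  # divides by n - 1

print(s2)                # 4.0
print(estimate)          # 32/7, roughly 4.57
print(s2 * n / (n - 1))  # the same value as estimate, up to float rounding
```

Other libraries make their own choice of default, so it is worth checking which divisor you are getting before trusting the number.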