Since this took me a few hours to sort out in my head, I thought I would blog it.

Bessel's correction is used when taking a sample with mean x̄ and variance s² from a population whose true mean is μ and true variance is σ².

The standard deviation, s, is:

√(Σᵢⁿ(xᵢ − x̄)²/n)

but we'd be wrong if we thought we could use s² as an estimate for σ². (We actually need to divide by n − 1.)
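To see the bias numerically, here's a minimal Monte Carlo sketch in Python using only the standard library. The population (Gaussian with σ² = 4) and sample size are arbitrary illustrative choices:

```python
import random
import statistics

random.seed(42)

SIGMA2 = 4.0   # true population variance (population: Gaussian, sd = 2)
N = 5          # sample size
TRIALS = 20_000

biased_total = 0.0
unbiased_total = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0.0, 2.0) for _ in range(N)]
    biased_total += statistics.pvariance(sample)   # divides by n
    unbiased_total += statistics.variance(sample)  # divides by n-1

biased_mean = biased_total / TRIALS
unbiased_mean = unbiased_total / TRIALS
# Dividing by n systematically underestimates: E ≈ (1 - 1/n)σ² = 3.2 here,
# while dividing by n-1 averages out near the true σ² = 4.
print(f"divide by n:   {biased_mean:.3f}")
print(f"divide by n-1: {unbiased_mean:.3f}")
```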

Note that sometimes we don't need to estimate the population. In the case of a country conducting a census, we have all the information. However, for most use cases we need to sample the country's population.

[Aside: Interestingly, “such a basic concept as standard deviation, with an apparently impeccable mathematical pedigree, is socially constructed and a product of history” Leeds University].

The reasoning goes like this.

The expected value of x̄ is E[x̄] = μ, since each xᵢ has expectation μ and averaging n of them leaves that expectation unchanged.

The variance of x̄ is:

E[(x̄ − μ)²] − (E[x̄ − μ])²

also by definition.

The second term is clearly zero when we expand it and substitute in the expected value for x̄ we've just stated above.

We then expand the first term and take the expected value of all the resulting terms. However, we have to be careful here: E[A·B] = E[A]·E[B] only **if** A and B are **independent**.

To illustrate, the expected value of the product of two fair 6-sided dice is (3.5)², which is 12.25. But the expected value of rolling just one die and squaring the result is (1² + 2² + ... + 6²)/6, which is about 15.2. Clearly they are not the same probability distribution as, for example, p(12) is non-zero in the first case but zero in the second (as you can't get 12 by squaring an integer).
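Both expectations can be confirmed exactly by enumerating outcomes; a small Python sketch with exact fractions:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# E[A·B] for two independent dice: equals E[A]·E[B] = 3.5² = 12.25
e_product = Fraction(sum(a * b for a, b in product(faces, repeat=2)), 36)

# E[A²] for one die squared: A is not independent of itself, so no shortcut
e_square = Fraction(sum(a * a for a in faces), 6)

print(float(e_product))  # 12.25
print(float(e_square))   # 91/6 ≈ 15.17
```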

So, let's represent the variance of x̄ as:

E[(Σᵢⁿ(xᵢ − μ)/n) · (Σⱼⁿ(xⱼ − μ)/n)]

For all the terms where i ≠ j, the variables are independent, so each such cross term becomes:

E[(xᵢ − μ)(xⱼ − μ)] = E[xᵢxⱼ] − μE[xᵢ] − μE[xⱼ] + μ² = E[xᵢ]E[xⱼ] − 2μ² + μ² = μ² − 2μ² + μ² = 0
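As a sanity check on the i ≠ j case, a short simulation with independent fair dice (so μ = 3.5; the seed and trial count are arbitrary) shows the cross term averaging out to zero:

```python
import random

random.seed(1)

MU = 3.5          # population mean of a fair die
TRIALS = 100_000

# For independent draws x_i, x_j, E[(x_i - μ)(x_j - μ)] should be 0.
cross_total = 0.0
for _ in range(TRIALS):
    xi = random.randint(1, 6)
    xj = random.randint(1, 6)
    cross_total += (xi - MU) * (xj - MU)

print(cross_total / TRIALS)  # hovers near 0
```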

but when i = j the two factors are not independent, and we have

E[Σᵢⁿ(xᵢ − μ)²/n²] = n · E[(xᵢ − μ)²]/n² = σ²/n

since each of the n terms has expectation σ². So the variance of x̄ is σ²/n.
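That σ²/n result can be checked by simulation too. Assuming the same illustrative Gaussian population (σ² = 4) and samples of size 10, the variance of the sample means should come out near 0.4:

```python
import random
import statistics

random.seed(0)

N = 10
TRIALS = 20_000
SIGMA2 = 4.0  # population: Gaussian with sd = 2, so σ²/n = 0.4

# Draw many samples, record each sample mean, then measure their spread.
means = []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, 2.0) for _ in range(N)]
    means.append(statistics.fmean(sample))

var_of_mean = statistics.pvariance(means)
print(f"Var(x̄) ≈ {var_of_mean:.3f}  (theory: σ²/n = {SIGMA2 / N})")
```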

Now, we can re-express our definition of

s² = Σᵢⁿ(xᵢ − x̄)²/n

by noting that:

(xᵢ − x̄)² = ((xᵢ − μ) − (x̄ − μ))²

= (xᵢ − μ)² − 2(xᵢ − μ)(x̄ − μ) + (x̄ − μ)²

The expectation of the first term is σ² + (E[x − μ])² = σ², from our definition of the variance of the whole population. The last term is σ²/n, as we've just shown. As for the middle term, we do the same trick as before: expanding (x̄ − μ) as Σⱼⁿ(xⱼ − μ)/n, the cross terms with i ≠ j have expectation zero, but in the 1/n of cases where i = j we get σ². So E[(xᵢ − μ)(x̄ − μ)] = σ²/n, and the middle term contributes −2σ²/n. Putting the three together, E[(xᵢ − x̄)²] = σ² − 2σ²/n + σ²/n = (1 − 1/n)σ², and averaging over the n terms changes nothing.
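The middle-term claim, E[(xᵢ − μ)(x̄ − μ)] = σ²/n, can also be checked by simulation. With the same illustrative population as before (Gaussian, μ = 0, σ² = 4) and n = 5, the theory predicts 0.8:

```python
import random
import statistics

random.seed(7)

N = 5
TRIALS = 50_000

# Average (x_1 - μ)(x̄ - μ) over many samples; here μ = 0.
total = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0.0, 2.0) for _ in range(N)]
    xbar = statistics.fmean(sample)
    total += sample[0] * xbar

print(total / TRIALS)  # theory: σ²/n = 4/5 = 0.8
```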

Therefore:

E[s²] = (1 − 1/n)σ²

so we can estimate σ² as

σ² ≈ (n/(n − 1))s²
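Putting the correction to work on a concrete (made-up) sample, multiplying the divide-by-n variance by n/(n − 1) agrees with dividing by n − 1 directly:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # an arbitrary sample
n = len(data)

s2_biased = statistics.pvariance(data)     # divides by n
s2_corrected = s2_biased * n / (n - 1)     # apply Bessel's correction
s2_unbiased = statistics.variance(data)    # divides by n-1 directly

print(s2_biased, s2_corrected, s2_unbiased)
```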

So, don't forget your n-1 if you're taking a sample and not calculating values for the whole population.
