Saturday, January 2, 2021

Bootstrapping

When I was trying to evaluate which method gave the better accuracy for a neural net, I had previously used a Bayesian approach. But with a technique known as bootstrapping, I could also calculate confidence intervals using classical methods.

"The bootstrap is a method for estimating standard errors and computing confidence intervals" [1]. It's very useful when you have small amounts of data. Using it, we resample (with replacement) the original data many, many times until a histogram of our results is more informative.

"It must be noted that increasing the number of resamples, m, will not increase the amount of information in the data. That is, resampling the original set 100,000 times is not more useful than only resampling it 1,000 times. The amount of information within the set is dependent on the sample size, n, which will remain constant throughout each resample. The benefit of more resamples, then, is to derive a better estimate of the sampling distribution." [TowardsDataScience]

Recall that I only had 9 data points for each of my two methods (L1 and L2 normalization of the data). But using the bootstrap technique, this lumpy data could be transformed like this for L1 normalization:

[Figure: Bootstrapped accuracy using L1 normalization]

and this for L2 normalization:

[Figure: Bootstrapped accuracy using L2 normalization]

Note how a sparse distribution becomes much more informative.

By excluding the lowest 5% and the highest 5% of the bootstrapped results, we can calculate a 90% confidence interval as follows:

  • the 90% confidence interval for L1 normalization was [0.9466, 0.9504].
  • the 90% confidence interval for L2 normalization was [0.9412, 0.9456].

[Python code lives in my GitHub]
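The interval calculation itself is tiny. A sketch, using the `means` array from the earlier snippet (so the numbers it prints are only illustrative):

import numpy as np

# 90% interval: drop the lowest 5% and the highest 5% of the bootstrapped means
lower, upper = np.percentile(means, [5, 95])
print(f"90% confidence interval: [{lower:.4f}, {upper:.4f}]")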

Note some caveats with confidence intervals:

"... a 95% confidence interval does not indicate that the parameter of interest has a 95% probability of being within the interval.  Ironically, the situation is worse when the sample size is large.  In that case, the CI is usually small, other sources of error dominate, and the CI is less likely to contain the actual value." Statistical Inference is only mostly wrong - Downey

Also note that bootstrapping relies on the Central Limit Theorem - the idea that the distribution of the sample mean tends towards a Gaussian for most underlying distributions. This is not universally true. For instance, the Cauchy distribution has a density proportional to 1/(1 + x²), which you might recognise as a standard integral: it can be integrated from -∞ to ∞ (giving a function dependent on arctan), so once normalised it is a valid probability distribution. However, if we try to find the mean by integrating x/(1 + x²) (try integration by substitution with u = 1 + x²), each tail of the integral diverges, so the mean does not exist. Hmm.
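Spelling that out for the standard Cauchy density, which carries a normalising factor of 1/π:

\int_{-\infty}^{\infty} \frac{1}{\pi(1 + x^2)}\,dx = \frac{1}{\pi}\Big[\arctan x\Big]_{-\infty}^{\infty} = 1

\int_{0}^{\infty} \frac{x}{\pi(1 + x^2)}\,dx = \frac{1}{2\pi}\Big[\ln(1 + x^2)\Big]_{0}^{\infty} = \infty

The density integrates to 1, but each half of the mean integral diverges, so the Cauchy has no mean for the sample mean to converge to.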

Indeed, if we change the Python code to generate data from a Cauchy, like so:

from scipy.stats import cauchy
...
    data = cauchy.rvs(size=n)  # draw n points from a standard Cauchy instead of the real data

then don't be surprised if you see something like this:

[Figure: Bootstrapping a Cauchy]

Which is not very useful at all.

[1] "All of Statistics" Wasserman.
