Bootstrapped confidence intervals: different values at every computation?

Question

I am running an ANOVA and computing bootstrapped CIs for my effect sizes. For this I am using the Measures of Effect Size Toolbox for Matlab.

To my surprise, if I re-run the same line of code, i.e. compute the same thing for the same data, the lower and upper bounds of the CI are slightly different each time.

I know that bootstrapped CIs are built from permutation analyses based on random generations of distributions. Still, I do not remember reading anywhere about this odd property of CIs never being the same twice!

You can all but ensure that your bootstrap CIs agree exactly to an arbitrary precision (e.g., 1 significant figure, two significant figures, etc.) by selecting a larger number of bootstrap samples to estimate from. — Alexis, Nov 17 '19 at 16:28
Thanks! I guess the answers that were put in attest to this also. Thanks again. — z8080, Nov 17 '19 at 16:54

knrumsey · Accepted Answer · 2019-11-17T22:30:28.290

A bootstrap resample consists of data points $(x_1^*, x_2^*, \ldots, x_n^*)$ which are sampled with replacement from the original data $(x_1, x_2, \ldots, x_n)$. Technically, an exact bootstrap procedure would consider all of the possible bootstrap samples. If this can be accomplished, then the bootstrap confidence interval will be the same for every run.

Unfortunately, for a data set of size $n$, there are $n^n$ possible bootstrap samples (e.g. 10 billion when $n=10$), which is prohibitively large. To get around this, we usually randomly choose $M$ of the possible $n^n$ bootstrap samples. As $M$ gets large, the confidence interval generated this way will converge to the "true" bootstrap CI, as if we had used all $n^n$ possible resamples.
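To make the $n^n$ count concrete, here is a minimal Python sketch that enumerates every ordered resample of a tiny data set and reads a CI off the full bootstrap distribution. The data values, the choice of the mean as the statistic, and the simple percentile rule are all assumptions made for illustration; the toolbox in the question may compute its intervals differently.

```python
import itertools
import statistics

def exact_bootstrap_means(data):
    """Means of all n**n ordered resamples (only feasible for very small n)."""
    n = len(data)
    return sorted(
        statistics.fmean(resample)
        for resample in itertools.product(data, repeat=n)
    )

# Made-up data, n = 5, so there are 5**5 = 3125 resamples to enumerate.
data = [2.1, 3.4, 1.9, 5.0, 4.2]
means = exact_bootstrap_means(data)
total = len(means)

# Simple percentile interval (an assumed rule, chosen for brevity).
lo = means[int(0.025 * total)]
hi = means[int(0.975 * total) - 1]
print(total, lo, hi)
```

Because nothing here is random, rerunning the script gives the same interval every time. The enumeration grows as $n^n$, so this only works as a demonstration on toy data.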

If you want to see more consistent results across different runs, you can set the seed (as @Dave suggests) or try increasing the number of resamples. The latter approach will lead to a more expensive procedure, but will be less sensitive to the random nature of the bootstrap in practice.
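The effect of increasing $M$ can be checked with a small experiment. The sketch below is in Python rather than Matlab and assumes a percentile bootstrap CI for the mean on simulated data; the function names `bootstrap_ci` and `spread_of_lower_bound` are hypothetical helpers, not part of any toolbox.

```python
import random
import statistics

def bootstrap_ci(data, n_resamples, rng, alpha=0.05):
    """Percentile bootstrap CI for the mean from n_resamples random resamples."""
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def spread_of_lower_bound(data, n_resamples, n_runs=20):
    """Standard deviation of the lower CI bound across repeated runs."""
    rng = random.Random(0)  # one generator, so each run draws fresh resamples
    lowers = [bootstrap_ci(data, n_resamples, rng)[0] for _ in range(n_runs)]
    return statistics.stdev(lowers)

# Simulated data (an assumption for illustration): 30 standard normal draws.
data_rng = random.Random(1)
data = [data_rng.gauss(0, 1) for _ in range(30)]

spread_small = spread_of_lower_bound(data, n_resamples=100)
spread_large = spread_of_lower_bound(data, n_resamples=2500)
print(spread_small, spread_large)
```

The run-to-run spread of the lower bound should be noticeably smaller with 2500 resamples than with 100, at the cost of proportionally more computation.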

score 2 · Answer 2 · answered Nov 16 '19 at 19:58


Each time you repeat the process, you are taking a different set of samples. Those different samples will give slightly different results.

If you set a random seed, you will get the same results every time (which is why random seeds are useful). I don’t know the Matlab command for it, but R is set.seed(2019) and Python is random.seed(2019) after you import the random library.
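As a minimal illustration of the seeding idea in Python, the sketch below reseeds the standard-library generator before each call and gets the same interval twice. The data and the percentile CI for the mean are assumptions for the example. Note that `random.seed` only affects Python's `random` module; code that draws from NumPy needs NumPy's own seeding. In Matlab, the `rng` function (e.g. `rng(2019)`) plays the same role, although whether the toolbox draws from that global stream is something to confirm in its documentation.

```python
import random
import statistics

def bootstrap_ci_mean(data, n_resamples=1000, alpha=0.05):
    """Percentile bootstrap CI for the mean, drawing from the global generator."""
    n = len(data)
    means = sorted(
        statistics.fmean(random.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Made-up data for illustration.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]

random.seed(2019)
first = bootstrap_ci_mean(data)

random.seed(2019)  # reseeding restarts the same random stream
second = bootstrap_ci_mean(data)

# Without reseeding, the stream continues and the bounds will usually differ.
unseeded = bootstrap_ci_mean(data)
print(first, second, unseeded)
```

Setting the seed makes a given run repeatable; it does not make the interval any more accurate, so it complements rather than replaces using enough resamples.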


Dave


Thank you Dave! – z8080 Nov 17 '19 at 16:17
