I was using the Respondent-Driven Sampling Analysis Tool (RDSAT) to get bootstrapped confidence intervals. But each time I re-ran the analysis, I noticed the bootstrap standard errors and confidence intervals changed slightly. Is this normal for bootstrap confidence intervals? Thank you!
2 Answers
Bootstrapping involves resampling your data randomly, so each time you bootstrap, a different (re)sample is drawn, and the results of different bootstrap runs will differ.
If these differences are large, you should be suspicious that your bootstrap is not working well. If the differences are trivial, they are no problem.
You may want to set the seed value of your random number generator in order to make your bootstrap exactly replicable.
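As a minimal sketch of that point (base R with toy data, not RDSAT's own interface, since RDSAT is a standalone program): two unseeded bootstrap runs give slightly different intervals, while fixing the seed makes a run exactly replicable.

```r
# Toy data standing in for your sample (assumed for illustration)
x <- rnorm(50, mean = 10, sd = 2)

# Percentile bootstrap CI for the mean, from B resamples with replacement
boot_mean_ci <- function(data, B = 1000) {
  boots <- replicate(B, mean(sample(data, replace = TRUE)))
  quantile(boots, c(0.025, 0.975))
}

boot_mean_ci(x)  # run 1 ...
boot_mean_ci(x)  # ... differs slightly from run 2: different resamples

set.seed(42); boot_mean_ci(x)  # seeded run ...
set.seed(42); boot_mean_ci(x)  # ... matches exactly: same seed, same resamples
```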
I would also add that one way to get more consistent results from the bootstrap is to draw more bootstrap samples. Maybe the default is 100, but you can run 1000 or 5000 instead. – rep_ho Sep 21 '20 at 20:27
Thank you! This is very helpful! How large would the differences have to be before I should be suspicious? Currently, the differences lie in the third decimal place, and I set the number of bootstrap resamples in the software to 16,000. – Sophiex Sep 21 '20 at 20:44
As to how large the difference would need to be to be problematic: that depends on the conclusions you will draw from your results. If the conclusions would change with different bootstrap runs, then there likely is a problem. (There rarely is.) Yes, this requires thinking about the problem context. – Stephan Kolassa Sep 22 '20 at 05:25
This is totally normal, and it is why we set a random seed (to get the same randomization each time) via set.seed() in R or np.random.seed in Python.
The way the bootstrap works is to take many random samples, with replacement, from your data, so there will be small fluctuations in your calculated values as those random samples vary.
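A short sketch of that fluctuation, using an assumed toy setup in R (not data from the question): the run-to-run wobble of a bootstrap CI endpoint shrinks as the number of resamples B grows, which is also why raising B, as suggested in the comments above, makes repeated runs agree more closely.

```r
set.seed(1)                          # only to make this demo itself reproducible
x <- rnorm(50, mean = 10, sd = 2)    # toy data (assumed for illustration)

# Lower endpoint of a percentile bootstrap CI for the mean, from B resamples
lower_ci <- function(B) {
  quantile(replicate(B, mean(sample(x, replace = TRUE))), 0.025)
}

# How much that endpoint wobbles across 20 independent bootstrap runs:
sd(replicate(20, lower_ci(100)))     # noticeable Monte Carlo noise at B = 100
sd(replicate(20, lower_ci(5000)))    # much smaller noise at B = 5000
```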