
Is it possible to get different results for bootstrap sample statistics when using different software?

2 Answers


You can get different results from the same software! Run the boot package in R with set.seed(1) before your bootstrap code and then with set.seed(2). Your results should differ at least a little.
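
For example, a minimal sketch with the boot package (the data, statistic, and replicate count here are just placeholders for illustration):

library(boot)

x = rnorm(100)                           # any sample will do for illustration
boot.mean = function(d, i) mean(d[i])    # statistic: mean of the re-sampled values

set.seed(1)
b1 = boot(x, boot.mean, R = 2000)
set.seed(2)
b2 = boot(x, boot.mean, R = 2000)

sd(b1$t[,1]);  sd(b2$t[,1])   # bootstrap SEs differ slightly between the two seeds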

If you then run a bootstrap in Python (or SAS, Stata, etc.), you will be taking yet a third set of bootstrap samples, giving a third result.

Dave

Suppose I have a sample x of size $n = 900$ with the summary statistics below:

summary(x);  sd(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  31.68   45.30   50.08   49.96   54.42   70.49 
[1] 6.858865   # sample SD

Then the (estimated) standard error is $S/\sqrt{n} = 6.859/30 = 0.229.$ Assuming the data are normal, the 95% t confidence interval is $\bar X \pm 1.963\, S/\sqrt{n},$ where $1.963$ cuts probability $0.025$ from the upper tail of the (symmetrical) Student's t distribution with 899 degrees of freedom.

qt(.975, 899)
[1] 1.962606

By computation or from the t.test procedure in R, we get the CI $(49.51, 50.41).$ The margin of error of this CI is about $1.963(0.229)=0.4495.$
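
For instance, the "by computation" route can be done directly in R (a small sketch using the quantities above):

n = length(x)                          # 900
se = sd(x)/sqrt(n)                     # estimated standard error, about 0.229
mean(x) + c(-1,1)*qt(.975, n-1)*se     # 95% t CI, about (49.51, 50.41)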

t.test(x)$conf.int
[1] 49.51063 50.40805
attr(,"conf.level")
[1] 0.95

If the sample truly is from a normal distribution, then this is a valid 95% CI for the population mean $\mu$ and we are done.

However, if we have reason to doubt that the data are normal we might find a 95% nonparametric bootstrap CI for $\mu.$ There are many possible styles of bootstrap CIs. I will illustrate one.

The observed sample mean is $49.96.$

a.obs = mean(x);  a.obs
[1] 49.95934

We take many (3000) re-samples of size 900 with replacement from x in order to get an idea of the sampling error of the sample mean. For each re-sample we find the deviation of its mean from the observed mean of the original sample.

set.seed(2021)
d = replicate(3000, mean(sample(x,900,rep=T))-a.obs)  # deviations of re-sample means from a.obs

The deviations $d$ are mainly between $\pm 0.452,$ which is not much different from the margin of error of the 95% t confidence interval above. The 95% nonparametric bootstrap CI is $(49.51, 50.41),$ which is in agreement with the 95% t CI above.

UL = quantile(d, c(.975,.025))
UL
     97.5%       2.5% 
 0.4520234 -0.4522723

a.obs - UL
    97.5%     2.5% 
 49.50732 50.41161

If I run this same bootstrap procedure twice more (with unspecified seeds) I get bootstrap CIs $(49.49425,\, 50.42611)$ and $(49.49007,\, 50.41792),$ which are about the same as the first one above, for practical purposes.
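
If you want to repeat the procedure yourself, here is a small sketch wrapping the steps above in a function (the name boot.ci.mean is just for illustration; x is the sample above):

boot.ci.mean = function(x, B = 3000) {
  a.obs = mean(x)
  d = replicate(B, mean(sample(x, length(x), rep=T)) - a.obs)
  a.obs - quantile(d, c(.975, .025))   # lower limit, then upper limit
  }
boot.ci.mean(x);  boot.ci.mean(x)      # two runs give slightly different CIs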

Why bootstrap CIs from the same sample are not all exactly the same:

  • Because bootstrapping depends on random re-sampling, you can't expect exactly the same result every time. Experience has shown that 2000 or 3000 re-samples are enough to get nearly reproducible results.

  • If you try bootstrapping with very small samples, you might get a larger variety of bootstrap CIs.

  • Also, if you use a different style of nonparametric bootstrap (there are several possibilities) you may get a somewhat different result; one simple alternative is sketched after this list.

  • Finally, if the data are not normal, you can't expect a bootstrap CI to give the same results as a t CI. In this case the bootstrap CI is usually preferred, because the t CI assumes normal data.
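
For example, a plain percentile bootstrap CI takes quantiles of the re-sampled means directly, rather than quantiles of their deviations from the observed mean. A minimal sketch, assuming x as above:

set.seed(2021)
m = replicate(3000, mean(sample(x, 900, rep=T)))   # re-sampled means
quantile(m, c(.025, .975))                         # percentile bootstrap 95% CI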


Notes: (1) The fictitious data for my bootstrap CIs above were normal, sampled in R as follows:

set.seed(407)
x = rnorm(900, 50, 7)

(2) It is important to understand that re-samples used in bootstrapping do not provide additional information. They are part of the analysis, not part of the experiment. In the example above, the re-samples took the place of (a) the formula for the estimated standard error of the sample mean and (b) looking in a printed t table to find the constant 1.963.

BruceET