bootstrap best practice in AB testing

Question

I want to compare a percentile metric for control vs treatment. Using the bootstrap, I see two ways to do this and wonder which one makes more sense.

Approach 1: Repeat the following B times:

1. Bootstrap the control group and compute the percentile.
2. Bootstrap the treatment group and compute the percentile.
3. Compute the difference.

Then take the CI from the B differences.
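Written out for a 95% level, approach 1 is the percentile bootstrap interval. The notation is introduced here for illustration: $\hat\theta^{*}_{C,b}$ and $\hat\theta^{*}_{T,b}$ are the control and treatment percentiles in replicate $b$, the difference follows the code below (control minus treatment), and $\hat\Delta^{*}_{(q)}$ is the empirical $q$-quantile of the $B$ differences.

$$
\hat\Delta^{*}_{b} = \hat\theta^{*}_{C,b} - \hat\theta^{*}_{T,b}, \quad b = 1, \dots, B, \qquad
\text{CI}_{1} = \left[\, \hat\Delta^{*}_{(0.025)},\; \hat\Delta^{*}_{(0.975)} \,\right]
$$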

Approach 2:

1. Bootstrap the control group B times and compute the mean and variance of the metric.
2. Bootstrap the treatment group B times and compute the mean and variance of the metric.
3. Compute the difference and its variance, assuming the two groups are independent.
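Written out, approach 2 is the standard normal-approximation interval. This is a sketch of the usual formula with notation introduced here: $\bar\theta^{*}_{C}, \bar\theta^{*}_{T}$ are the means and $s^{2}_{C}, s^{2}_{T}$ the variances of the $B$ bootstrap percentiles in each group, and the variances simply add because the groups are resampled independently.

$$
\hat\Delta = \bar\theta^{*}_{C} - \bar\theta^{*}_{T}, \qquad
\widehat{\operatorname{se}}(\hat\Delta) = \sqrt{s^{2}_{C} + s^{2}_{T}}, \qquad
\text{CI}_{2} = \hat\Delta \pm 1.96\, \widehat{\operatorname{se}}(\hat\Delta)
$$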

Below is simulation code in R. The second approach gives a narrower CI but requires a normality assumption.

```r
A <- rnorm(1000, 0.1, 1)
B <- rnorm(1000, 0.15, 1)
pctA = quantile(A, 0.99)
pctB = quantile(B, 0.99)

# Approach 1:
# 1. Bootstrap each set
# 2. Compute the differences for each bootstrap sample
# 3. Get confidence intervals
N = 500
diffBoot = rep(NA, N)
for (i in 1:N) {
    tempA = sample(A, size=1000, replace=TRUE)
    tempB = sample(B, size=1000, replace=TRUE)
    diff = quantile(tempA, 0.99) - quantile(tempB, 0.99)
    diffBoot[i] = diff
}
mean(diffBoot)
quantile(diffBoot, c(0.025, 0.975))  # 95% percentile interval

# Approach 2:
# 1. Bootstrap each set
# 2. Compute metrics and variances
# 3. Get differences and confidence intervals
N = 500
pctAboot = rep(NA, N)
pctBboot = rep(NA, N)
for (i in 1:N) {
    tempA = sample(A, size=1000, replace=TRUE)
    tempB = sample(B, size=1000, replace=TRUE)
    pctAtemp = quantile(tempA, 0.99)
    pctBtemp = quantile(tempB, 0.99)
    pctAboot[i] = pctAtemp
    pctBboot[i] = pctBtemp
}
meanboot <- mean(pctAboot - pctBboot)
sdboot <- sqrt(var(pctAboot) + var(pctBboot))  # assumes A and B are independent
meanboot + c(-1, 1) * 1.96 * sdboot             # 95% normal-approximation interval
```

Your code seems to have a typo: in tempB = sample(B, size=1000, replace=TRUE), B is 500, not B <- rnorm(1000, 0.15, 1). — wei, Apr 23 '23 at 03:26

score 0 · Accepted Answer · answered Apr 23 '23 at 07:09

If we compute the statistics for approach 1 and approach 2 on the same bootstrap samples (which removes the simulation noise from the comparison), the two approaches give the same estimate of the difference and very close variance estimates. The remaining gap in the variances comes from the sample covariance between pctAtemp and pctBtemp, which is nonzero only by chance, because pctAtemp and pctBtemp are independent.
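The gap is exactly twice the sample covariance because of the following identity, which holds algebraically for sample variances and covariances computed with the same denominator. Here $\hat\theta^{*}_{A}$ and $\hat\theta^{*}_{B}$ stand for the vectors pctAboot and pctBboot (notation introduced for illustration):

$$
\widehat{\operatorname{Var}}\!\left(\hat\theta^{*}_{A} - \hat\theta^{*}_{B}\right)
= \widehat{\operatorname{Var}}\!\left(\hat\theta^{*}_{A}\right) + \widehat{\operatorname{Var}}\!\left(\hat\theta^{*}_{B}\right) - 2\,\widehat{\operatorname{Cov}}\!\left(\hat\theta^{*}_{A}, \hat\theta^{*}_{B}\right)
\quad\Longrightarrow\quad
\text{var2} - \text{var1} = 2\,\widehat{\operatorname{Cov}}\!\left(\hat\theta^{*}_{A}, \hat\theta^{*}_{B}\right)
$$

Since A and B are resampled independently, the covariance term has expectation zero, so var1 and var2 differ only by simulation noise.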

```r
set.seed(1)
sample_cnt = 1000
A = rnorm(sample_cnt, 0.10, 1)
B = rnorm(sample_cnt, 0.15, 1)
N <- 500 ## this give

p <- 0.99 ## test percentile
diffBoot <- rep(NA, N)
pctAboot <- rep(NA, N)
pctBboot <- rep(NA, N)
for (b in 1:N) {
  tempA = sample(A, size = sample_cnt, replace = TRUE)
  tempB = sample(B, size = sample_cnt, replace = TRUE)
  # Alternative: draw fresh samples from the population instead of resampling
  # tempA = rnorm(sample_cnt, 0.10, 1)
  # tempB = rnorm(sample_cnt, 0.15, 1)
  pctAtemp = quantile(tempA, p)
  pctBtemp = quantile(tempB, p)
  # For each sample, the statistics of both approaches are calculated simultaneously,
  # which reduces the variance of the simulation
  diffBoot[b] = pctAtemp - pctBtemp
  pctAboot[b] = pctAtemp
  pctBboot[b] = pctBtemp
}
est1 <- mean(diffBoot)
var1 <- var(diffBoot)
sprintf("approach 1 est: %.6f, var: %.6f", est1, var1)
est2 <- mean(pctAboot - pctBboot)
var2 <- var(pctAboot) + var(pctBboot)
sprintf("approach 2 est: %.6f, var: %.6f", est2, var2)
# With two vector arguments, var(x, y) returns the covariance of x and y
sprintf("var diff(%.6f) is var(pctAtemp, pctBtemp) * 2 = %.6f", var2 - var1, var(pctAboot, pctBboot) * 2)
```

```
[1] "approach 1 est: -0.009805, var: 0.025579"
[1] "approach 2 est: -0.009805, var: 0.024857"
[1] "var diff(-0.000722) is var(pctAtemp, pctBtemp) * 2 = -0.000722"
```

Another difference is that approach 1 is close to the percentile bootstrap CI, while approach 2 is closer to the standard (normal-approximation) bootstrap interval. See also the related question: Is it true that the percentile bootstrap should never be used?
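To see the two interval types side by side, here is a minimal sketch built around a hypothetical helper, `boot_pct_diff_ci()`, which is not part of the original thread. It draws `n_boot` bootstrap replicates of the difference in the `p`-th percentile and computes both the percentile interval (approach 1) and the standard normal-approximation interval (approach 2) from the same replicates. The replicate count is named `n_boot` rather than `B` so that it cannot clash with the treatment data.

```r
# Hypothetical helper (not from the original thread): bootstrap the difference
# in the p-th percentile between two independent groups and return
#   - the percentile interval (approach 1), and
#   - the standard normal-approximation interval (approach 2),
# both computed from the same bootstrap replicates.
boot_pct_diff_ci <- function(a, b, p = 0.99, n_boot = 2000, level = 0.95) {
  alpha <- 1 - level
  pct_a <- numeric(n_boot)
  pct_b <- numeric(n_boot)
  for (i in seq_len(n_boot)) {
    # Resample each group independently, with replacement, at its own size
    pct_a[i] <- quantile(sample(a, length(a), replace = TRUE), p, names = FALSE)
    pct_b[i] <- quantile(sample(b, length(b), replace = TRUE), p, names = FALSE)
  }
  diffs <- pct_a - pct_b

  # Approach 1: empirical quantiles of the bootstrap differences
  percentile_ci <- quantile(diffs, c(alpha / 2, 1 - alpha / 2), names = FALSE)

  # Approach 2: mean +/- z * se, with the variances added under independence
  z <- qnorm(1 - alpha / 2)
  se <- sqrt(var(pct_a) + var(pct_b))
  standard_ci <- mean(diffs) + c(-1, 1) * z * se

  list(estimate = mean(diffs), percentile = percentile_ci, standard = standard_ci)
}

set.seed(1)
A <- rnorm(1000, 0.10, 1)
B <- rnorm(1000, 0.15, 1)
res <- boot_pct_diff_ci(A, B, p = 0.99, n_boot = 500)
res$percentile  # approach 1
res$standard    # approach 2
```

Reusing the same replicates for both intervals follows the answer's idea of removing simulation noise from the comparison. By construction, the standard interval is symmetric around the mean. The percentile interval follows the shape of the bootstrap distribution, so it can be asymmetric, which can matter for a tail percentile such as the 99th.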

Thanks for the answer. And the extra link is also very informative. Appreciate that! — user28363, Apr 23 '23 at 15:50
