Hypothesis Testing with a Bootstrap

Question

I have a list of 1000 samples from a distribution. I can find the lower limit of the 95% CI, as it is the 2.5th percentile; in this case it is zero. I would also like to test the hypothesis that a sample from the distribution would be negative. I tried counting how many of the samples were negative and dividing by the number of samples. Because the lower bound of the 95% CI is zero, I expected to get 2.5%, but I actually got 12%.

How can I test the hypothesis that a sample from the distribution would be positive please?

In Python:

```python
import numpy as np
import pandas as pd

print('Generate the sample data')
data = pd.DataFrame({'A': [1]*14 + [0]*2 + [0]*3,
                     'B': [1]*14 + [1]*2 + [0]*3})
print('sample size: ', len(data))
print('')
print('A B X')
print('1 1', len(data[(data.A == 1) & (data.B == 1)]))
print('1 0', len(data[(data.A == 1) & (data.B == 0)]))
print('0 1', len(data[(data.A == 0) & (data.B == 1)]))
print('0 0', len(data[(data.A == 0) & (data.B == 0)]))
print('')

# Results
Lower = {}
Media = {}
Upper = {}

# Control Parameters
Runs_Max = 1000
Runs = range(Runs_Max)
BS = len(data)
print('bootstrap size: ', BS)

# Results
I_R = []
for R in Runs:
    # Bootstrap: resample the rows with replacement
    BooP = data.sample(BS, replace=True)

    # Data: counts of each (A, B) combination in the resample
    X_11 = len(BooP[(BooP.A == 1) & (BooP.B == 1)])
    X_10 = len(BooP[(BooP.A == 1) & (BooP.B == 0)])
    X_01 = len(BooP[(BooP.A == 0) & (BooP.B == 1)])
    X_00 = len(BooP[(BooP.A == 0) & (BooP.B == 0)])

    # Improvement (I) = pB/pA-1
    if X_11 + X_10 == 0:
        I_x = 10101  # approx infinity!
    else:
        I_x = (X_11 + X_01) / (X_11 + X_10) - 1

    # Results
    I_R.append(I_x)

    # CI: running percentiles over the estimates so far
    Lower[R] = np.percentile(I_R, 2.5)
    Media[R] = np.percentile(I_R, 50)
    Upper[R] = np.percentile(I_R, 97.5)

Low = Lower[max(list(Lower.keys()))]
Med = Media[max(list(Lower.keys()))]
Hig = Upper[max(list(Lower.keys()))]
print('I = ', Med, Low, Hig)

print('Hypothesis test')
I_N = [i for i in I_R if i <= 0]
Hyp = len(I_N) / len(I_R)
print('Hyp = ', Hyp)
```

For what quantity are you calculating the confidence interval? It’s not clear from your Python code. — Dave, Feb 26 '21 at 11:16
Thanks Dave, it's the quantity described in this question, pB/pA-1: https://stats.stackexchange.com/questions/506994/interval-estimation-for-a-change-in-a-binomial-proportion/507060#507060 — R. Cox, Feb 26 '21 at 11:27

R. Cox · Answer 1 · 2021-02-27T13:19:23.857

The reason why the reverse hypothesis came out at 12% was that 12% of the estimates were zero. My hypothesis test should have been "<", not "<=". In Python:

```python
print('Hypothesis test')
I_N = [i for i in I_R if i < 0]
Hyp = len(I_N)/len(I_R)
print('Hyp =',Hyp)
print('min =',min(I_R))
```

Which gives:

```
Hypothesis test
Hyp = 0.0
min = 0.0
```
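The 12% figure can also be checked by hand. In the example data there are no rows with A=1 and B=0, so the improvement is zero exactly when a resample contains none of the two rows with A=0 and B=1 (ignoring the negligible case where A is all zeros). The sketch below is my own check rather than part of the original code; the names `p_tie` and `sim_tie` are illustrative, and it relies on the count of such rows in a resample of size 19 being binomial.

```python
import numpy as np

n = 19      # sample size and bootstrap size in the example data
n_01 = 2    # rows with A=0 and B=1

# Probability that a resample of n rows misses both (A=0, B=1) rows,
# which is the case where pB == pA and the improvement is exactly zero.
p_tie = ((n - n_01) / n) ** n
print('P(I == 0) =', round(p_tie, 3))

# Monte Carlo check: the number of (A=0, B=1) rows in a resample
# is Binomial(n, n_01 / n).
rng = np.random.default_rng(0)
x_01 = rng.binomial(n, n_01 / n, size=100_000)
sim_tie = float(np.mean(x_01 == 0))
print('simulated  =', round(sim_tie, 3))
```

Both values come out at roughly 0.12, which is the mass that "<=" counted and "<" does not.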

Figure 1, example data

Figure 2, Reverse Hypothesis (RH) against number of Runs for example I

Figure 3, Reverse Hypothesis (RH) against number of Runs for example II

Bootstrapping is conditional on the original sample. The probability that pB<pA cannot really be zero, because the sample may not be representative of the population. The proportion of bootstrap samples for which pB<pA is zero, however, which strongly suggests that pB is not less than pA.
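To illustrate the dependence on the original sample, here is a minimal sketch that compares the example data with a hypothetical sample in which one of the (A=1, B=1) rows is replaced by an (A=1, B=0) row. The altered counts and the helper `frac_negative` are assumptions made only for illustration. Resampling rows with replacement is equivalent to drawing the four cell counts from a multinomial distribution, which is what the helper does.

```python
import numpy as np

def frac_negative(cell_counts, runs=10_000, seed=0):
    """Share of bootstrap resamples with I = pB/pA - 1 < 0.

    cell_counts is [n_11, n_10, n_01, n_00] for the (A, B) combinations.
    """
    counts = np.asarray(cell_counts, dtype=float)
    n = int(counts.sum())
    rng = np.random.default_rng(seed)
    # Each row of `draws` holds the cell counts of one bootstrap resample.
    draws = rng.multinomial(n, counts / counts.sum(), size=runs)
    x_11, x_10, x_01, _ = draws.T
    n_a = x_11 + x_10
    n_b = x_11 + x_01
    # Resamples with no A=1 rows are treated as "infinite" improvement,
    # as in the question's code, so they never count as negative.
    valid = n_a > 0
    improvement = n_b[valid] / n_a[valid] - 1
    return float(np.sum(improvement < 0) / runs)

observed = frac_negative([14, 0, 2, 3])   # the example data
altered = frac_negative([13, 1, 2, 3])    # hypothetical: one (1, 0) row
print('observed sample:', observed)
print('altered sample: ', altered)
```

With the observed counts the proportion is exactly zero, because no resample can contain a row that the sample does not have. With the altered counts it is clearly above zero.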

Is this really a hypothesis test and can bootstrapping be used for hypothesis testing? — R. Cox, Feb 27 '21 at 13:20
