
I have a list of 1000 samples from a distribution. I can find the lower limit of the 95% CI, as it is the 2.5th percentile; in this case, it is zero. I would also like to test the hypothesis that a sample from the distribution would be negative. I tried counting how many of the samples were negative and dividing this by the number of samples. Because the lower bound of the 95% CI is zero, I expected to get 2.5%, but I actually got 12%.

How can I test the hypothesis that a sample from the distribution would be positive please?

In Python:

```python
import numpy as np
import pandas as pd

print('Generate the sample data')
data = pd.DataFrame({'A': [1]*14 + [0]*2 + [0]*3,
                     'B': [1]*14 + [1]*2 + [0]*3})

print('sample size: ', len(data))
print('')
print('A B X')
print('1 1', len(data[((data.A==1) & (data.B==1))]))
print('1 0', len(data[((data.A==1) & (data.B==0))]))
print('0 1', len(data[((data.A==0) & (data.B==1))]))
print('0 0', len(data[((data.A==0) & (data.B==0))]))
print('')

# Results
Lower = {}
Media = {}
Upper = {}

# Control Parameters
Runs_Max = 1000
Runs = range(Runs_Max)

BS = len(data)
print('bootstrap size: ', BS)

# Results
I_R = []

for R in Runs:

    # Bootstrap
    BooP = data.sample(BS, replace=True)

    # Data
    X_11 = len(BooP[((BooP.A==1) & (BooP.B==1))])
    X_10 = len(BooP[((BooP.A==1) & (BooP.B==0))])
    X_01 = len(BooP[((BooP.A==0) & (BooP.B==1))])
    X_00 = len(BooP[((BooP.A==0) & (BooP.B==0))])

    # Improvement (I) = pB/pA-1
    if X_11 + X_10 == 0:
        I_x = 10101  # approx infinity!
    else:
        I_x = (X_11 + X_01)/(X_11 + X_10) - 1

    # Results
    I_R.append(I_x)

    # CI
    Lower[R] = np.percentile(I_R,  2.5)
    Media[R] = np.percentile(I_R, 50  )
    Upper[R] = np.percentile(I_R, 97.5)

Low = Lower[max(list(Lower.keys()))]
Med = Media[max(list(Lower.keys()))]
Hig = Upper[max(list(Lower.keys()))]

print('I = ', Med, Low, Hig)

print('Hypothesis test')
I_N = [i for i in I_R if i <= 0]
Hyp = len(I_N)/len(I_R)
print('Hyp = ', Hyp)
```
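For comparison, the same bootstrap of I = pB/pA - 1 can be written without the pandas loop. This is a minimal sketch, assuming the same 19-row example data; the helper name `bootstrap_increase` and the seed are hypothetical choices for illustration. It resamples (A, B) rows as pairs, as `data.sample(BS, replace=True)` does, but only reports the final percentiles rather than how they evolve with the number of runs.

```python
import numpy as np

def bootstrap_increase(a, b, runs=1000, seed=0):
    """Hypothetical helper: `runs` bootstrap estimates of I = pB/pA - 1.

    Rows are resampled as (A, B) pairs with replacement.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    n = len(a)
    rng = np.random.default_rng(seed)
    # Each row of idx is one bootstrap sample of row indices
    idx = rng.integers(0, n, size=(runs, n))
    n_a = a[idx].sum(axis=1)  # rows with A == 1, i.e. X_11 + X_10
    n_b = b[idx].sum(axis=1)  # rows with B == 1, i.e. X_11 + X_01
    # Mirror the question's "approx infinity" placeholder when no row has A == 1
    safe_n_a = np.where(n_a == 0, 1, n_a)
    return np.where(n_a == 0, 10101.0, n_b / safe_n_a - 1)

a = [1]*14 + [0]*2 + [0]*3
b = [1]*14 + [1]*2 + [0]*3
I_R = bootstrap_increase(a, b)
low, med, high = np.percentile(I_R, [2.5, 50, 97.5])
print('I = ', med, low, high)
```

Because pA and pB are both means of 0/1 columns, summing the resampled columns gives the same counts as the four `len(BooP[...])` filters, without building a DataFrame per run.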

R. Cox
  • 179
  • For what quantity are you calculating the confidence interval? It’s not clear from your Python code. – Dave Feb 26 '21 at 11:16
  • Thanks Dave, it's the quantity described in this question, pB/pA-1: https://stats.stackexchange.com/questions/506994/interval-estimation-for-a-change-in-a-binomial-proportion/507060#507060 – R. Cox Feb 26 '21 at 11:27
  • In the code, it's the quantity called "I" for Increase – R. Cox Feb 26 '21 at 11:29

1 Answer


The reverse hypothesis came out at 12% because 12% of the bootstrap estimates were exactly zero. My hypothesis test should have used "<", not "<=". In Python:

```python
print('Hypothesis test')
I_N = [i for i in I_R if i < 0]
Hyp = len(I_N)/len(I_R)
print('Hyp =', Hyp)

print('min =', min(I_R))
```

Which gives:

```
Hypothesis test
Hyp = 0.0
min = 0.0
```
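The 12% can be checked directly. In this example every row with A = 1 also has B = 1, so X_10 is always zero and I = (X_11 + X_01)/(X_11 + X_10) - 1 reduces to X_01/X_11, which can never be negative. I is exactly zero whenever a bootstrap sample of 19 rows contains none of the two (0, 1) rows, which happens with probability (17/19)^19 ≈ 0.12. The sketch below, assuming the same example data, compares that value with a simulated proportion and shows that "<=" counts these ties while "<" does not. It swaps the pandas resampling for multinomial counts of row types, which is an equivalent way to draw rows with replacement; the seed and number of draws are arbitrary.

```python
import numpy as np

# Row counts in the example data (A, B): (1, 1), (0, 1), (0, 0); there are no (1, 0) rows
n_11, n_01, n_00 = 14, 2, 3
n = n_11 + n_01 + n_00  # 19 rows

# I = X_01 / X_11 here, so I == 0 exactly when a bootstrap sample has no (0, 1) row
p_zero = ((n - n_01) / n) ** n
print('P(I == 0) =', round(p_zero, 4))

# Resampling rows with replacement is equivalent to multinomial counts of row types
rng = np.random.default_rng(1)
counts = rng.multinomial(n, [n_11 / n, n_01 / n, n_00 / n], size=100_000)
x_11, x_01 = counts[:, 0], counts[:, 1]
# Same placeholder as the question's code when no row has A == 1
I = np.where(x_11 == 0, 10101.0, x_01 / np.maximum(x_11, 1))

print('share I <= 0:', np.mean(I <= 0))   # includes the ties at zero
print('share I <  0:', np.mean(I < 0))    # strictly negative only
print('2.5th percentile:', np.percentile(I, 2.5))
```

With about 12% of the estimates tied at zero, the 2.5th percentile is zero, yet the share at or below zero is far above 2.5%; only the strict "<" count is bounded by the percentile.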


Figure 1, example data


Figure 2, Reverse Hypothesis (RH) against number of Runs for example I


Figure 3, Reverse Hypothesis (RH) against number of Runs for example II

Bootstrapping is conditional on the original sample. The probability that pB < pA cannot really be zero, because the sample may not be representative of the population. However, the proportion of bootstrap samples for which pB < pA is zero, and this strongly indicates that pB is not less than pA.
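Ties can be kept visible by splitting the bootstrap estimates three ways instead of two. This is a small sketch, assuming the estimates of I = pB/pA - 1 are held in a list such as `I_R` from the question's code; `split_proportions` is a hypothetical helper name and `example` is a toy stand-in, not real results.

```python
import numpy as np

def split_proportions(estimates):
    """Hypothetical helper: share of bootstrap estimates of I below, at and above zero."""
    est = np.asarray(estimates, dtype=float)
    return {
        'pB < pA (I < 0)': float(np.mean(est < 0)),
        'pB = pA (I = 0)': float(np.mean(est == 0)),
        'pB > pA (I > 0)': float(np.mean(est > 0)),
    }

# Toy stand-in for I_R; replace with the actual bootstrap estimates
example = [0.0, 0.0, 0.125, 0.0714, 0.2]
for label, share in split_proportions(example).items():
    print(label, share)
```

The exact `== 0` comparison is safe for the question's estimates, because equal counts give a ratio of exactly 1.0 and hence I = 0.0.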

R. Cox