
I'm not sure how to estimate the confidence interval (CI) for a change in a binomial proportion with a small sample size, where the same sample set is used both times.

I have two methods that I would like to compare (A and B). I have tested both methods on the same sample ($n=28$) from a large population.

Method A gave the correct result 11 times but method B gave the correct result 17 times. I think this indicates that method B is 17/11 - 1 ≈ 55% better than method A. As well as this point estimate for the difference between the methods, I would like to understand the uncertainty caused by my small number of samples. How can I construct a 95% CI for the 55% improvement in performance, please?

  • In 11 cases, both test A and test B worked.

  • In 0 cases, test A worked but test B didn't.

  • In 6 cases, test A didn't work but test B did.

  • In 11 cases, neither test A nor test B worked.

These proportions all relate to the same sample ($n=28$). They are not independent of each other.
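
For concreteness, here is how the paired counts above determine the point estimate (a minimal Python sketch; the variable names are mine):

n11, n10, n01, n00 = 11, 0, 6, 11  # (A,B) = (1,1), (1,0), (0,1), (0,0)
n = n11 + n10 + n01 + n00          # 28

p_A = (n11 + n10) / n              # 11/28, about 0.393
p_B = (n11 + n01) / n              # 17/28, about 0.607

I = p_B / p_A - 1                  # 17/11 - 1, about 0.55
print(p_A, p_B, I)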

Is there a way to calculate CIs that doesn't assume independence please? I would be happy with confidence intervals or with credible intervals and would also be interested in arguments as to why such measures of uncertainty were not appropriate.

MarianD
R. Cox
  • If you really used the same patients both times, then the summary results you give are not helpful. // You'd need both results for each patient. // For 6 patients, 3 & 5 correct could mean 000111 vs 011111 (2 improvements, so maybe B better) or 000111 vs 111110 (3 improvements and one worse, only 2 agree, contradictory in a way such that maybe both A & B are useless). A good start would be to provide the relevant data. – BruceET Jan 30 '21 at 20:58
  • Very good question @BruceET, I have added those details. Thanks for your help! – R. Cox Jan 31 '21 at 10:53
  • This is different to the case where the two proportions can be assumed to be independent, here: https://stats.stackexchange.com/questions/58667/change-in-binomial-proportion-confidence-interval?rq=1 – R. Cox Feb 01 '21 at 10:30
  • Why has this been locked please? – R. Cox Feb 02 '21 at 13:34

5 Answers


Comment: Thanks for the additional info. Maybe start with a $2\times 2$ table:

              A
          ---------
B         OK    Not       Tot
-----------------------------
OK        11      6        17
Not        0     11        11
-----------------------------
Tot       11     17        28

Then a chi-squared test on this table has a very small P-value, supporting the anticipated lack of independence in the paired design. [A simulated P-value is necessary because of the small counts.]

chisq.test(rbind(c(11,6),c(0,11)), sim=T, B=5000)

        Pearson's Chi-squared test
        with simulated p-value
        (based on 5000 replicates)

data:  rbind(c(11, 6), c(0, 11))
X-squared = 11.723, df = NA, p-value = 0.0009998

In such cases, where the chi-squared test requires simulation, the simulated P-value is often about the same as for Fisher's exact test.

fisher.test(rbind(c(11,6),c(0,11)))

        Fisher's Exact Test for Count Data

data:  rbind(c(11, 6), c(0, 11))
p-value = 0.0009334
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 2.983321      Inf
sample estimates:
odds ratio
       Inf

It seems clear that B does better than A, but with only 28 patients and six differences in responses, I do not want to speculate how to find a useful confidence interval for a difference in binomial proportions $p_B - p_A.$ Maybe someone else can answer that.
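
For what it's worth, a textbook starting point for a 95% interval on the paired difference $p_B - p_A$ (not something this answer pursues) is the Wald interval for paired proportions, which uses only the sample size and the discordant counts. A minimal sketch in Python, using the counts from the table above:

import numpy as np

# Counts from the 2x2 table: (A,B) = (1,1), (1,0), (0,1), (0,0).
n11, n10, n01, n00 = 11, 0, 6, 11
n = n11 + n10 + n01 + n00

d = (n01 - n10) / n                               # estimate of p_B - p_A
se = np.sqrt(n10 + n01 - (n01 - n10)**2 / n) / n  # Wald SE under pairing
z = 1.96                                          # ~95% normal quantile
print(d, d - z * se, d + z * se)                  # about 0.214 [0.062, 0.366]

This is only a large-sample approximation and may be optimistic with counts this small.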

BruceET
  • I'm asking about the difference in performance, pB/pA-1 – R. Cox Feb 01 '21 at 13:47
  • Why wouldn't you want to estimate a CI? The Wilson interval has good coverage for understanding the uncertainty of small n binomial proportions. https://stats.stackexchange.com/questions/469701/interval-estimation-for-a-binomial-proportion-given-a-specific-test-outcome – R. Cox Feb 01 '21 at 13:49

I can get a 95% CI for pA and pB:

pA: 0.398 [0.236, 0.697]
pB: 0.607 [0.338, 0.971]

I used MultinomCI, which gives confidence intervals for multinomial proportions: https://www.rdocumentation.org/packages/DescTools/versions/0.99.39/topics/MultinomCI

I chose the Wilson interval because I think that it gives good coverage for binomial proportions so I guessed that it would be ok for this as well.

In Python:

import numpy as np
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

package_name = "DescTools"

try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

r_string = """CI = MultinomCI(c(11,0,6,11), conf.level=0.95, method="wilson")"""
ro.r(r_string)
R_C = ro.r['CI']
A_C = np.array(R_C)

print(' ')
print('Estimate, CI')
print(A_C)
print(' ')

Nomenclature: the estimate for A=1, B=1 with its 95% CI is written E_11 (L_11, H_11).

E_11 = A_C[0][0]
E_10 = A_C[1][0]
E_01 = A_C[2][0]
E_00 = A_C[3][0]

L_11 = A_C[0][1]
L_10 = A_C[1][1]
L_01 = A_C[2][1]
L_00 = A_C[3][1]

H_11 = A_C[0][2]
H_10 = A_C[1][2]
H_01 = A_C[2][2]
H_00 = A_C[3][2]

print('sum prob = 1: ', (E_11+E_10+E_01+E_00))
print(' ')

p_A = E_11 + E_10
p_B = E_11 + E_01

CI_A = [(L_11 + L_10), (H_11 + H_10)]
CI_B = [(L_11 + L_01), (H_11 + H_01)]

print('pA: ', p_A)
print('CI_A: ', CI_A)
print(' ')
print('pB: ', p_B)
print('CI_B: ', CI_B)
print(' ')
print('Improvement (I) = pB/pA-1: ', p_B/p_A-1)

Now to try and get a 95% CI for:

Improvement (I) = pB/pA-1

I think that I first need to fit skew normal distributions to pA and pB:

import numpy as np
from scipy.stats import skewnorm
from scipy.optimize import differential_evolution
import matplotlib.pyplot as plt

pA = 0.398
pB = 0.607

CI_A = [0.236, 0.697]
CI_B = [0.338, 0.971]

print('fit a distribution...')

def fun(z):
    Skew, Scale, loc = z

    lst = skewnorm.rvs(a=Skew,
                       loc=loc,
                       scale=Scale,
                       size=1000)

    lst = lst.tolist()

    #mean = sum(lst)/len(lst)
    median = np.percentile(lst, 50)
    #mode = max(set(lst), key=lst.count)

    CI = [np.percentile(lst, 2.5),
          np.percentile(lst, 97.5)]

    # Error (this targets pB / CI_B; swap in pA / CI_A to fit method A)
    Me_E = median - pB
    Lo_E = CI[0] - CI_B[0]
    Hi_E = CI[1] - CI_B[1]
    error = Me_E**2 + Lo_E**2 + Hi_E**2
    return error

Bounds of Skew, Scale & loc

bounds = ((-1,1),(0.1,1),(0,1))

maxiter = 5000

Optimiser

DE = differential_evolution(fun, bounds, maxiter=maxiter)
print('success = ', DE.success)
print(DE.message)

Skew = DE.x[0]
Scale = DE.x[1]
loc = DE.x[2]
print('skewness parameter = ', Skew)
print('scale = ', Scale)
print('loc = ', loc)

print('Sampling from the skew normal distribution')
lst = skewnorm.rvs(a=Skew, loc=loc, scale=Scale, size=1000)

lst = lst.tolist()
mode = max(set(lst), key=lst.count)
print('mode = ', mode)

mean = sum(lst)/len(lst)
print('mean = ', mean)

median = np.percentile(lst, 50)
print('median = ', median)

CI = [np.percentile(lst, 2.5), np.percentile(lst, 97.5)]
print('CI = ', CI)

fig, ax = plt.subplots(1, 1)
ax.hist(lst)
plt.show()

That yields distribution parameters that reproduce almost the same pA & pB:

pA: 0.423 [0.205, 0.642]
pB: 0.607 [0.336, 0.928]


Figure 1, pA and pB

Next, I combine the two distributions to get the CI for Improvement (I) = pB/pA-1

from scipy.stats import skewnorm
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(1, 1)

A_ske = 0.9139952525939643
A_sca = 0.13849932254197161
A_loc = 0.3500567328083086
B_ske = 0.9960244237545532
B_sca = 0.18411099887378107
B_loc = 0.5126020868512271

Step = 0.001
ExeS = np.linspace(0+Step, 1-Step, int(1/Step)-1)  # grid over (0, 1)

print('Inverse CDF transform')
A_val = skewnorm.ppf(q=ExeS, a=A_ske, loc=A_loc, scale=A_sca)
B_val = skewnorm.ppf(q=ExeS, a=B_ske, loc=B_loc, scale=B_sca)

Improvement

Improv = np.divide(B_val, A_val)
Improv = Improv - 1

CI = [np.percentile(Improv, 2.5), np.percentile(Improv, 97.5)]
print('CI = ', CI)

A_PDF = {}
B_PDF = {}
for x in ExeS:
    A_PDF[x] = skewnorm.pdf(x=x, a=A_ske, loc=A_loc, scale=A_sca)
    B_PDF[x] = skewnorm.pdf(x=x, a=B_ske, loc=B_loc, scale=B_sca)

Dict to Lists

A_data = list(A_PDF.values())
B_data = list(B_PDF.values())

Plot

ax.plot(ExeS, A_data, label='A', linewidth=5, color='k', linestyle='-')
ax.plot(ExeS, B_data, label='B', linewidth=5, color='k', linestyle='--')

for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
             ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(22)

legend = ax.legend(loc=0, ncol=1, bbox_to_anchor=(0.6, -.3, 1, 1),
                   fancybox=True, shadow=False, framealpha=1, fontsize=22)
plt.setp(legend.get_title(), fontsize=22)

plt.xlabel('$x$')
plt.ylabel('$P$')
plt.grid(b=True, which='major', color='b')
plt.xlim([0, 1])
plt.ylim([0, 4])

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

This method assumes that the sample is just as representative of the population for method A as it is for method B. This is the other extreme from assuming independence.

This gave a different improvement CI each time I ran it:

[40%,62%]
[41%,60%]
[41%,49%]
[43%,68%]
[39%,48%]
R. Cox
  • I think the inconsistency might be caused where I fit a distribution to pA and pB? – R. Cox Feb 07 '21 at 17:36
  • I've tried to code a less round the houses approach to the non-bootstrap technique https://stackoverflow.com/questions/66155480/inverse-cdf-transform-of-the-wilson-interval – R. Cox Feb 11 '21 at 13:34

Bootstrap

I tried a bootstrap. It often threw the error:

division by zero

I got around that by assuming:

infinity = approximately 10101

with the resulting estimates:


Figure 1, Improvement (I) against Bootstrap Size (BS) for runs = 100


Figure 2, Improvement (I) against Runs for BS=28

This gave an improvement after 1000 runs with BS=28 of:

55% [14%, 150%]

In Python:

print('Generate the sample data')
data = pd.DataFrame({'A':[1]*11+[0]*6+[0]*11,
                     'B':[1]*11+[1]*6+[0]*11})
print('sample size: ',len(data))
print('')
print('A B X')
print('1 1',len(data[((data.A==1)&(data.B==1))]))
print('1 0',len(data[((data.A==1)&(data.B==0))]))
print('0 1',len(data[((data.A==0)&(data.B==1))]))
print('0 0',len(data[((data.A==0)&(data.B==0))]))
print('')

Results

I = {}
Lower = {}
Media = {}
Upper = {}

Control Parameters

Runs = range(100)
#bootstrap_size = range(len(data))
BS_Max = 100
bootstrap_size = range(BS_Max)

for BS in bootstrap_size:
    #print('bootstrap size ', BS)

    # Results
    I_T = {}

    for R in Runs:

        # Bootstrap
        BooP = data.sample(BS, replace=True)

        # Data
        X_11 = len(BooP[((BooP.A==1)&(BooP.B==1))])
        X_10 = len(BooP[((BooP.A==1)&(BooP.B==0))])
        X_01 = len(BooP[((BooP.A==0)&(BooP.B==1))])
        X_00 = len(BooP[((BooP.A==0)&(BooP.B==0))])

        # Improvement (I) = pB/pA-1
        if X_11+X_10 == 0:
            I_x = 10101 # approx infinity!
        else:
            I_x = (X_11+X_01)/(X_11+X_10)-1

        # Results
        I_T[R] = I_x

    # Results
    I[BS] = I_T

    # CI
    Lower[BS] = np.percentile(list(I[BS].values()),  2.5)
    Media[BS] = np.percentile(list(I[BS].values()), 50  )
    Upper[BS] = np.percentile(list(I[BS].values()), 97.5)

Coverage

I think the coverage is around 93%, slightly under the target 95%.


Figure 3, Coverage Probability (CP) against P(10)

I assessed it with a simulation.

Coverage changes with probability, so it would be good to try it with various values of the probabilities of the four possible outcomes (P11, P10, P01, P00). Unfortunately, running many values of each P would take too long. Instead, I used MultinomCI to get the Wilson interval for each P. This gave:

      Est   Low     High
P(11) 0.393 0.236   0.576
P(10) 0.000 0.000   0.121
P(01) 0.214 0.102   0.395
P(00) 0.393 0.236   0.576

I gave the name Prob to the vector [P11, P10, P01, P00], where

sum(Prob) = 1

I took the range of likely values for P10 and assumed that as P10 increases, P01 decreases equally.

I assessed 4 values of Prob:

Prob = [0.3, 0.0,  0.3,  0.4]
Prob = [0.3, 0.04, 0.26, 0.4]
Prob = [0.3, 0.08, 0.22, 0.4]
Prob = [0.3, 0.12, 0.18, 0.4]

I ran it with:

n       =  28
runs    =  1000
reps    =  10000

which took 41.5 hours.

Last, I estimated the 95% CI for the coverage of the 95% CI, again using the Wilson interval.

In Python:

import numpy  as np
import pandas as pd
import time
import rpy2.robjects as ro
import statsmodels.stats.proportion
import matplotlib.pyplot as plt

start = time.time()

from rpy2.robjects.packages import importr

package_name = "DescTools"

try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

r_string = """CI = MultinomCI(c(11,0,6,11), conf.level=0.95, method="wilson")"""
ro.r(r_string)
A_C = np.array(ro.r['CI'])

print(' ')
print('Estimate, CI')
print(A_C)
print(' ')

Control Parameters

n = 28
print('n = ', n)

nrep = 10 #10000
print('reps = ', nrep)

runs = 10 #1000
print('runs = ', runs)

P_10s = [0.00, 0.04, 0.08, 0.12]

d_CP = {}
d_Re = {}

for P_10 in P_10s:
    pvals = [.3, P_10, (.3-P_10), .4]
    print('Prob ', pvals)
    print('total P = ', sum(pvals))

    P_11 = pvals[0]
    P_10 = pvals[1]
    P_01 = pvals[2]
    P_00 = pvals[3]

    I_T = (P_11+P_01)/(P_11+P_10)-1
    print('True I  = ', I_T)

    print('Estimate the Coverage Probability using simulation')
    CP = []

    for it in range(nrep):

        # Generate the sample data
        li = np.random.multinomial(n, pvals)
        data = pd.DataFrame({'A':[1]*li[0] +[1]*li[1] +[0]*li[2] +[0]*li[3],
                             'B':[1]*li[0] +[0]*li[1] +[1]*li[2] +[0]*li[3]})

        # Results
        Lower = {}
        Media = {}
        Upper = {}

        # Results
        I_R = []

        for R in range(runs):

            # Bootstrap size = n
            BooP = data.sample(n, replace=True)

            # Data
            X_11 = len(BooP[((BooP.A==1)&(BooP.B==1))])
            X_10 = len(BooP[((BooP.A==1)&(BooP.B==0))])
            X_01 = len(BooP[((BooP.A==0)&(BooP.B==1))])
            X_00 = len(BooP[((BooP.A==0)&(BooP.B==0))])

            # Improvement (I) = pB/pA-1
            if X_11+X_10 == 0:
                I_x = 10101 # approx infinity!
            else:
                I_x = (X_11+X_01)/(X_11+X_10)-1

            # Results
            I_R.append(I_x)

            # CI
            Lower[R] = np.percentile(I_R,  2.5)
            Media[R] = np.percentile(I_R, 50  )
            Upper[R] = np.percentile(I_R, 97.5)

        Low = Lower[max(list(Lower.keys()))]
        Med = Media[max(list(Lower.keys()))]
        Hig = Upper[max(list(Lower.keys()))]

        # Check whether the interval contains the true value
        if (I_T < Hig) and (I_T > Low):
            CP.append(1)
        else:
            CP.append(0)

    end = time.time()
    print('time    = ', end - start)
    print('CP      = ', sum(CP)/len(CP))

    # results
    d_Re[P_10] = CP
    d_CP[P_10] = sum(CP)/len(CP)

CI

CI_Low = []
CI_High = []
for P_10 in d_CP.keys():
    low, high = statsmodels.stats.proportion.proportion_confint(
        d_CP[float(P_10)]*nrep, nrep, alpha=1-0.95, method='wilson')
    CI_Low.append(low)
    CI_High.append(high)

print('Plot')
df_G1 = pd.DataFrame({'P_10' : list(d_CP.keys()),
                      'CP'   : list(d_CP.values()),
                      'Lo'   : CI_Low,
                      'Hi'   : CI_High})

fig, ax1 = plt.subplots(1,1)

df_G1.plot(x='P_10', y='Hi', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')
df_G1.plot(x='P_10', y='CP', legend=False, ax=ax1, label='CP',
           linewidth=5, color='k', linestyle='-')
df_G1.plot(x='P_10', y='Lo', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')

for item in ([ax1.title, ax1.xaxis.label, ax1.yaxis.label] +
             ax1.get_xticklabels() + ax1.get_yticklabels()):
    item.set_fontsize(22)

plt.xlabel('$P(10)$')
plt.ylabel('$CP$')

plt.xlim([0, 0.12])
plt.ylim([0.9, 1])

ax1.set_yticks([.9, .95, 1])
plt.xticks([0.00, 0.04, 0.08, 0.12])

plt.grid(which='both', color='b')

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

R. Cox
  • follow up question: what is the coverage? https://stats.stackexchange.com/questions/509227/coverage-of-a-bootstrap-confidence-interval-for-a-change-in-a-binomial-proportio – R. Cox Feb 12 '21 at 14:45
  • I think the coverage is around 93%, slightly under the target 95% https://stats.stackexchange.com/questions/509227/coverage-of-a-bootstrap-confidence-interval-for-a-change-in-a-binomial-proportio/509548#509548 – R. Cox Feb 15 '21 at 12:23

I'm getting

I = 52% [39%, 55%]

when I make the assumption that the sample is just as representative of the population for method A as it is for method B. This is the other extreme from assuming independence.

MultinomCI gives the Wilson CI for P11, P10, P01, P00.

I want to get P11, P10, P01, P00 for a given location in probability space (u)

0 < u < 1

I assume that the CI is about symmetric and that therefore:

$CL(u) = 2\,|u - 0.5|$, taking the upper Wilson bound when $u \ge 0.5$ and the lower bound when $u < 0.5$.
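
In code, that symmetry assumption is just this mapping (a small sketch of the C_L helper that appears in the listings below):

def C_L(u):
    # u in (0, 1) indexes a location in probability space; C_L(u) is the
    # two-sided confidence level whose Wilson bound passes through it.
    if u >= 0.5:
        return 2 * (u - 0.5)   # pair u with the upper Wilson bound
    else:
        return -2 * (u - 0.5)  # pair u with the lower Wilson bound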

I varied the step size (S), which sets how many equally spaced samples I took from u:

u = [S to 1-S, step S]

This gave:

enter image description here

Figure 1, Improvement (I) against step size (S)

In Python:

import numpy as np
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
import matplotlib.pyplot as plt

package_name = "DescTools"

try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

print('Confidence Level (CL) = fun(u), assuming symmetry')

def C_L(u):
    if u >= 0.5:
        return 2*(u-0.5)
    else:
        return -2*(u-0.5)

print('Vary u')
steps = [1e-1, 2e-2, 1e-2, 2e-3, 1e-3, 1e-4, 1e-5]

CI(I) for various steps

CI_I = {}

for step in steps:
    Ues = np.linspace(step, 1-step, int(1/step))

    # Vary the CL
    CLs = [C_L(element) for element in Ues]

    CIes = {}
    P11s = {}
    P10s = {}
    P01s = {}
    P00s = {}
    pA_s = {}
    pB_s = {}
    I_es = {}

    for u in Ues:

        # CL
        CL = C_L(u)

        # CI (P11, P10, P01, P00)
        r_string = """CI = MultinomCI(c(11,0,6,11), conf.level=CL, method="wilson")
        """.replace('CL', str(CL))
        ro.r(r_string)
        CIes[CL] = np.array(ro.r['CI'])

        # P11
        if u >= 0.5:
            P11s[u] = (CIes[CL][:,2][0])
        else:
            P11s[u] = (CIes[CL][:,1][0])

        # P10
        if u >= 0.5:
            P10s[u] = (CIes[CL][:,2][1])
        else:
            P10s[u] = (CIes[CL][:,1][1])

        # P01
        if u >= 0.5:
            P01s[u] = (CIes[CL][:,2][2])
        else:
            P01s[u] = (CIes[CL][:,1][2])

        # P00
        if u >= 0.5:
            P00s[u] = (CIes[CL][:,2][3])
        else:
            P00s[u] = (CIes[CL][:,1][3])

        # pA = P_11 + P_10
        pA_s[u] = P11s[u] + P10s[u]

        # pB = P_11 + P_01
        pB_s[u] = P11s[u] + P01s[u]

        # Improvement (I) = pB/pA-1
        I_es[u] = pB_s[u]/pA_s[u]-1

    # CI(I)
    CI_I[step] = [np.percentile(list(I_es.values()),  2.5),
                  np.percentile(list(I_es.values()), 50),
                  np.percentile(list(I_es.values()), 97.5)]

print('CI_I = ',CI_I)

Dict to Lists

CI_data = list(CI_I.values())

Low = [item[0] for item in CI_data]
Med = [item[1] for item in CI_data]
H_i = [item[2] for item in CI_data]

Plot

fig, ax = plt.subplots(1, 1)
ax.plot(steps, Low, linewidth=5, color='k', linestyle='--')
ax.plot(steps, Med, linewidth=5, color='k', linestyle='-')
ax.plot(steps, H_i, linewidth=5, color='k', linestyle='--')

for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
             ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(22)

plt.xlabel('$S$')
plt.ylabel('$I$')
plt.grid(b=True, which='major', color='b')
ax.set_xscale('log')
plt.xlim([min(steps), max(steps)])
plt.ylim([0.3, 0.6])

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

Coverage

I think the coverage is around 32%, vastly under the target 95%.


Figure 2, Coverage Probability (CP) against P(10)

I assessed it with a simulation, using the same approach as for the bootstrap answer. With 10,000 samples for each of the same 4 P(10) values that I used to assess the bootstrap technique, it took 32 hours to run.

In Python:

import numpy as np
import pandas as pd
import time
import pickle
import statsmodels.stats.proportion
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
import matplotlib.pyplot as plt

package_name = "DescTools"

try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

start = time.time()

print('Control Parameters')
n = 28
print('sample size (n) = ', n)

nrep = 10000
print('coverage sample size = ', nrep)

step = 1e-4
print('Probability space step size = ', step)

print('Equal points in probability space: 0 < u < 1')
Ues = np.linspace(step, 1-step, int(1/step))

print('Confidence Level (CL) = fun(u), assuming symmetry')

def C_L(u):
    if u >= 0.5:
        return 2*(u-0.5)
    else:
        return -2*(u-0.5)

d_CP = {}
d_Re = {}
CI_I = {}

P_10s = [0.00, 0.04, 0.08, 0.12]

for P_10 in P_10s:
    pvals = [.3, P_10, (.3-P_10), .4]
    print('Prob ', pvals)
    print('total P = ', sum(pvals))

    P11 = pvals[0]
    P10 = pvals[1]
    P01 = pvals[2]
    P00 = pvals[3]

    I_T = (P11+P01)/(P11+P10)-1
    print('True I  = ', I_T)

    print('Estimate the Coverage Probability using simulation')
    CP = []

    for it in range(nrep):

        # Generate sample data
        ExeS = np.random.multinomial(n, pvals)

        CIes = {}
        P11s = {}
        P10s = {}
        P01s = {}
        P00s = {}
        pA_s = {}
        pB_s = {}
        I_es = {}

        for u in Ues:

            # CL
            CL = C_L(u)

            # CI (P11, P10, P01, P00)
            r_string = """CI = MultinomCI(c(P1,P2,P3,P4), conf.level=CL, method="wilson")
            """.replace('CL',str(CL)).replace('P1',str(ExeS[0])).replace('P2',str(ExeS[1])).replace('P3',str(ExeS[2])).replace('P4',str(ExeS[3]))
            ro.r(r_string)

            CIes[CL] = np.array(ro.r['CI'])

            # P11
            if u >= 0.5:
                P11s[u] = (CIes[CL][:,2][0])
            else:
                P11s[u] = (CIes[CL][:,1][0])

            # P10
            if u >= 0.5:
                P10s[u] = (CIes[CL][:,2][1])
            else:
                P10s[u] = (CIes[CL][:,1][1])

            # P01
            if u >= 0.5:
                P01s[u] = (CIes[CL][:,2][2])
            else:
                P01s[u] = (CIes[CL][:,1][2])

            # P00
            if u >= 0.5:
                P00s[u] = (CIes[CL][:,2][3])
            else:
                P00s[u] = (CIes[CL][:,1][3])

            # pA = P_11 + P_10
            pA_s[u] = P11s[u] + P10s[u]

            # pB = P_11 + P_01
            pB_s[u] = P11s[u] + P01s[u]

            # Improvement (I) = pB/pA-1
            I_es[u] = pB_s[u]/pA_s[u]-1

        # CI(I)
        CI_I[it] = [np.percentile(list(I_es.values()),  2.5),
                    np.percentile(list(I_es.values()), 50),
                    np.percentile(list(I_es.values()), 97.5)]

        Low = CI_I[it][0]
        Med = CI_I[it][1]
        Hig = CI_I[it][2]

        # Check whether the interval contains the true value
        if (I_T < Hig) and (I_T > Low):
            CP.append(1)
        else:
            CP.append(0)

    # results
    d_Re[P_10] = CP
    d_CP[P_10] = sum(CP)/len(CP)

end = time.time()
print('time    = ',end - start)

CI

CI_Low = []
CI_High = []
for P_10 in d_CP.keys():
    low, high = statsmodels.stats.proportion.proportion_confint(
        d_CP[float(P_10)]*nrep, nrep, alpha=1-0.95, method='wilson')
    CI_Low.append(low)
    CI_High.append(high)

print('Plot')
df_G1 = pd.DataFrame({'P_10' : list(d_CP.keys()),
                      'CP'   : list(d_CP.values()),
                      'Lo'   : CI_Low,
                      'Hi'   : CI_High})

fig, ax1 = plt.subplots(1,1)

df_G1.plot(x='P_10', y='Hi', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')
df_G1.plot(x='P_10', y='CP', legend=False, ax=ax1, label='CP',
           linewidth=5, color='k', linestyle='-')
df_G1.plot(x='P_10', y='Lo', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')

for item in ([ax1.title, ax1.xaxis.label, ax1.yaxis.label] +
             ax1.get_xticklabels() + ax1.get_yticklabels()):
    item.set_fontsize(22)

plt.xlabel('$P(10)$')
plt.ylabel('$CP$')

plt.xlim([0, 0.12])
plt.ylim([0, 1])
plt.xticks([0.00, 0.04, 0.08, 0.12])

plt.grid(which='both', color='b')

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

R. Cox

Note that the question states that

These proportions all relate to the same sample (n=28). They are not independent of each other.

However, if we were to assume independence, then we could use the approach of

Newcombe, R.G., 1998. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine, 17(8), pp.873-890.

who presents

APPENDIX I: MULTIPLICATIVE SCALE SYMMETRY OF CONDITIONAL INTERVALS DERIVED FROM THE WILSON SCORE INTERVAL

In Newcombe, 1998's notation, if:

CI [f/(f+g)] = [L,U]

Then:

CI [f/g] = [L/(1-L),U/(1-U)]

In the case in this question, this gives:

I = 55% [-26%, 224%]

In Python:

import statsmodels.stats.proportion

x1 = 17
x2 = 11
CL = 0.95
n = x1 + x2

P_E = x1/(n-x1) - 1
print('Point Estimate = ', P_E)

low, high = statsmodels.stats.proportion.proportion_confint(x1, n, alpha=1-CL, method='wilson')

L_1 = low / (1-low) - 1
H_1 = high / (1-high) - 1
print('CI = ', L_1, H_1)

Coverage

I think the coverage is around 99.88%, vastly over the target 95%.


Figure 1, Coverage Probability (CP) against P(10)

I assessed it with a simulation, using the same approach as for the bootstrap answer. With 100,000 samples for each of the same 4 P(10) values that I used to assess the bootstrap technique, it took 41 seconds to run.

In Python:

import numpy as np
import pandas as pd
import time
import statsmodels.stats.proportion
import matplotlib.pyplot as plt

n = 28
print('sample size (n) = ', n)

nrep = 100000
print('coverage sample size = ', nrep)

CL = 0.95
print('Confidence Level = ', CL)

start = time.time()

d_CP = {}
d_Re = {}

P_10s = [0.00, 0.04, 0.08, 0.12]

for P_10 in P_10s:
    pvals = [.3, P_10, (.3-P_10), .4]
    print('Prob ', pvals)
    print('total P = ', sum(pvals))

    P11 = pvals[0]
    P10 = pvals[1]
    P01 = pvals[2]
    P00 = pvals[3]

    I_T = (P11+P01)/(P11+P10)-1
    print('True I  = ', I_T)

    print('Estimate the Coverage Probability using simulation')
    CP = []

    for it in range(nrep):

        # Generate sample data
        ExeS = np.random.multinomial(n, pvals)

        x11 = ExeS[0]
        x10 = ExeS[1]
        x01 = ExeS[2]
        x00 = ExeS[3]

        low, high = statsmodels.stats.proportion.proportion_confint(
            (x11+x01), (x11*2+x10+x01), alpha=1-CL, method='wilson')
        L_1 = low  / (1-low)  - 1
        H_1 = high / (1-high) - 1

        # Check whether the interval contains the true value
        if (I_T < H_1) and (I_T > L_1):
            CP.append(1)
        else:
            CP.append(0)

    # results
    d_Re[P_10] = CP
    d_CP[P_10] = sum(CP)/len(CP)
    print('CP = ', d_CP[P_10])

end = time.time()
print('time = ',end - start)

CI

CI_Low = []
CI_High = []
for P_10 in d_CP.keys():
    low, high = statsmodels.stats.proportion.proportion_confint(
        d_CP[float(P_10)]*nrep, nrep, alpha=1-0.95, method='wilson')
    CI_Low.append(low)
    CI_High.append(high)

print('Plot')
df_G1 = pd.DataFrame({'P_10' : list(d_CP.keys()),
                      'CP'   : list(d_CP.values()),
                      'Lo'   : CI_Low,
                      'Hi'   : CI_High})

fig, ax1 = plt.subplots(1,1)

df_G1.plot(x='P_10', y='Hi', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')
df_G1.plot(x='P_10', y='CP', legend=False, ax=ax1, label='CP',
           linewidth=5, color='k', linestyle='-')
df_G1.plot(x='P_10', y='Lo', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')

for item in ([ax1.title, ax1.xaxis.label, ax1.yaxis.label] +
             ax1.get_xticklabels() + ax1.get_yticklabels()):
    item.set_fontsize(22)

plt.xlabel('$P(10)$')
plt.ylabel('$CP$')

plt.xlim([0, 0.12])
plt.ylim([0.998, 1])
plt.xticks([0.00, 0.04, 0.08, 0.12])
plt.yticks([0.998, 0.999, 1])

plt.grid(which='both', color='b')

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

R. Cox