
I'm not sure how to estimate the confidence interval (CI) for a change in a binomial proportion with a small sample size, where the same sample set is used both times.

I have two methods that I would like to compare (A and B). I have tested both methods on the same sample ($n=28$) from a large population.

Method A gave the correct result 11 times but method B gave the correct result 17 times. I think this indicates that method B is 17/11 - 1 ≈ 55% better than method A. As well as this point estimate for the difference between the methods, I would like to understand the uncertainty caused by my small number of samples. How can I construct a 95% CI for the 55% improvement in performance, please?

  • In 11 cases, both test A and test B worked.

  • In 0 cases, test A worked but test B didn't.

  • In 6 cases, test A didn't work but test B did.

  • In 11 cases, neither test A nor test B worked.

These proportions all relate to the same sample ($n=28$). They are not independent of each other.
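
For concreteness, here is how the paired counts above determine the point estimate (a minimal Python sketch; the variable names are mine):

n11, n10, n01, n00 = 11, 0, 6, 11  # (A,B) = (1,1), (1,0), (0,1), (0,0)
n = n11 + n10 + n01 + n00          # 28

p_A = (n11 + n10) / n              # 11/28, about 0.393
p_B = (n11 + n01) / n              # 17/28, about 0.607

I = p_B / p_A - 1                  # 17/11 - 1, about 0.55
print(p_A, p_B, I)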

Is there a way to calculate CIs that doesn't assume independence please? I would be happy with confidence intervals or with credible intervals and would also be interested in arguments as to why such measures of uncertainty were not appropriate.

MarianD
R. Cox
  • If you really used the same patients both times, then the summary results you give are not helpful. // You'd need both results for each patient. // For 6 patients, 3 & 5 correct could mean 000111 vs 011111 (2 improvements, so maybe B better) or 000111 vs 111110 (3 improvements and one worse, only 2 agree, contradictory in a way such that maybe both A & B are useless). A good start would be to provide the relevant data. – BruceET Jan 30 '21 at 20:58
  • Very good question @BruceET, I have added those details. Thanks for your help! – R. Cox Jan 31 '21 at 10:53
  • This is different to the case where the two proportions can be assumed to be independent, here: https://stats.stackexchange.com/questions/58667/change-in-binomial-proportion-confidence-interval?rq=1 – R. Cox Feb 01 '21 at 10:30
  • Why has this been locked please? – R. Cox Feb 02 '21 at 13:34

5 Answers


Comment: Thanks for the additional info. Maybe start with a $2\times 2$ table:

              A
          ---------
B         OK    Not       Tot
-----------------------------
OK        11      6        17
Not        0     11        11
-----------------------------
Tot       11     17        28

Then a chi-squared test on this table has a very small P-value, supporting the anticipated lack of independence in the paired design. [A simulated P-value is necessary because of the small counts.]

chisq.test(rbind(c(11,6),c(0,11)), sim=T, B=5000)

        Pearson's Chi-squared test
        with simulated p-value
        (based on 5000 replicates)

data:  rbind(c(11, 6), c(0, 11))
X-squared = 11.723, df = NA, p-value = 0.0009998

In such cases, where the chi-squared test requires simulation, the simulated P-value is often about the same as for Fisher's exact test.

fisher.test(rbind(c(11,6),c(0,11)))

        Fisher's Exact Test for Count Data

data:  rbind(c(11, 6), c(0, 11))
p-value = 0.0009334
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 2.983321      Inf
sample estimates:
odds ratio
       Inf

It seems clear that B does better than A, but with only 28 patients and six differences in responses, I do not want to speculate how to find a useful confidence interval for a difference in binomial proportions $p_B - p_A.$ Maybe someone else can answer that.
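
For what it's worth, a textbook starting point for a 95% interval on the paired difference $p_B - p_A$ (not something this answer pursues) is the Wald interval for paired proportions, which uses only the sample size and the discordant counts. A minimal sketch in Python, using the counts from the table above:

import numpy as np

# Counts from the 2x2 table: (A,B) = (1,1), (1,0), (0,1), (0,0).
n11, n10, n01, n00 = 11, 0, 6, 11
n = n11 + n10 + n01 + n00

d = (n01 - n10) / n                               # estimate of p_B - p_A
se = np.sqrt(n10 + n01 - (n01 - n10)**2 / n) / n  # Wald SE under pairing
z = 1.96                                          # ~95% normal quantile
print(d, d - z * se, d + z * se)                  # about 0.214 [0.062, 0.366]

This is only a large-sample approximation and may be optimistic with counts this small.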

BruceET
  • I'm asking about the difference in performance, pB/pA-1 – R. Cox Feb 01 '21 at 13:47
  • Why wouldn't you want to estimate a CI? The Wilson interval has good coverage for understanding the uncertainty of small n binomial proportions. https://stats.stackexchange.com/questions/469701/interval-estimation-for-a-binomial-proportion-given-a-specific-test-outcome – R. Cox Feb 01 '21 at 13:49

I can get a 95% CI for pA and pB:

pA: 0.398 [0.236, 0.697]
pB: 0.607 [0.338, 0.971]

I used MultinomCI, which gives confidence intervals for multinomial proportions: https://www.rdocumentation.org/packages/DescTools/versions/0.99.39/topics/MultinomCI

I chose the Wilson interval because I think that it gives good coverage for binomial proportions so I guessed that it would be ok for this as well.

In Python:

import numpy as np
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

package_name = "DescTools"

try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

r_string = """CI = MultinomCI(c(11,0,6,11), conf.level=0.95, method="wilson")"""
ro.r(r_string)
R_C = ro.r['CI']
A_C = np.array(R_C)

print(' ')
print('Estimate, CI')
print(A_C)
print(' ')

Nomenclature: the estimate for A=1, B=1 with its 95% CI is written E_11 (L_11, H_11).

E_11 = A_C[0][0]
E_10 = A_C[1][0]
E_01 = A_C[2][0]
E_00 = A_C[3][0]

L_11 = A_C[0][1]
L_10 = A_C[1][1]
L_01 = A_C[2][1]
L_00 = A_C[3][1]

H_11 = A_C[0][2]
H_10 = A_C[1][2]
H_01 = A_C[2][2]
H_00 = A_C[3][2]

print('sum prob = 1: ', (E_11+E_10+E_01+E_00))
print(' ')

p_A = E_11 + E_10
p_B = E_11 + E_01

CI_A = [(L_11 + L_10), (H_11 + H_10)]
CI_B = [(L_11 + L_01), (H_11 + H_01)]

print('pA: ', p_A)
print('CI_A: ', CI_A)
print(' ')
print('pB: ', p_B)
print('CI_B: ', CI_B)
print(' ')
print('Improvement (I) = pB/pA-1: ', p_B/p_A-1)

Now to try and get a 95% CI for:

Improvement (I) = pB/pA-1

I think that I first need to fit skew normal distributions to pA and pB:

import numpy as np
from scipy.stats import skewnorm
from scipy.optimize import differential_evolution
import matplotlib.pyplot as plt

pA = 0.398
pB = 0.607

CI_A = [0.236, 0.697]
CI_B = [0.338, 0.971]

print('fit a distribution...')

def fun(z):
    Skew, Scale, loc = z

    lst = skewnorm.rvs(a=Skew,
                       loc=loc,
                       scale=Scale,
                       size=1000)

    lst = lst.tolist()

    #mean = sum(lst)/len(lst)
    median = np.percentile(lst, 50)
    #mode = max(set(lst), key=lst.count)

    CI = [np.percentile(lst, 2.5),
          np.percentile(lst, 97.5)]

    # Error (this targets pB / CI_B; swap in pA / CI_A to fit method A)
    Me_E = median - pB
    Lo_E = CI[0] - CI_B[0]
    Hi_E = CI[1] - CI_B[1]
    error = Me_E**2 + Lo_E**2 + Hi_E**2
    return error

Bounds of Skew, Scale & loc

bounds = ((-1,1),(0.1,1),(0,1))

maxiter = 5000

Optimiser

DE = differential_evolution(fun, bounds, maxiter=maxiter)
print('success = ', DE.success)
print(DE.message)

Skew = DE.x[0]
Scale = DE.x[1]
loc = DE.x[2]
print('skewness parameter = ', Skew)
print('scale = ', Scale)
print('loc = ', loc)

print('Sampling from the skew normal distribution')
lst = skewnorm.rvs(a=Skew, loc=loc, scale=Scale, size=1000)

lst = lst.tolist()
mode = max(set(lst), key=lst.count)
print('mode = ', mode)

mean = sum(lst)/len(lst)
print('mean = ', mean)

median = np.percentile(lst, 50)
print('median = ', median)

CI = [np.percentile(lst, 2.5), np.percentile(lst, 97.5)]
print('CI = ', CI)

fig, ax = plt.subplots(1, 1)
ax.hist(lst)
plt.show()

That yields distribution parameters that reproduce almost the same pA & pB:

pA: 0.423 [0.205, 0.642]
pB: 0.607 [0.336, 0.928]


Figure 1, pA and pB

Next, I combine the two distributions to get the CI for Improvement (I) = pB/pA-1

from scipy.stats import skewnorm
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(1, 1)

A_ske = 0.9139952525939643
A_sca = 0.13849932254197161
A_loc = 0.3500567328083086
B_ske = 0.9960244237545532
B_sca = 0.18411099887378107
B_loc = 0.5126020868512271

Step = 0.001
ExeS = np.linspace(0+Step, 1-Step, int(1/Step)-1)  # grid over (0, 1)

print('Inverse CDF transform')
A_val = skewnorm.ppf(q=ExeS, a=A_ske, loc=A_loc, scale=A_sca)
B_val = skewnorm.ppf(q=ExeS, a=B_ske, loc=B_loc, scale=B_sca)

Improvement

Improv = np.divide(B_val, A_val)
Improv = Improv - 1

CI = [np.percentile(Improv, 2.5), np.percentile(Improv, 97.5)]
print('CI = ', CI)

A_PDF = {}
B_PDF = {}
for x in ExeS:
    A_PDF[x] = skewnorm.pdf(x=x, a=A_ske, loc=A_loc, scale=A_sca)
    B_PDF[x] = skewnorm.pdf(x=x, a=B_ske, loc=B_loc, scale=B_sca)

Dict to Lists

A_data = list(A_PDF.values())
B_data = list(B_PDF.values())

Plot

ax.plot(ExeS, A_data, label='A', linewidth=5, color='k', linestyle='-')
ax.plot(ExeS, B_data, label='B', linewidth=5, color='k', linestyle='--')

for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
             ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(22)

legend = ax.legend(loc=0, ncol=1, bbox_to_anchor=(0.6, -.3, 1, 1),
                   fancybox=True, shadow=False, framealpha=1, fontsize=22)
plt.setp(legend.get_title(), fontsize=22)

plt.xlabel('$x$')
plt.ylabel('$P$')
plt.grid(b=True, which='major', color='b')
plt.xlim([0, 1])
plt.ylim([0, 4])

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

This method assumes that the sample is just as representative of the population for method A as it is for method B. This is the other extreme from assuming independence.

This gave a different improvement CI each time I ran it:

[40%,62%]
[41%,60%]
[41%,49%]
[43%,68%]
[39%,48%]
R. Cox
  • I think the inconsistency might be caused where I fit a distribution to pA and pB? – R. Cox Feb 07 '21 at 17:36
  • I've tried to code a less round the houses approach to the non-bootstrap technique https://stackoverflow.com/questions/66155480/inverse-cdf-transform-of-the-wilson-interval – R. Cox Feb 11 '21 at 13:34

Bootstrap

I tried a bootstrap. It often threw the error:

division by zero

I got around that by assuming:

infinity = approximately 10101

with the resulting estimates:


Figure 1, Improvement (I) against Bootstrap Size (BS) for runs = 100


Figure 2, Improvement (I) against Runs for BS=28

This gave an improvement after 1000 runs with BS=28 of:

55% [14%, 150%]

In Python:

print('Generate the sample data')
data = pd.DataFrame({'A':[1]*11+[0]*6+[0]*11,
                     'B':[1]*11+[1]*6+[0]*11})
print('sample size: ',len(data))
print('')
print('A B X')
print('1 1',len(data[((data.A==1)&(data.B==1))]))
print('1 0',len(data[((data.A==1)&(data.B==0))]))
print('0 1',len(data[((data.A==0)&(data.B==1))]))
print('0 0',len(data[((data.A==0)&(data.B==0))]))
print('')

Results

I = {}
Lower = {}
Media = {}
Upper = {}

Control Parameters

Runs = range(100)
#bootstrap_size = range(len(data))
BS_Max = 100
bootstrap_size = range(BS_Max)

for BS in bootstrap_size:
    #print('bootstrap size ', BS)

    # Results
    I_T = {}

    for R in Runs:

        # Bootstrap
        BooP = data.sample(BS, replace=True)

        # Data
        X_11 = len(BooP[((BooP.A==1)&(BooP.B==1))])
        X_10 = len(BooP[((BooP.A==1)&(BooP.B==0))])
        X_01 = len(BooP[((BooP.A==0)&(BooP.B==1))])
        X_00 = len(BooP[((BooP.A==0)&(BooP.B==0))])

        # Improvement (I) = pB/pA-1
        if X_11+X_10 == 0:
            I_x = 10101 # approx infinity!
        else:
            I_x = (X_11+X_01)/(X_11+X_10)-1

        # Results
        I_T[R] = I_x

    # Results
    I[BS] = I_T

    # CI
    Lower[BS] = np.percentile(list(I[BS].values()),  2.5)
    Media[BS] = np.percentile(list(I[BS].values()), 50  )
    Upper[BS] = np.percentile(list(I[BS].values()), 97.5)

Coverage

I think the coverage is around 93%, slightly under the target 95%.


Figure 3, Coverage Probability (CP) against P(10)

I assessed it with a simulation.

Coverage changes with probability, so it would be good to try it with various values of the probabilities of the four possible outcomes (P11, P10, P01, P00). Unfortunately, running many values of each P would take too long. Instead, I used MultinomCI to get the Wilson interval for each P. This gave:

      Est   Low     High
P(11) 0.393 0.236   0.576
P(10) 0.000 0.000   0.121
P(01) 0.214 0.102   0.395
P(00) 0.393 0.236   0.576

I gave the name Prob to the vector [P11, P10, P01, P00], where

sum(Prob) = 1

I took the range of likely values for P10 and assumed that as P10 increases, P01 decreases equally.

I assessed 4 values of Prob:

Prob = [0.3, 0.0,  0.3,  0.4]
Prob = [0.3, 0.04, 0.26, 0.4]
Prob = [0.3, 0.08, 0.22, 0.4]
Prob = [0.3, 0.12, 0.18, 0.4]

I ran it with:

n       =  28
runs    =  1000
reps    =  10000

which took 41.5 hours.

Last, I estimated the 95% CI for the coverage of the 95% CI, again using the Wilson interval.

In Python:

import numpy  as np
import pandas as pd
import time
import rpy2.robjects as ro
import statsmodels.stats.proportion
import matplotlib.pyplot as plt

start = time.time()

from rpy2.robjects.packages import importr

package_name = "DescTools"

try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

r_string = """CI = MultinomCI(c(11,0,6,11), conf.level=0.95, method="wilson")"""
ro.r(r_string)
A_C = np.array(ro.r['CI'])

print(' ')
print('Estimate, CI')
print(A_C)
print(' ')

Control Parameters

n = 28
print('n = ', n)

nrep = 10 #10000
print('reps = ', nrep)

runs = 10 #1000
print('runs = ', runs)

P_10s = [0.00, 0.04, 0.08, 0.12]

d_CP = {}
d_Re = {}

for P_10 in P_10s:
    pvals = [.3, P_10, (.3-P_10), .4]
    print('Prob ', pvals)
    print('total P = ', sum(pvals))

    P_11 = pvals[0]
    P_10 = pvals[1]
    P_01 = pvals[2]
    P_00 = pvals[3]

    I_T = (P_11+P_01)/(P_11+P_10)-1
    print('True I  = ', I_T)

    print('Estimate the Coverage Probability using simulation')
    CP = []

    for it in range(nrep):

        # Generate the sample data
        li = np.random.multinomial(n, pvals)
        data = pd.DataFrame({'A':[1]*li[0] +[1]*li[1] +[0]*li[2] +[0]*li[3],
                             'B':[1]*li[0] +[0]*li[1] +[1]*li[2] +[0]*li[3]})

        # Results
        Lower = {}
        Media = {}
        Upper = {}

        # Results
        I_R = []

        for R in range(runs):

            # Bootstrap size = n
            BooP = data.sample(n, replace=True)

            # Data
            X_11 = len(BooP[((BooP.A==1)&(BooP.B==1))])
            X_10 = len(BooP[((BooP.A==1)&(BooP.B==0))])
            X_01 = len(BooP[((BooP.A==0)&(BooP.B==1))])
            X_00 = len(BooP[((BooP.A==0)&(BooP.B==0))])

            # Improvement (I) = pB/pA-1
            if X_11+X_10 == 0:
                I_x = 10101 # approx infinity!
            else:
                I_x = (X_11+X_01)/(X_11+X_10)-1

            # Results
            I_R.append(I_x)

            # CI
            Lower[R] = np.percentile(I_R,  2.5)
            Media[R] = np.percentile(I_R, 50  )
            Upper[R] = np.percentile(I_R, 97.5)

        Low = Lower[max(list(Lower.keys()))]
        Med = Media[max(list(Lower.keys()))]
        Hig = Upper[max(list(Lower.keys()))]

        # Check whether the interval contains the true value
        if (I_T < Hig) and (I_T > Low):
            CP.append(1)
        else:
            CP.append(0)

    end = time.time()
    print('time    = ', end - start)
    print('CP      = ', sum(CP)/len(CP))

    # results
    d_Re[P_10] = CP
    d_CP[P_10] = sum(CP)/len(CP)

CI

CI_Low = []
CI_High = []
for P_10 in d_CP.keys():
    low, high = statsmodels.stats.proportion.proportion_confint(
        d_CP[float(P_10)]*nrep, nrep, alpha=1-0.95, method='wilson')
    CI_Low.append(low)
    CI_High.append(high)

print('Plot')
df_G1 = pd.DataFrame({'P_10' : list(d_CP.keys()),
                      'CP'   : list(d_CP.values()),
                      'Lo'   : CI_Low,
                      'Hi'   : CI_High})

fig, ax1 = plt.subplots(1,1)

df_G1.plot(x='P_10', y='Hi', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')
df_G1.plot(x='P_10', y='CP', legend=False, ax=ax1, label='CP',
           linewidth=5, color='k', linestyle='-')
df_G1.plot(x='P_10', y='Lo', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')

for item in ([ax1.title, ax1.xaxis.label, ax1.yaxis.label] +
             ax1.get_xticklabels() + ax1.get_yticklabels()):
    item.set_fontsize(22)

plt.xlabel('$P(10)$')
plt.ylabel('$CP$')

plt.xlim([0, 0.12])
plt.ylim([0.9, 1])

ax1.set_yticks([.9, .95, 1])
plt.xticks([0.00, 0.04, 0.08, 0.12])

plt.grid(which='both', color='b')

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

R. Cox
  • follow up question: what is the coverage? https://stats.stackexchange.com/questions/509227/coverage-of-a-bootstrap-confidence-interval-for-a-change-in-a-binomial-proportio – R. Cox Feb 12 '21 at 14:45
  • I think the coverage is around 93%, slightly under the target 95% https://stats.stackexchange.com/questions/509227/coverage-of-a-bootstrap-confidence-interval-for-a-change-in-a-binomial-proportio/509548#509548 – R. Cox Feb 15 '21 at 12:23

I'm getting

I = 52% [39%, 55%]

when I make the assumption that the sample is just as representative of the population for method A as it is for method B. This is the other extreme from assuming independence.

MultinomCI gives the Wilson CI for P11, P10, P01, P00.

I want to get P11, P10, P01, P00 for a given location in probability space (u)

0 < u < 1

I assume that the CI is about symmetric and that therefore:

$CL(u) = 2\,|u - 0.5|$, taking the upper Wilson bound when $u \ge 0.5$ and the lower bound when $u < 0.5$.
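
In code, that symmetry assumption is just this mapping (a small sketch of the C_L helper that appears in the listings below):

def C_L(u):
    # u in (0, 1) indexes a location in probability space; C_L(u) is the
    # two-sided confidence level whose Wilson bound passes through it.
    if u >= 0.5:
        return 2 * (u - 0.5)   # pair u with the upper Wilson bound
    else:
        return -2 * (u - 0.5)  # pair u with the lower Wilson bound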

I varied the step size (S), which sets how many equally spaced samples I took from u:

u = [S to 1-S, step S]

This gave:

enter image description here

Figure 1, Improvement (I) against step size (S)

In Python:

import numpy as np
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
import matplotlib.pyplot as plt

package_name = "DescTools"

try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

print('Confidence Level (CL) = fun(u), assuming symmetry')

def C_L(u):
    if u >= 0.5:
        return 2*(u-0.5)
    else:
        return -2*(u-0.5)

print('Vary u')
steps = [1e-1, 2e-2, 1e-2, 2e-3, 1e-3, 1e-4, 1e-5]

CI(I) for various steps

CI_I = {}

for step in steps:
    Ues = np.linspace(step, 1-step, int(1/step))

    # Vary the CL
    CLs = [C_L(element) for element in Ues]

    CIes = {}
    P11s = {}
    P10s = {}
    P01s = {}
    P00s = {}
    pA_s = {}
    pB_s = {}
    I_es = {}

    for u in Ues:

        # CL
        CL = C_L(u)

        # CI (P11, P10, P01, P00)
        r_string = """CI = MultinomCI(c(11,0,6,11), conf.level=CL, method="wilson")
        """.replace('CL', str(CL))
        ro.r(r_string)
        CIes[CL] = np.array(ro.r['CI'])

        # P11
        if u >= 0.5:
            P11s[u] = (CIes[CL][:,2][0])
        else:
            P11s[u] = (CIes[CL][:,1][0])

        # P10
        if u >= 0.5:
            P10s[u] = (CIes[CL][:,2][1])
        else:
            P10s[u] = (CIes[CL][:,1][1])

        # P01
        if u >= 0.5:
            P01s[u] = (CIes[CL][:,2][2])
        else:
            P01s[u] = (CIes[CL][:,1][2])

        # P00
        if u >= 0.5:
            P00s[u] = (CIes[CL][:,2][3])
        else:
            P00s[u] = (CIes[CL][:,1][3])

        # pA = P_11 + P_10
        pA_s[u] = P11s[u] + P10s[u]

        # pB = P_11 + P_01
        pB_s[u] = P11s[u] + P01s[u]

        # Improvement (I) = pB/pA-1
        I_es[u] = pB_s[u]/pA_s[u]-1

    # CI(I)
    CI_I[step] = [np.percentile(list(I_es.values()),  2.5),
                  np.percentile(list(I_es.values()), 50),
                  np.percentile(list(I_es.values()), 97.5)]

print('CI_I = ',CI_I)

Dict to Lists

CI_data = list(CI_I.values())

Low = [item[0] for item in CI_data]
Med = [item[1] for item in CI_data]
H_i = [item[2] for item in CI_data]

Plot

fig, ax = plt.subplots(1, 1)
ax.plot(steps, Low, linewidth=5, color='k', linestyle='--')
ax.plot(steps, Med, linewidth=5, color='k', linestyle='-')
ax.plot(steps, H_i, linewidth=5, color='k', linestyle='--')

for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
             ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(22)

plt.xlabel('$S$')
plt.ylabel('$I$')
plt.grid(b=True, which='major', color='b')
ax.set_xscale('log')
plt.xlim([min(steps), max(steps)])
plt.ylim([0.3, 0.6])

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

Coverage

I think the coverage is around 32%, vastly under the target 95%.


Figure 2, Coverage Probability (CP) against P(10)

I assessed it with a simulation, using the same approach as for the bootstrap answer. With 10,000 samples for each of the same 4 P(10) values that I used to assess the bootstrap technique, it took 32 hours to run.

In Python:

import numpy as np
import pandas as pd
import time
import pickle
import statsmodels.stats.proportion
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
import matplotlib.pyplot as plt

package_name = "DescTools"

try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

start = time.time()

print('Control Parameters')
n = 28
print('sample size (n) = ', n)

nrep = 10000
print('coverage sample size = ', nrep)

step = 1e-4
print('Probability space step size = ', step)

print('Equal points in probability space: 0 < u < 1')
Ues = np.linspace(step, 1-step, int(1/step))

print('Confidence Level (CL) = fun(u), assuming symmetry')

def C_L(u):
    if u >= 0.5:
        return 2*(u-0.5)
    else:
        return -2*(u-0.5)

d_CP = {}
d_Re = {}
CI_I = {}

P_10s = [0.00, 0.04, 0.08, 0.12]

for P_10 in P_10s:
    pvals = [.3, P_10, (.3-P_10), .4]
    print('Prob ', pvals)
    print('total P = ', sum(pvals))

    P11 = pvals[0]
    P10 = pvals[1]
    P01 = pvals[2]
    P00 = pvals[3]

    I_T = (P11+P01)/(P11+P10)-1
    print('True I  = ', I_T)

    print('Estimate the Coverage Probability using simulation')
    CP = []

    for it in range(nrep):

        # Generate sample data
        ExeS = np.random.multinomial(n, pvals)

        CIes = {}
        P11s = {}
        P10s = {}
        P01s = {}
        P00s = {}
        pA_s = {}
        pB_s = {}
        I_es = {}

        for u in Ues:

            # CL
            CL = C_L(u)

            # CI (P11, P10, P01, P00)
            r_string = """CI = MultinomCI(c(P1,P2,P3,P4), conf.level=CL, method="wilson")
            """.replace('CL',str(CL)).replace('P1',str(ExeS[0])).replace('P2',str(ExeS[1])).replace('P3',str(ExeS[2])).replace('P4',str(ExeS[3]))
            ro.r(r_string)

            CIes[CL] = np.array(ro.r['CI'])

            # P11
            if u >= 0.5:
                P11s[u] = (CIes[CL][:,2][0])
            else:
                P11s[u] = (CIes[CL][:,1][0])

            # P10
            if u >= 0.5:
                P10s[u] = (CIes[CL][:,2][1])
            else:
                P10s[u] = (CIes[CL][:,1][1])

            # P01
            if u >= 0.5:
                P01s[u] = (CIes[CL][:,2][2])
            else:
                P01s[u] = (CIes[CL][:,1][2])

            # P00
            if u >= 0.5:
                P00s[u] = (CIes[CL][:,2][3])
            else:
                P00s[u] = (CIes[CL][:,1][3])

            # pA = P_11 + P_10
            pA_s[u] = P11s[u] + P10s[u]

            # pB = P_11 + P_01
            pB_s[u] = P11s[u] + P01s[u]

            # Improvement (I) = pB/pA-1
            I_es[u] = pB_s[u]/pA_s[u]-1

        # CI(I)
        CI_I[it] = [np.percentile(list(I_es.values()),  2.5),
                    np.percentile(list(I_es.values()), 50),
                    np.percentile(list(I_es.values()), 97.5)]

        Low = CI_I[it][0]
        Med = CI_I[it][1]
        Hig = CI_I[it][2]

        # Check whether the interval contains the true value
        if (I_T < Hig) and (I_T > Low):
            CP.append(1)
        else:
            CP.append(0)

    # results
    d_Re[P_10] = CP
    d_CP[P_10] = sum(CP)/len(CP)

end = time.time()
print('time    = ',end - start)

CI

CI_Low = []
CI_High = []
for P_10 in d_CP.keys():
    low, high = statsmodels.stats.proportion.proportion_confint(
        d_CP[float(P_10)]*nrep, nrep, alpha=1-0.95, method='wilson')
    CI_Low.append(low)
    CI_High.append(high)

print('Plot')
df_G1 = pd.DataFrame({'P_10' : list(d_CP.keys()),
                      'CP'   : list(d_CP.values()),
                      'Lo'   : CI_Low,
                      'Hi'   : CI_High})

fig, ax1 = plt.subplots(1,1)

df_G1.plot(x='P_10', y='Hi', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')
df_G1.plot(x='P_10', y='CP', legend=False, ax=ax1, label='CP',
           linewidth=5, color='k', linestyle='-')
df_G1.plot(x='P_10', y='Lo', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')

for item in ([ax1.title, ax1.xaxis.label, ax1.yaxis.label] +
             ax1.get_xticklabels() + ax1.get_yticklabels()):
    item.set_fontsize(22)

plt.xlabel('$P(10)$')
plt.ylabel('$CP$')

plt.xlim([0, 0.12])
plt.ylim([0, 1])
plt.xticks([0.00, 0.04, 0.08, 0.12])

plt.grid(which='both', color='b')

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

R. Cox

Note that the question states that

These proportions all relate to the same sample (n=28). They are not independent of each other.

However, if we were to assume independence, then we could use the approach of

Newcombe, R.G., 1998. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine, 17(8), pp.873-890.

who presents

APPENDIX I: MULTIPLICATIVE SCALE SYMMETRY OF CONDITIONAL INTERVALS DERIVED FROM THE WILSON SCORE INTERVAL

In Newcombe, 1998's notation, if:

CI [f/(f+g)] = [L,U]

Then:

CI [f/g] = [L/(1-L),U/(1-U)]

In the case in this question, this gives:

I = 55% [-26%, 224%]

In Python:

import statsmodels.stats.proportion

x1 = 17
x2 = 11
CL = 0.95
n = x1 + x2

P_E = x1/(n-x1) - 1
print('Point Estimate = ', P_E)

low, high = statsmodels.stats.proportion.proportion_confint(x1, n, alpha=1-CL, method='wilson')

L_1 = low / (1-low) - 1
H_1 = high / (1-high) - 1
print('CI = ', L_1, H_1)

Coverage

I think the coverage is around 99.88%, vastly over the target 95%.


Figure 1, Coverage Probability (CP) against P(10)

I assessed it with a simulation, using the same approach as for the bootstrap answer. With 100,000 samples for each of the same 4 P(10) values that I used to assess the bootstrap technique, it took 41 seconds to run.

In Python:

import numpy as np
import pandas as pd
import time
import statsmodels.stats.proportion
import matplotlib.pyplot as plt

n = 28
print('sample size (n) = ', n)

nrep = 100000
print('coverage sample size = ', nrep)

CL = 0.95
print('Confidence Level = ', CL)

start = time.time()

d_CP = {}
d_Re = {}

P_10s = [0.00, 0.04, 0.08, 0.12]

for P_10 in P_10s:
    pvals = [.3, P_10, (.3-P_10), .4]
    print('Prob ', pvals)
    print('total P = ', sum(pvals))

    P11 = pvals[0]
    P10 = pvals[1]
    P01 = pvals[2]
    P00 = pvals[3]

    I_T = (P11+P01)/(P11+P10)-1
    print('True I  = ', I_T)

    print('Estimate the Coverage Probability using simulation')
    CP = []

    for it in range(nrep):

        # Generate sample data
        ExeS = np.random.multinomial(n, pvals)

        x11 = ExeS[0]
        x10 = ExeS[1]
        x01 = ExeS[2]
        x00 = ExeS[3]

        low, high = statsmodels.stats.proportion.proportion_confint(
            (x11+x01), (x11*2+x10+x01), alpha=1-CL, method='wilson')
        L_1 = low  / (1-low)  - 1
        H_1 = high / (1-high) - 1

        # Check whether the interval contains the true value
        if (I_T < H_1) and (I_T > L_1):
            CP.append(1)
        else:
            CP.append(0)

    # results
    d_Re[P_10] = CP
    d_CP[P_10] = sum(CP)/len(CP)
    print('CP = ', d_CP[P_10])

end = time.time()
print('time = ',end - start)

CI

CI_Low = []
CI_High = []
for P_10 in d_CP.keys():
    low, high = statsmodels.stats.proportion.proportion_confint(
        d_CP[float(P_10)]*nrep, nrep, alpha=1-0.95, method='wilson')
    CI_Low.append(low)
    CI_High.append(high)

print('Plot')
df_G1 = pd.DataFrame({'P_10' : list(d_CP.keys()),
                      'CP'   : list(d_CP.values()),
                      'Lo'   : CI_Low,
                      'Hi'   : CI_High})

fig, ax1 = plt.subplots(1,1)

df_G1.plot(x='P_10', y='Hi', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')
df_G1.plot(x='P_10', y='CP', legend=False, ax=ax1, label='CP',
           linewidth=5, color='k', linestyle='-')
df_G1.plot(x='P_10', y='Lo', legend=False, ax=ax1, label='95% CI',
           linewidth=5, color='k', linestyle='--')

for item in ([ax1.title, ax1.xaxis.label, ax1.yaxis.label] +
             ax1.get_xticklabels() + ax1.get_yticklabels()):
    item.set_fontsize(22)

plt.xlabel('$P(10)$')
plt.ylabel('$CP$')

plt.xlim([0, 0.12])
plt.ylim([0.998, 1])
plt.xticks([0.00, 0.04, 0.08, 0.12])
plt.yticks([0.998, 0.999, 1])

plt.grid(which='both', color='b')

fig = plt.gcf()
fig.set_size_inches(4, 4)
plt.show()
plt.clf()

R. Cox