P value and confidence interval for two sample test of proportions disagree

Question

I'm using R to calculate the two-sample test for equality of proportions, where the two proportions are 350/400 and 25/25. So:

> prop.test(c(350,25),c(400,25))                                                                                                                                                           

        2-sample test for equality of proportions with continuity correction

data:  c(350, 25) out of c(400, 25) 
X-squared = 2.4399, df = 1, p-value = 0.1183
alternative hypothesis: two.sided 
95 percent confidence interval:
 -0.17865986 -0.07134014 
sample estimates:
prop 1 prop 2 
 0.875  1.000 

Warning message:
In prop.test(c(350, 25), c(400, 25), correct = FALSE) :
  Chi-squared approximation may be incorrect

What I can't reconcile on my own is that the p-value is greater than 0.05, and yet the 95% confidence interval for the difference does not include 0. I thought there was an 'if and only if' relationship between the two (The p-value < alpha iff the (1-alpha) confidence interval of the difference does not include 0).

What am I not seeing? My only guess is there's something fundamental that I'm misunderstanding, or that it has something to do with that warning message about chi-squared approximation.

Glen_b · Accepted Answer · 2013-04-25T13:47:28.127

I presume they result from two somewhat different approximations in this instance.

For the ordinary chi-square test, the interval that corresponds to the chi-square is the Wilson score interval

$$\frac{1}{1 + \frac{1}{n} z_{1 - \frac{1}{2}\alpha}^2} \left[ \hat p + \frac{1}{2n} z_{1 - \frac{1}{2}\alpha}^2 \pm z_{1 - \frac{1}{2}\alpha} \sqrt{ \frac{1}{n}\hat p \left(1 - \hat p\right) + \frac{1}{4n^2}z_{1 - \frac{1}{2}\alpha}^2 } \right]$$

Looking into the code (just type prop.test to see the code for it), it looks like you get the Wilson score interval by default, but with a continuity correction applied to $p$.

[Note that one of the references in the help (?prop.test) discusses eleven different confidence intervals for the difference in proportions; at most one will always exactly correspond to any given form of the hypothesis test.]

While the without-continuity-correction Wilson score interval will correspond to the without-continuity-correction chi-square, my guess is that the continuity-corrected version of both that is being used no longer correspond exactly.

I guess the way to get an interval that should correspond would be to write the interval corresponding to the continuity-corrected chi-squared in similar fashion to the way the Wilson score interval is derived (see the above Wikipedia link) and solve that for the endpoints.

You know, it's funny--if I posted on Stack Overflow I probably would have looked at the code first, but over here I didn't even think about it. Thanks for your help! — ohspite, Apr 25 '13 at 14:28

score 2 · Answer 2 · answered Mar 10 '23 at 22:20

Unfortunately, the accepted answer is not correct for the 2 sample prop.test. The by-hand calculation shows, that the confidence interval is the Wald's one (if no correction is used), and not Wilson.

This is also referred here: https://stats.stackexchange.com/a/570528/382831

The returned CI is the Wald's one:

> (ppt <- prop.test(x = c(11, 8), n = c(16, 21),correct = FALSE))
2-sample test for equality of proportions without continuity correction


data:  c(11, 8) out of c(16, 21)
X-squared = 3.4159, df = 1, p-value = 0.06457
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.001220547  0.614315785
sample estimates:
   prop 1    prop 2 
0.6875000 0.3809524

which agrees with the logistic regression followed by the marginal effect:

data <- data.frame(Status = c(rep(TRUE, 11), rep(FALSE, 16-11), rep(TRUE, 8), rep(FALSE, 21-8)), Group = c(rep("Gr1", 16), rep("Gr2", 21)))
> m <- glm(Status ~ Group,family = binomial(), data=data)
> margins::margins_summary(m)
factor     AME     SE       z      p   lower  upper
 GroupGr2 -0.3065 0.1570 -1.9522 0.0509 -0.6143 0.0012

which agrees with

> PropCIs::wald2ci(11, 16, 8, 21, conf.level=0.95, adjust="Wald")
data:
95 percent confidence interval:
 -0.001220547  0.614315785
sample estimates:
[1] 0.3065476

While the reported p-value comes from the Rao score test:

> anova(m, test="Rao")
Analysis of Deviance Table
Model: binomial, link: logit
Response: Status
Terms added sequentially (first to last)
  Df Deviance Resid. Df Resid. Dev    Rao Pr(&gt;Chi)  

NULL                     36     51.266

Group  1   3.4809        35     47.785 3.4159  0.06457 .

Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Thanks; I've only just seen that you posted a response. I no longer recall what convinced me that it was using a Wilson interval instead of a Wald interval. I would be inclined to simply remove it but that would leave your answer correcting nothing and some aspects of the question no longer discussed. Naturally I'll need to do something about my answer. I'll try to remember to ping you here when I make some changes. — Glen_b, Mar 14 '23 at 21:10

P value and confidence interval for two sample test of proportions disagree

2 Answers2

Linked

Related