4

I'm using R to calculate the two-sample test for equality of proportions, where the two proportions are 350/400 and 25/25. So:

> prop.test(c(350,25),c(400,25))                                                                                                                                                           

        2-sample test for equality of proportions with continuity correction

data:  c(350, 25) out of c(400, 25) 
X-squared = 2.4399, df = 1, p-value = 0.1183
alternative hypothesis: two.sided 
95 percent confidence interval:
 -0.17865986 -0.07134014 
sample estimates:
prop 1 prop 2 
 0.875  1.000 

Warning message:
In prop.test(c(350, 25), c(400, 25), correct = FALSE) :
  Chi-squared approximation may be incorrect

What I can't reconcile on my own is that the p-value is greater than 0.05, and yet the 95% confidence interval for the difference does not include 0. I thought there was an 'if and only if' relationship between the two (The p-value < alpha iff the (1-alpha) confidence interval of the difference does not include 0).

What am I not seeing? My only guess is there's something fundamental that I'm misunderstanding, or that it has something to do with that warning message about chi-squared approximation.

Glen_b
  • 282,281
ohspite
  • 143

2 Answers2

8

I presume they result from two somewhat different approximations in this instance.

For the ordinary chi-square test, the interval that corresponds to the chi-square is the Wilson score interval

$$\frac{1}{1 + \frac{1}{n} z_{1 - \frac{1}{2}\alpha}^2} \left[ \hat p + \frac{1}{2n} z_{1 - \frac{1}{2}\alpha}^2 \pm z_{1 - \frac{1}{2}\alpha} \sqrt{ \frac{1}{n}\hat p \left(1 - \hat p\right) + \frac{1}{4n^2}z_{1 - \frac{1}{2}\alpha}^2 } \right]$$

Looking into the code (just type prop.test to see the code for it), it looks like you get the Wilson score interval by default, but with a continuity correction applied to $p$.

[Note that one of the references in the help (?prop.test) discusses eleven different confidence intervals for the difference in proportions; at most one will always exactly correspond to any given form of the hypothesis test.]

While the without-continuity-correction Wilson score interval will correspond to the without-continuity-correction chi-square, my guess is that the continuity-corrected version of both that is being used no longer correspond exactly.

I guess the way to get an interval that should correspond would be to write the interval corresponding to the continuity-corrected chi-squared in similar fashion to the way the Wilson score interval is derived (see the above Wikipedia link) and solve that for the endpoints.

Glen_b
  • 282,281
  • You know, it's funny--if I posted on Stack Overflow I probably would have looked at the code first, but over here I didn't even think about it. Thanks for your help! – ohspite Apr 25 '13 at 14:28
2

Unfortunately, the accepted answer is not correct for the 2 sample prop.test. The by-hand calculation shows, that the confidence interval is the Wald's one (if no correction is used), and not Wilson.

This is also referred here: https://stats.stackexchange.com/a/570528/382831

The returned CI is the Wald's one:

> (ppt <- prop.test(x = c(11, 8), n = c(16, 21),correct = FALSE))
2-sample test for equality of proportions without continuity correction

data: c(11, 8) out of c(16, 21) X-squared = 3.4159, df = 1, p-value = 0.06457 alternative hypothesis: two.sided 95 percent confidence interval: -0.001220547 0.614315785 sample estimates: prop 1 prop 2 0.6875000 0.3809524

which agrees with the logistic regression followed by the marginal effect:

data <- data.frame(Status = c(rep(TRUE, 11), rep(FALSE, 16-11), rep(TRUE, 8), rep(FALSE, 21-8)), Group = c(rep("Gr1", 16), rep("Gr2", 21)))

> m <- glm(Status ~ Group,family = binomial(), data=data) > margins::margins_summary(m)

factor AME SE z p lower upper GroupGr2 -0.3065 0.1570 -1.9522 0.0509 -0.6143 0.0012

which agrees with

> PropCIs::wald2ci(11, 16, 8, 21, conf.level=0.95, adjust="Wald")

data:

95 percent confidence interval: -0.001220547 0.614315785 sample estimates: [1] 0.3065476

While the reported p-value comes from the Rao score test:

> anova(m, test="Rao")
Analysis of Deviance Table

Model: binomial, link: logit

Response: Status

Terms added sequentially (first to last)

  Df Deviance Resid. Df Resid. Dev    Rao Pr(&gt;Chi)  

NULL 36 51.266
Group 1 3.4809 35 47.785 3.4159 0.06457 .


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

kmonk_gun
  • 81
  • 4
  • Thanks; I've only just seen that you posted a response. I no longer recall what convinced me that it was using a Wilson interval instead of a Wald interval. I would be inclined to simply remove it but that would leave your answer correcting nothing and some aspects of the question no longer discussed. Naturally I'll need to do something about my answer. I'll try to remember to ping you here when I make some changes. – Glen_b Mar 14 '23 at 21:10