I have a matched pairs design. I have a 2x2 table showing success/failure in pre and post:

|           | Post (yes) | Post (no) |
|-----------|-----------:|----------:|
| Pre (yes) |          0 |        10 |
| Pre (no)  |         20 |        30 |

My colleague would like a confidence interval on the proportion of post (yes). How can I obtain one?

My start: The observations are conditionally independent given the "pre" status, right?

So can I just use the standard Wald-type CI: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = (0+20)/(0+10+20+30)$, $z_{\alpha/2}=1.96$ for a 95% confidence interval and $n=0+10+20+30 = 60$?

1 Answer

If I am understanding your question correctly, yes, you can. You could also consider building the interval on a log or logit scale and back-transforming, i.e.

$$\text{exp}\Big( \text{log}\{\hat{p}\}\pm z_{\alpha/2}\Big[\sqrt{\hat{p}(1-\hat{p})/n}\Big]/\hat{p}\Big)$$

$$\text{or}$$

$$\text{logit}^{-1}\Big( \text{log}\Big\{\frac{\hat{p}}{1-\hat{p}}\Big\}\pm z_{\alpha/2}\Big[\sqrt{\hat{p}(1-\hat{p})/n}\Big]/[\hat{p}(1-\hat{p})]\Big).$$

The log link is useful when the estimate is near 0 and the sample size is small; the logit link is useful when the estimate is near 0 or near 1 and the sample size is small. Your estimate of $20/60=0.33$ is not close to zero by most standards, but the link functions may still improve the coverage of the confidence interval. For your estimate of $0.33$, both shift the interval upward relative to a Wald interval with an identity link: the lower limit moves closer to the point estimate and the upper limit moves farther from it. Here log refers to the natural log (base $e$).
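
The standard errors in the two formulas above are what the delta method gives. A brief sketch, assuming $\hat{p}$ is approximately normal with variance $p(1-p)/n$ and $g$ is a smooth transformation:

$$\text{Var}\{g(\hat{p})\} \approx \big[g'(p)\big]^2\,\frac{p(1-p)}{n}$$

$$g(p)=\text{log}\{p\}:\quad g'(p)=\frac{1}{p}\quad\Rightarrow\quad \widehat{\text{SE}}=\Big[\sqrt{\hat{p}(1-\hat{p})/n}\Big]/\hat{p}$$

$$g(p)=\text{log}\Big\{\frac{p}{1-p}\Big\}:\quad g'(p)=\frac{1}{p(1-p)}\quad\Rightarrow\quad \widehat{\text{SE}}=\Big[\sqrt{\hat{p}(1-\hat{p})/n}\Big]/[\hat{p}(1-\hat{p})]$$

The interval is formed on the transformed scale and then mapped back with $\text{exp}$ or $\text{logit}^{-1}$, which is why the back-transformed limits are not symmetric about $\hat{p}$.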

Wald with identity link (0.21, 0.45)
Wald with log link (0.23, 0.48)
Wald with logit link (0.23, 0.46)

With your sample size all of the intervals look similar.

If, for example, you had witnessed 3 events in a sample of size 30 then the confidence limits would be

Wald with identity link (-0.01, 0.21)
Wald with log link (0.03, 0.29)
Wald with logit link (0.03, 0.27)

Of course we would never report a negative proportion so the lower limit using the identity link would be truncated to 0 (not inclusive).
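
For anyone who wants to check the limits quoted above, here is a minimal Python sketch using only the standard library. The function name `wald_intervals` and its arguments are my own choices for illustration, not part of any package.

```python
import math
from statistics import NormalDist


def wald_intervals(successes, n, level=0.95):
    """Wald intervals for a proportion on the identity, log and logit scales."""
    p = successes / n
    if not 0 < p < 1:
        raise ValueError("the log and logit scales need 0 < p_hat < 1")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = math.sqrt(p * (1 - p) / n)                # SE on the identity scale

    identity = (p - z * se, p + z * se)

    # Log scale: delta-method SE is se / p; back-transform with exp.
    half = z * se / p
    log_ci = (math.exp(math.log(p) - half), math.exp(math.log(p) + half))

    # Logit scale: delta-method SE is se / (p (1 - p)); back-transform
    # with the inverse logit.
    def expit(x):
        return 1 / (1 + math.exp(-x))

    logit = math.log(p / (1 - p))
    half = z * se / (p * (1 - p))
    logit_ci = (expit(logit - half), expit(logit + half))

    return {"identity": identity, "log": log_ci, "logit": logit_ci}


for successes, n in [(20, 60), (3, 30)]:
    for name, (lo, hi) in wald_intervals(successes, n).items():
        print(f"{successes}/{n} {name:8s} ({lo:.2f}, {hi:.2f})")
```

The identity-scale interval is returned untruncated, so the negative lower limit for 3 out of 30 shows up as computed; truncating it at 0 is left to the caller.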

  • Hi Geoffrey, thanks for your thoughts. Do you have any references to support the idea regarding the coverage? I'm just wondering if there's a compelling argument for using this method with the log-link function instead of the standard method. – StatsSorceress Aug 11 '21 at 16:04
  • Any textbook on generalized linear models would cover it. I like the books by Agresti. The idea is that we are approximating the binomial sampling distribution with a normal distribution. When the true $p$ is near 0 and the sample size is small, the binomial sampling distribution is better approximated by a log-normal. If you run a regression in SAS PROC GENMOD with dist=bin and link=log, your exponentiated coefficients are risk ratios. – Geoffrey Johnson Aug 11 '21 at 16:07
  • Is there a method that would guarantee the bounds would be within the interval [0,1]? Because I know Wald intervals can result in bounds outside that interval, which doesn't really make sense for a CI on a proportion.... – StatsSorceress Aug 11 '21 at 16:07
  • The logit transformation provides this guarantee so long as your point estimate is not exactly 0 or exactly 1. If it is exactly 0 or 1, I would invert the CDF of a binomial distribution to obtain a one-sided confidence bound. Inverting a binomial CDF works in any situation, but is not easily written in closed form. – Geoffrey Johnson Aug 11 '21 at 16:09
  • The log link guarantees the lower limit will not go below 0. This is useful if you are estimating small proportions. – Geoffrey Johnson Aug 11 '21 at 16:15
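
The binomial-CDF inversion mentioned in the comments has no simple closed form, but it can be done numerically. Below is a rough sketch that solves for the limits by bisection using only the Python standard library. The helper names (`binom_cdf`, `solve_decreasing`, `exact_interval`, `one_sided_upper`) are hypothetical, and the equal-tailed two-sided construction is an assumption about which inversion is intended (it is the one usually called Clopper-Pearson).

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(p) = target when f is decreasing in p on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(x, n, level=0.95):
    """Equal-tailed two-sided interval from inverting the binomial CDF."""
    alpha = 1 - level
    # Lower limit: P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


def one_sided_upper(x, n, level=0.95):
    """One-sided upper bound, e.g. for a point estimate of exactly 0."""
    return 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), 1 - level)


print(exact_interval(20, 60))
print(one_sided_upper(0, 30))
```

Both limits always lie in $[0,1]$ by construction. For zero events in $n$ trials the one-sided bound reduces to $1-\alpha^{1/n}$, which gives a quick sanity check on the bisection.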