
I'm modelling some count data using negative binomial regression with glm.nb in R. I've noticed that my point estimates are consistently not at the midpoint of the 95% CIs, and I'm wondering whether this is normal because of the distribution or a sign that I'm doing something horribly wrong.

I have never worked with the negative binomial or any Poisson distribution before, so I would appreciate a helping hand here!

Nick Cox
  • Not a negbin, but a simple example where CIs are asymmetric (even if coverage is set to be symmetric, which is of course also not necessarily what we want) is a CI for a binomial parameter with "very few" or "very many" successes: https://stats.stackexchange.com/q/82720/1352 – Stephan Kolassa Aug 13 '23 at 12:51
  • As a general comment, many CIs are best calculated on, say, a logarithmic or logit scale and back-transformed to the original scale. This comment applies to at least one estimator of the binomial parameter (see previous comment) and of the geometric mean. The final CIs are then typically, and even naturally, asymmetric. – Nick Cox Aug 14 '23 at 06:38
  • Consider bootstrap-based CIs and it becomes clear that this is not the case. – Firebug Aug 14 '23 at 07:19
  • There are numerous examples on this website already that deal with symmetry of confidence intervals. There's even a question that has it literally in the title: "asymmetric confidence intervals". – Sextus Empiricus Aug 15 '23 at 09:41
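
To make the comments concrete, here is a small R sketch of three intervals that are not centred on their point estimate. The counts (1 success in 20 trials) and the data vector are made up for illustration: an exact binomial interval from binom.test() with very few successes, a geometric-mean interval computed on the log scale and back-transformed, and a simple percentile bootstrap for the same geometric mean.

```r
# 1) Exact (Clopper-Pearson) interval for a binomial proportion
#    with very few successes: 1 success in 20 trials (illustrative numbers)
bt    <- binom.test(x = 1, n = 20)
p_hat <- unname(bt$estimate)          # 0.05
p_ci  <- as.vector(bt$conf.int)       # lower and upper 95% limits
c(below = p_hat - p_ci[1], above = p_ci[2] - p_hat)   # far from equal

# 2) Geometric mean: t interval on the log scale, then back-transformed
x     <- c(1.2, 0.8, 3.5, 10.1, 2.2, 0.5, 6.0, 1.9)  # made-up positive data
lx    <- log(x)
half  <- qt(0.975, df = length(x) - 1) * sd(lx) / sqrt(length(x))
gm    <- exp(mean(lx))
gm_ci <- exp(mean(lx) + c(-1, 1) * half)
c(below = gm - gm_ci[1], above = gm_ci[2] - gm)       # upper gap is larger

# 3) Percentile bootstrap interval for the same geometric mean
set.seed(1)
boot_gm    <- replicate(2000, exp(mean(log(sample(x, replace = TRUE)))))
gm_boot_ci <- unname(quantile(boot_gm, c(0.025, 0.975)))
c(below = gm - gm_boot_ci[1], above = gm_boot_ci[2] - gm)  # need not be equal
```

The back-transformed interval in part 2 always has a longer upper gap, because exp() stretches the upper half of a symmetric log-scale interval more than the lower half.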

4 Answers


TL;DR: No, they don't have to be at the midpoint.

There are at least two ways to show this. First, we could run the example from the glm.nb help page in R and then extract the estimate and its confidence interval:

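set.seed(1234)  # harmless, though glm.nb() itself involves no randomness
library(MASS)   # provides glm.nb() and the quine data

quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)
summary(quine.nb1)       # the SexM row gives the point estimate (log scale)
confint(quine.nb1)       # 95% CI for SexM = -1.26 to 0.29
(-1.26 + 0.29)/2         # -0.485, the midpoint of that interval
coef(quine.nb1)["SexM"]  # compare this estimate with the midpoint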

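Second, we could look at the equations for NB regression and see that they are not linear: they involve logs and gamma functions and are quite complex. There is therefore no reason for the likelihood, or for profile-likelihood intervals such as those returned by confint(), to be symmetric about the estimate.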
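
A minimal sketch of the first route that avoids hard-coded numbers, assuming the MASS package is available (it ships with standard R installations). It profiles only the SexM coefficient of the help-page model and compares the estimate with the midpoint of its profile-likelihood interval.

```r
library(MASS)  # glm.nb() and the quine data

quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)

est <- unname(coef(quine.nb1)["SexM"])               # log-scale estimate
ci  <- as.vector(confint(quine.nb1, parm = "SexM"))  # profile-likelihood 95% CI

c(below = est - ci[1], above = ci[2] - est)  # distances to each end differ
c(estimate = est, midpoint = mean(ci))       # the estimate is not the midpoint
```

The estimate lies inside the interval but not at its centre, which is expected for a profile-likelihood interval.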

Peter Flom

In general, no! In practice they often are symmetric, but only on a suitably chosen scale, because many confidence intervals are based on a normal approximation of the form $\hat{\theta} \pm z_{0.975} \times \text{SE}(\hat{\theta})$ with $z_{0.975} \approx 1.96$ (or on a Student-t distribution, which is also symmetric around the estimate) applied on that scale.
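
As a sketch of the normal-approximation case, using the quine model from the glm.nb help page as an assumed stand-in for your own model: Wald intervals computed by hand as estimate ± qnorm(0.975) × SE on the log scale should agree with confint.default(), and exponentiating them gives rate-ratio intervals that are no longer centred on exp(estimate).

```r
library(MASS)  # glm.nb() and the quine data

quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)

b  <- coef(quine.nb1)              # estimates on the log (link) scale
se <- sqrt(diag(vcov(quine.nb1)))  # their standard errors
z  <- qnorm(0.975)                 # the "1.96" multiplier

# Wald intervals: estimate +/- z * SE, symmetric on the log scale
wald_log <- cbind(lower = b - z * se, upper = b + z * se)
all.equal(unname(wald_log), unname(confint.default(quine.nb1)))  # should be TRUE

# Back-transformed to the rate-ratio scale they are no longer symmetric
rr         <- exp(b)
wald_ratio <- exp(wald_log)
head(cbind(below = rr - wald_ratio[, "lower"], above = wald_ratio[, "upper"] - rr))
```

Note that confint() on the same fit gives profile-likelihood intervals instead, which are not symmetric even on the log scale.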

For example, confidence intervals for linear regression are symmetric on the usual linear scale, but for logistic regression they are symmetric around the estimate only on the log-odds scale (not on the odds or odds-ratio scale), for Poisson regression only on the log scale (not on the rate or rate-ratio scale), and so on.
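
A short worked version of the back-transformation point, writing $\hat\beta$ for a coefficient on the log scale (playing the role of $\hat\theta$ above), $s = \text{SE}(\hat\beta)$ and $z = z_{0.975} \approx 1.96$:

```latex
\begin{aligned}
\text{log scale: } & \hat\beta \pm z s \\
\text{ratio scale: } & \left[e^{\hat\beta - z s},\ e^{\hat\beta + z s}\right] \\
\text{arithmetic midpoint: } & \tfrac{1}{2}\left(e^{\hat\beta - z s} + e^{\hat\beta + z s}\right) = e^{\hat\beta}\cosh(z s) > e^{\hat\beta} \quad (z s > 0) \\
\text{geometric midpoint: } & \sqrt{e^{\hat\beta - z s}\, e^{\hat\beta + z s}} = e^{\hat\beta}
\end{aligned}
```

So the back-transformed estimate sits at the geometric midpoint of the interval and always below the arithmetic midpoint. The same argument applies to negative binomial regression with glm.nb's default log link.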

Nick Cox
Björn

It depends on the distribution

In general, when you form a shortest confidence interval from a pivotal quantity that has a non-symmetric distribution, the point estimator is not in the middle of the confidence interval. Roughly speaking, this occurs because the "mode" of the pivotal distribution is not in the middle of the highest density region. (I am making some assumptions here about how you are forming your point estimator and your confidence interval.) If you would like to learn a bit more about this topic, and look at how a shortest confidence interval is formed in the context of a univariate distribution, O'Neill (2022) provides a fairly straightforward discussion with diagrams.
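
One textbook case that fits this description (my choice of example, not necessarily the one O'Neill (2022) uses) is an interval for a normal variance $\sigma^2$ from the pivot $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$, whose distribution is right-skewed. The R sketch below uses hypothetical values n = 10 and s² = 4 and compares the equal-tailed interval with the shortest interval for $\sigma^2$, found numerically with optimize(). In neither case is s² at the midpoint.

```r
# Hypothetical setting: sample variance s2 from n normal observations
n     <- 10
s2    <- 4
df    <- n - 1
level <- 0.95

# Interval for sigma^2 from the pivot df * s2 / sigma^2 ~ chi-squared(df),
# putting probability `lo` in the lower tail of the pivot
ci_sigma2 <- function(lo) {
  a <- qchisq(lo, df)
  b <- qchisq(lo + level, df)
  c(lower = df * s2 / b, upper = df * s2 / a)
}

equal_tailed <- ci_sigma2((1 - level) / 2)

# Shortest interval: choose `lo` to minimise the length upper - lower
len      <- function(lo) unname(diff(ci_sigma2(lo)))
opt      <- optimize(len, interval = c(1e-6, 1 - level - 1e-6))
shortest <- ci_sigma2(opt$minimum)

rbind(equal_tailed, shortest)
c(equal_tailed = mean(equal_tailed), shortest = mean(shortest))  # vs s2 = 4
```

Both midpoints lie above s² because the interval has to reach much further upwards than downwards to cover the long right tail.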

Ben

There are also settings where you might deliberately set up a confidence interval to be asymmetric even when the estimator is Normally distributed. Peter Hoff and co-workers have written about some of these. The idea is to use independent external information to adjust how much of the coverage error goes at the upper vs lower end of the interval, based on whether the external information says the true value is likely to be high or low.

Somewhat surprisingly, they show this can be done while retaining exactly the correct coverage -- this is possible because the property of being a 95% confidence interval procedure only specifies coverage at the true value, not at any false value.
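
A simplified R illustration of the coverage argument, assuming an estimate $\hat\theta \sim N(\theta, \text{se}^2)$ with known se. Here the split of the 5% error between the two ends is a fixed, hypothetical weight w. The procedures by Hoff and co-workers choose the split from external information, so this is not their method; it only shows why any split chosen independently of $\hat\theta$ keeps exact coverage: the interval misses above with probability wα and below with probability (1 − w)α, which add up to α.

```r
# Asymmetric interval for theta: a fraction w of the error goes above the
# interval (theta > upper), the remaining (1 - w) below it (theta < lower)
asym_ci <- function(theta_hat, se, w, alpha = 0.05) {
  c_up  <- qnorm(1 - w * alpha)        # multiplier for the upper end
  c_low <- qnorm(1 - (1 - w) * alpha)  # multiplier for the lower end
  cbind(lower = theta_hat - c_low * se, upper = theta_hat + c_up * se)
}

# Exact coverage with Z = (theta_hat - theta) / se:
# P(-c_up <= Z <= c_low) = pnorm(c_low) - pnorm(-c_up) = 1 - alpha
coverage <- function(w, alpha = 0.05) {
  c_up  <- qnorm(1 - w * alpha)
  c_low <- qnorm(1 - (1 - w) * alpha)
  pnorm(c_low) - pnorm(-c_up)
}
sapply(c(0.1, 0.5, 0.9), coverage)  # 0.95 for every split

# Simulation check with an arbitrary true value and w = 0.8
set.seed(42)
theta     <- 1
se        <- 0.5
theta_hat <- rnorm(1e5, mean = theta, sd = se)
ci        <- asym_ci(theta_hat, se, w = 0.8)
covered   <- mean(ci[, "lower"] <= theta & theta <= ci[, "upper"])
covered  # close to 0.95
```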

Thomas Lumley