
I'd like opinions on two differing GLM outputs in RStudio.

I model count data (dung pellet counts) over 21 sites, using the number of quadrats counted as an area offset. I started with a Poisson GLM to define the model structure, but it was strongly overdispersed and zero-inflated. I moved to a negative binomial model, which fixed both issues, but I get differing results depending on the package used (MASS::glm.nb and glmmTMB with family nbinom2). They give the same theta estimate and the same AIC, but different SEs and z-values; in particular, one predictor appears significant in one package but not the other. This is prior to model selection.

Thus, I'd like to know: what are the differences in parameterization between the two packages? From what I've read, they seem to use the same variance formula, both fitted by maximum likelihood:

  • glmmTMB: variance = $\mu(1 + \mu/k)$
  • MASS: variance = $\mu + \mu^2/\theta$, so that $\theta = k$
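
Expanding the glmmTMB form gives $\mu(1 + \mu/k) = \mu + \mu^2/k$, so the two formulas coincide when $\theta = k$ (consistent with both packages reporting 0.981 below). A quick numerical check, with arbitrary made-up values:

mu <- 4; k <- 0.98   # arbitrary values, just to check the algebra
mu * (1 + mu / k)    # glmmTMB nbinom2 form
## [1] 20.32653
mu + mu^2 / k        # MASS form with theta = k
## [1] 20.32653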

I suspect this is related to MASS::glm.nb dropping constants from the log-likelihood calculation while glmmTMB does not, but I cannot confirm it.

I'm inclined to stick with MASS since I don't use any random effects here, but I'd like to know what is happening.

code output:

> # MASS package
> mod.nbmass <- glm.nb(pellets ~ RS1+RS2+RS4+RS5+RS6+offset(log(quadrats)), link = "log", data=data_site)
> summary(mod.nbmass)

Call: glm.nb(formula = pellets ~ RS1 + RS2 + RS4 + RS5 + RS6 + offset(log(quadrats)), data = data_site, link = "log", init.theta = 0.9808088892)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.6923  -0.9313  -0.1119   0.2173   1.5372  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.17109    0.22170   0.772   0.4403  
RS1         -0.04392    0.34372  -0.128   0.8983  
RS2          0.21905    0.31118   0.704   0.4815  
RS4         -0.04524    0.31740  -0.143   0.8866  
RS5          0.21135    0.34775   0.608   0.5433  
RS6         -0.67892    0.33963  -1.999   0.0456 *


Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.9808) family taken to be 1)

Null deviance: 29.252  on 20  degrees of freedom

Residual deviance: 24.854  on 15  degrees of freedom

AIC: 246.43

Number of Fisher Scoring iterations: 1

          Theta:  0.981 
      Std. Err.:  0.290 

2 x log-likelihood: -232.429

> # glmmTMB package
> mod.nbTMB <- glmmTMB(pellets ~ RS1+RS2+RS4+RS5+RS6+offset(log(quadrats)), family="nbinom2", data=data_site)
> summary(mod.nbTMB)
 Family: nbinom2  ( log )
Formula: pellets ~ RS1 + RS2 + RS4 + RS5 + RS6 + offset(log(quadrats))
Data: data_site

     AIC      BIC   logLik deviance df.resid 
   246.4    253.7   -116.2    232.4       14 

Dispersion parameter for nbinom2 family (): 0.981

Conditional model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.17109    0.22163   0.772    0.440
RS1         -0.04393    0.39138  -0.112    0.911
RS2          0.21905    0.26685   0.821    0.412
RS4         -0.04525    0.30701  -0.147    0.883
RS5          0.21135    0.40676   0.520    0.603
RS6         -0.67892    0.48995  -1.386    0.166

Thank you!

1 Answer


The fact that the coefficients, AIC, log-likelihood, dispersion parameters etc. are the same (up to rounding), and only the SE/z values differ, implies that this is not a difference in parameterization. There are two and possibly three differences between glm.nb and glmmTMB in the way that the covariance matrices of the fixed effects (and hence the SEs/Z-statistics/etc.) are computed.

tl;dr the issues are fairly subtle, but I expect that where there is a difference, the glmmTMB estimates of the SE will be either slightly too small or more accurate (the first point below would make glmmTMB's SEs a bit too small; the second and third would make them more accurate than glm.nb's where they differ ...)

ML vs REML estimates of covariance

The maximum likelihood estimate of a variance is something like $\sum (y-\bar y)^2/n$, whereas an unbiased estimate (the Bessel correction) divides by $n-1$ rather than $n$. In the case of a (G)LM with $p$ parameters, the denominator is typically taken as $n-p$. glmmTMB effectively uses $n$ rather than $n-p$, so the SEs will differ by a factor of $\sqrt{n/(n-p)}$. This has come up recently; see here for more discussion/another example.
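
A minimal illustration of the denominator difference, using lm() where both conventions are easy to compute directly (simulated data; the variable names are made up for the example):

set.seed(101)
x <- rnorm(50); y <- 1 + 2*x + rnorm(50)
m <- lm(y ~ x)
n <- nobs(m); p <- length(coef(m))
sigma(m)                      # unbiased-style estimate: sqrt(RSS/(n - p))
sigma(m) * sqrt((n - p)/n)    # ML estimate: sqrt(RSS/n), smaller by the same factor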

accounting for uncertainty in the dispersion parameter

glm.nb computes the covariance matrix conditional on $\theta = \hat \theta$, whereas glmmTMB computes the covariance matrix that includes the uncertainty in the parameter estimates derived from the uncertainty in $\theta$ (the resulting unconditional variances can actually be smaller than the corresponding conditional variances, if there is a negative correlation between the estimate of the dispersion parameter and the estimated effects of the covariates).
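
In symbols (a sketch, writing $H$ for the Hessian of the negative log-likelihood at the MLE, partitioned into fixed-effect ($\beta$) and dispersion ($\theta$) blocks): a glm.nb-style conditional covariance corresponds to $H_{\beta\beta}^{-1}$, while the unconditional covariance is the corresponding block of the full inverse, $(H^{-1})_{\beta\beta} = (H_{\beta\beta} - H_{\beta\theta} H_{\theta\theta}^{-1} H_{\theta\beta})^{-1}$. The code further below recovers the conditional version from glmmTMB by inverting the $\beta$ block of the joint precision matrix.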

observed information vs. expected (Fisher) information

I'm not sure how much this matters in this case, but glm() and friends (including glm.nb) compute the covariance based on a slightly different metric, the expected (Fisher) information, rather than the observed information (or "Hessian", the second derivative matrix of the log-likelihood at the MLE: see here). The log link is not the canonical link for the negative binomial distribution (the canonical link is $\log(\mu/(\mu+\theta))$), so this source of difference can in principle contribute here.
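
For reference, the distinction: the observed information is the curvature of the log-likelihood at the fitted parameters, $J(\hat\beta) = -\left.\partial^2 \ell/\partial\beta\,\partial\beta^\top\right|_{\beta=\hat\beta}$, while the expected information averages that curvature over the distribution of the response, $I(\beta) = E\!\left[-\partial^2 \ell/\partial\beta\,\partial\beta^\top\right]$. For a GLM with canonical link the second derivative does not depend on the observed responses, so the two coincide exactly.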

Here is some exploration of the issues. It doesn't quite match the example in the question, but since that example wasn't reproducible ...

L <- load(url("http://www.math.mcmaster.ca/bolker/eeid/data/gopherdat2.RData"))
library(MASS)
library(glmmTMB)
m1 <- glm.nb(shells ~ factor(year) + offset(log(Area)),
             data = Gdat)
(sd1 <- sqrt(diag(vcov(m1))))
##      (Intercept) factor(year)2005 factor(year)2006 
##       0.3004183        0.4680113        0.4268809

m2 <- glmmTMB(shells ~ factor(year) + offset(log(Area)), family = nbinom2, data = Gdat)

coefficients are equal ...

all.equal(coef(m1), fixef(m2)$cond, tolerance = 1e-7)

(sd2 <- sqrt(diag(vcov(m2)$cond)))
##      (Intercept) factor(year)2005 factor(year)2006 
##        0.2942739        0.4614845        0.4309163 

accounting for denominator doesn't entirely resolve the issue

n <- nobs(m2)
npar <- length(fixef(m2)$cond)
(sd2*sqrt(n/(n-npar)))
##      (Intercept) factor(year)2005 factor(year)2006 
##        0.3101919        0.4864474        0.4542256 

derive conditional covariance matrix (i.e., ignoring effects of theta)

combine with denominator effect (still doesn't entirely resolve the difference)

ss <- TMB::sdreport(m2$obj)
newsd <- sqrt(diag(solve(solve(ss$cov.fixed)[1:3,1:3])))
(newsd*sqrt(n/(n-npar)))
##      beta      beta      beta 
## 0.3072275 0.4853313 0.4482258 

Using bbmle gives nearly identical results to glmmTMB (not surprising as the methods used — ML estimate of variance, unconditional covariances, observed information/Hessian — are the same ...)

library(bbmle)
m3 <- mle2(shells ~ dnbinom(mu = exp(logmu)*Area, size = exp(logsize)),
           parameters = list(logmu ~ factor(year)),
           start = list(logmu = 1, logsize = 1),
           data = Gdat)
all.equal(unname(coef(m3)[1:3]), unname(fixef(m2)$cond), tolerance = 1e-5)
sqrt(diag(vcov(m3)))
##      logmu.(Intercept) logmu.factor(year)2005 logmu.factor(year)2006 
##              0.2942785              0.4614903              0.4309221 
##                logsize 
##             0.6909350 
Ben Bolker