
glm.nb (from the MASS package) throws an unusual error on certain inputs. A variety of inputs trigger the error, yet changing the input even very slightly can prevent it.

A reproducible example:

library(MASS)

set.seed(11)
pop <- rnbinom(n=1000,size=1,mu=0.05)
glm.nb(pop~1,maxit=1000)

Running this code throws the error:

Error in while ((it <- it + 1) < limit && abs(del) > eps) { : 
  missing value where TRUE/FALSE needed
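
For context, this is the generic message R emits whenever the condition of an `if` or `while` evaluates to NA or NaN. A minimal sketch of how the loop condition shown above can fail (the `del` value here is just a hypothetical stand-in for the step size inside `theta.ml`):

```r
# theta.ml loops on a condition like abs(del) > eps; once the step
# size del degenerates to NaN, the condition is NA and while()/if()
# cannot branch, producing the error above.
del <- NaN
res <- tryCatch(if (abs(del) > 1e-8) "continue" else "stop",
                error = function(e) conditionMessage(e))
res
```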

At first I assumed that this had something to do with the algorithm not converging. However, I was surprised to find that changing a single observation is enough to prevent the error. For example:

pop[1000] <- pop[1000] + 1
glm.nb(pop~1,maxit=1000)

I've found that it throws this error on 19.4% of the seeds between 1 and 500:

fit.with.seed <- function(s) {
    set.seed(s)
    pop <- rnbinom(n=1000, size=1, mu=0.05)
    glm.nb(pop~1, maxit=1000)
}

errors <- sapply(1:500, function(s) {
    is.null(tryCatch(fit.with.seed(s), error=function(e) NULL))
})

mean(errors)

I've found only one mention of this error anywhere, on a thread with no responses.

What could be causing this error, and how can it be fixed (other than randomly permuting the inputs every time glm.nb throws an error)?

ETA: Setting `control=glm.control(maxit=200,trace = 3)` shows that the `theta.ml` algorithm breaks down: theta grows very large, then becomes `-Inf`, then `NaN`:

theta.ml: iter67 theta =5.77203e+15
theta.ml: iter68 theta =5.28327e+15
theta.ml: iter69 theta =1.41103e+16
theta.ml: iter70 theta =-Inf
theta.ml: iter71 theta =NaN
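
For reference, the traced call that produces the iteration log above looks like this (a sketch; on affected platforms it still errors, so it is wrapped in `tryCatch` to keep a script running):

```r
library(MASS)

set.seed(11)
pop <- rnbinom(n=1000, size=1, mu=0.05)

# trace = 3 makes glm.nb print each theta.ml iteration before failing;
# res holds either the fitted model or the error message
res <- tryCatch(
    glm.nb(pop ~ 1, control = glm.control(maxit = 1000, trace = 3)),
    error = function(e) conditionMessage(e))
```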
David Robinson
  • This won't solve your problem, but it may speed up the search for someone more knowledgeable: setting `control=glm.control(maxit=200,trace = 3)` sheds some important light on exactly what's going wrong in `theta.ml`. – joran Jul 31 '12 at 22:56
  • Can you post your `sessionInfo()`? I can't replicate the error on 2.15.1 on Windows in either 32 or 64-bit. – nograpes Jul 31 '12 at 23:10
  • I'm also using 2.15.1, but on a Mac. See above – David Robinson Jul 31 '12 at 23:13
  • FWIW I also get the error and am also in 2.15.1 on a Mac. – joran Jul 31 '12 at 23:15
  • I can replicate with this: `set.seed(11);pop <- rnbinom(n=1000,size=1,mu=0.05);glm.nb(pop~1,maxit=1000)` – nograpes Jul 31 '12 at 23:18
  • Yes, I do. I will put that near top of question. – David Robinson Jul 31 '12 at 23:20
  • for what it's worth you may be able to get this to work with `bbmle::mle2(Y~dnbinom(mu=exp(logmu),size=exp(logk)), parameters=list(logmu~ X1 + X2 + offset(X3)), start=..., ...)` – Ben Bolker Jul 31 '12 at 23:35
  • Thanks, @BenBolker: I'm trying that now. I've edited the question to work with the reproducible example nograpes listed (since it's reproducible between platforms, and also more compact). – David Robinson Aug 01 '12 at 01:06
  • I'm also having this trouble, and it really worries me that this isn't completely solved yet. I'm using MASS:glm.nb. I've run my analysis on a mac and windows now, with slightly differing results. It's very unsettling. – Laserhedvig Apr 26 '20 at 12:28

1 Answer


It's a bit crude, but in the past I have been able to work around problems with glm.nb by resorting to straight maximum likelihood estimation (i.e. no clever iterative estimation algorithms as used in glm.nb).

Some poking around/profiling indicates that the MLE for the theta parameter is effectively infinite. I decided to fit it on the inverse scale (invk = 1/theta), so that I could put a boundary at 0 (a fancier version would set up a log-likelihood function that reverts exactly to Poisson at invk = 0, i.e. theta = infinity, but that would defeat the purpose of trying to come up with a quick, canned solution).

With two of the bad examples given above, this works reasonably well, although it does warn that the parameter fit is on the boundary ...

library(bbmle)
## d1, X1, X2 and X3 refer to the asker's original regression example;
## the 12 -Inf entries of `lower` leave the regression coefficients
## unbounded, while the final 1e-8 keeps invk (= 1/theta) just above zero
m1 <- mle2(Y~dnbinom(mu=exp(logmu),size=1/invk),
           data=d1,
           parameters=list(logmu~X1+X2+offset(X3)),
           start=list(logmu=0,invk=1),
           method="L-BFGS-B",
           lower=c(rep(-Inf,12),1e-8))

The second example is actually more interesting because it demonstrates numerically that the MLE for theta is essentially infinite even though we have a good-sized data set that is exactly generated from negative binomial deviates (or else I'm confused about something ...)

set.seed(11)
pop <- rnbinom(n=1000,size=1,mu=0.05)
## glm.nb(pop~1,maxit=1000)  ## fails with the error from the question
m2 <- mle2(pop~dnbinom(mu=exp(logmu),size=1/invk),
           data=data.frame(pop),
           start=list(logmu=0,invk=1),
           method="L-BFGS-B",
           lower=c(-Inf,1e-8))
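
A quick base-R check of the "theta is essentially infinite" claim: with the mean fixed at its intercept-only MLE (the sample mean), the negative binomial log-likelihood keeps climbing as theta grows and approaches the Poisson log-likelihood from below, so nothing stops the optimizer from pushing theta toward infinity (the grid of theta values below is arbitrary):

```r
set.seed(11)
pop <- rnbinom(n=1000, size=1, mu=0.05)
mu.hat <- mean(pop)  # MLE of mu for an intercept-only model

# Poisson log-likelihood: the theta = Inf limit of the negative binomial
ll.pois <- sum(dpois(pop, lambda=mu.hat, log=TRUE))

# NB log-likelihood along an increasing grid of theta (size) values
thetas <- c(1, 10, 1e3, 1e8)
ll.nb <- sapply(thetas, function(th)
    sum(dnbinom(pop, size=th, mu=mu.hat, log=TRUE)))

# ll.nb rises with theta and converges to ll.pois
```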
Ben Bolker
  • +1. One thing that worries me is that it gives an `ABNORMAL_TERMINATION_IN_LNSRCH` message. I'm currently investigating how this method compares to `glm.nb` in the cases where `glm.nb` does work. I've also had a surprising amount of success by very slightly altering the inputs in the cases where `glm.nb` breaks (though it disturbs me that I don't know why that works). – David Robinson Aug 01 '12 at 02:35
  • you could also try this with other bounded optimizers, e.g. `library(optimx); ... ; mle2(...,optimizer="optimx",method="bobyqa")` – Ben Bolker Dec 27 '12 at 16:45