When we estimate parameters by maximum likelihood, we can estimate their standard errors from the Hessian of the negative log-likelihood at the optimum.
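To make that concrete, here is a minimal, self-contained sketch with toy data (a normal model, unrelated to the data of this question; all names here are my own):

# sketch: standard errors from the inverse Hessian of a negative log-likelihood
z = rnorm(50, mean = 3, sd = 2)
negLogL = function(par) {
  -sum(dnorm(z, mean = par[1], sd = exp(par[2]), log = TRUE))
}
fit = optim(c(0, 0), negLogL, hessian = TRUE)
sqrt(diag(solve(fit$hessian)))  # approximate standard errors of (mean, log sd)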
The code below attempts this for the question: how do you compute the standard error of the quantity $- \frac{\hat{\beta}_0}{\hat{\beta}_1}$ in a logistic regression? I did this because I wanted to use optim to estimate $-\hat\beta_0/\hat\beta_1$ directly via a re-parameterized likelihood (see the sketch at the end of this post).
Problem: For the same model, optim does not give the same output as glm.
Question: What numerical issue causes the discrepancy between optim and glm? Are both wrong, or is one better? Aren't they just different methods for optimizing the same cost function, and shouldn't they give the same result?
Note: With different data (see the commented-out lines below) both covariance estimates agree, so the discrepancy does not seem to be a constant-factor mistake, such as using the deviance instead of the log-likelihood.
set.seed(1)
y = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
x = c(175, 175, 175, 175, 189, 189, 189, 189, 210, 210, 210, 210, 221, 221, 221, 221, 235, 235, 235, 235, 247, 247, 247, 247, 275, 275, 275, 275, 278, 278, 278, 278, 288, 288, 288, 288, 317, 317, 317, 317, 329, 329, 329, 329, 348, 348, 348, 348, 362, 362, 362, 362, 375, 375, 375, 375, 387, 387, 387, 387, 403, 403, 403, 403, 412, 412, 412, 412, 431, 431, 431, 431, 445, 445, 445, 445, 451, 451, 451, 451, 462, 462, 462, 462, 472, 472, 472, 472, 485, 485, 485, 485, 497, 497, 497, 497, 503, 503, 503, 503, 512, 512, 512, 512, 525, 525, 525, 525, 535, 535, 535, 535, 547, 547, 547, 547, 556, 556, 556, 556, 578, 578, 578, 578, 593, 593, 593, 593, 601, 601, 601, 601, 614, 614, 614, 614, 628, 628, 628, 628, 650, 650, 650, 650)
# if I use this data instead, both covariance estimates agree
# x = rnorm(100)
# y = rbinom(100, 1, plogis(-x))
# modeling with glm (x and y are in the workspace, so no data argument is needed)
mod_glm = glm(y ~ x, family = binomial())
# modeling with optim: minimize the negative log-likelihood directly
logL = function(par) {
  a = par[1]                         # intercept
  b = par[2]                         # slope
  p = 1/(1 + exp(-a - b*x))          # logistic success probability
  -sum(dbinom(y, 1, p, log = TRUE))  # negative log-likelihood
}
opt = optim(c(-16, 0.02), logL, hessian = TRUE, control = list(abstol = 10^-10))
# (nearly) the same point estimates
mod_glm$coefficients
(Intercept) x
-16.35496868 0.02650619
opt$par
[1] -16.35551219 0.02650727
# but different covariance estimates
solve(opt$hessian)
[,1] [,2]
[1,] 23.22243513 -4.012676e-02
[2,] -0.04012676 6.970628e-05
vcov(mod_glm)
(Intercept) x
(Intercept) 16.1995378 -2.764790e-02
x -0.0276479 4.754875e-05
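Two cross-checks seem informative here (both are my own additions, not part of the comparison above; the numDeriv package is assumed to be installed): refitting with a gradient-based optimizer under a tight tolerance instead of the default Nelder-Mead, and evaluating a high-accuracy numerical Hessian of logL at glm's own estimates.

# cross-check 1: gradient-based optimizer with a tight tolerance
opt_bfgs = optim(c(-16, 0.02), logL, method = "BFGS",
                 hessian = TRUE, control = list(reltol = 1e-12))
solve(opt_bfgs$hessian)
# cross-check 2: numerical Hessian of logL at glm's own estimates
library(numDeriv)
solve(hessian(logL, mod_glm$coefficients))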
Below is an image of the data: there are only a few cases with $y = 1$, and the $x$ variable is on a large scale.
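For completeness, the re-parameterization mentioned at the start could look like the sketch below (parameter names and starting values are my own choices): writing the linear predictor $a + bx$ as $b(x - c)$ with $c = -a/b$ makes the ratio a parameter whose standard error can be read off the Hessian directly, and a delta-method computation from the glm fit gives a value to compare against.

# sketch: estimate c = -a/b directly via a re-parameterized likelihood
logL_repar = function(par) {
  cc = par[1]                        # the ratio c = -a/b
  b = par[2]                         # slope
  p = plogis(b * (x - cc))           # same model in new coordinates
  -sum(dbinom(y, 1, p, log = TRUE))
}
opt2 = optim(c(617, 0.027), logL_repar, hessian = TRUE)
sqrt(diag(solve(opt2$hessian)))[1]   # standard error of -a/b
# delta method from the glm fit: gradient of -a/b is (-1/b, a/b^2)
a = mod_glm$coefficients[1]; b = mod_glm$coefficients[2]
g = c(-1/b, a/b^2)
sqrt(t(g) %*% vcov(mod_glm) %*% g)   # delta-method standard error of -a/b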
