When we estimate parameters by maximum likelihood, we can estimate their standard errors from the Hessian of the negative log-likelihood at the optimum.
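To make that concrete, here is a minimal, self-contained sketch with toy data (a normal model, unrelated to the data of this question; all names here are my own):

# sketch: standard errors from the inverse Hessian of a negative log-likelihood
z = rnorm(50, mean = 3, sd = 2)
negLogL = function(par) {
  -sum(dnorm(z, mean = par[1], sd = exp(par[2]), log = TRUE))
}
fit = optim(c(0, 0), negLogL, hessian = TRUE)
sqrt(diag(solve(fit$hessian)))  # approximate standard errors of (mean, log sd)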
The code below attempts this for the question: how do you compute the standard error of the quantity $- \frac{\hat{\beta}_0}{\hat{\beta}_1}$ in a logistic regression? I did this because I wanted to use optim to estimate $-\hat\beta_0/\hat\beta_1$ directly via a re-parameterized likelihood (see the sketch at the end of this post).
Problem: For the same model, optim does not give the same output as glm.
Question: What numerical issue causes the discrepancy between optim and glm? Are both wrong, or is one better? Aren't they just different methods for optimizing the same cost function, and shouldn't they give the same result?
Note: With different data (see the commented-out lines below) both covariance estimates agree, so the discrepancy does not seem to be a constant-factor mistake, such as using the deviance instead of the log-likelihood.
set.seed(1)
y = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
x = c(175, 175, 175, 175, 189, 189, 189, 189, 210, 210, 210, 210, 221, 221, 221, 221, 235, 235, 235, 235, 247, 247, 247, 247, 275, 275, 275, 275, 278, 278, 278, 278, 288, 288, 288, 288, 317, 317, 317, 317, 329, 329, 329, 329, 348, 348, 348, 348, 362, 362, 362, 362, 375, 375, 375, 375, 387, 387, 387, 387, 403, 403, 403, 403, 412, 412, 412, 412, 431, 431, 431, 431, 445, 445, 445, 445, 451, 451, 451, 451, 462, 462, 462, 462, 472, 472, 472, 472, 485, 485, 485, 485, 497, 497, 497, 497, 503, 503, 503, 503, 512, 512, 512, 512, 525, 525, 525, 525, 535, 535, 535, 535, 547, 547, 547, 547, 556, 556, 556, 556, 578, 578, 578, 578, 593, 593, 593, 593, 601, 601, 601, 601, 614, 614, 614, 614, 628, 628, 628, 628, 650, 650, 650, 650)
# if I use this data instead, both covariance estimates agree
# x = rnorm(100)
# y = rbinom(100, 1, plogis(-x))
# modeling with glm (x and y are in the workspace, so no data argument is needed)
mod_glm = glm(y ~ x, family = binomial())
# modeling with optim: minimize the negative log-likelihood directly
logL = function(par) {
  a = par[1]                         # intercept
  b = par[2]                         # slope
  p = 1/(1 + exp(-a - b*x))          # logistic success probability
  -sum(dbinom(y, 1, p, log = TRUE))  # negative log-likelihood
}
opt = optim(c(-16, 0.02), logL, hessian = TRUE, control = list(abstol = 10^-10))
# (nearly) the same point estimates
mod_glm$coefficients
(Intercept) x
-16.35496868 0.02650619
opt$par
[1] -16.35551219 0.02650727
# but different covariance estimates
solve(opt$hessian)
[,1] [,2]
[1,] 23.22243513 -4.012676e-02
[2,] -0.04012676 6.970628e-05
vcov(mod_glm)
(Intercept) x
(Intercept) 16.1995378 -2.764790e-02
x -0.0276479 4.754875e-05
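Two cross-checks seem informative here (both are my own additions, not part of the comparison above; the numDeriv package is assumed to be installed): refitting with a gradient-based optimizer under a tight tolerance instead of the default Nelder-Mead, and evaluating a high-accuracy numerical Hessian of logL at glm's own estimates.

# cross-check 1: gradient-based optimizer with a tight tolerance
opt_bfgs = optim(c(-16, 0.02), logL, method = "BFGS",
                 hessian = TRUE, control = list(reltol = 1e-12))
solve(opt_bfgs$hessian)
# cross-check 2: numerical Hessian of logL at glm's own estimates
library(numDeriv)
solve(hessian(logL, mod_glm$coefficients))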
Below is an image of the data: there are only a few cases with $y = 1$, and the $x$ variable is on a large scale.
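For completeness, the re-parameterization mentioned at the start could look like the sketch below (parameter names and starting values are my own choices): writing the linear predictor $a + bx$ as $b(x - c)$ with $c = -a/b$ makes the ratio a parameter whose standard error can be read off the Hessian directly, and a delta-method computation from the glm fit gives a value to compare against.

# sketch: estimate c = -a/b directly via a re-parameterized likelihood
logL_repar = function(par) {
  cc = par[1]                        # the ratio c = -a/b
  b = par[2]                         # slope
  p = plogis(b * (x - cc))           # same model in new coordinates
  -sum(dbinom(y, 1, p, log = TRUE))
}
opt2 = optim(c(617, 0.027), logL_repar, hessian = TRUE)
sqrt(diag(solve(opt2$hessian)))[1]   # standard error of -a/b
# delta method from the glm fit: gradient of -a/b is (-1/b, a/b^2)
a = mod_glm$coefficients[1]; b = mod_glm$coefficients[2]
g = c(-1/b, a/b^2)
sqrt(t(g) %*% vcov(mod_glm) %*% g)   # delta-method standard error of -a/b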
