For a simple data set:
anthers.sum <- structure(list(storage = c(1, 2), n = c(309, 247),
y = c(164, 155)), row.names = c(NA, -2L), class = "data.frame")
logit_model <- glm(cbind(y, n-y) ~ storage, data=anthers.sum,
family=binomial(link='logit'))
summary(logit_model)
Call:
glm(formula = cbind(y, n - y) ~ storage,
family = binomial(link = "logit"), data = anthers.sum)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.2754 0.2632 -1.046 0.2955
storage 0.3985 0.1741 2.289 0.0221 *
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5.2790e+00 on 1 degrees of freedom
Residual deviance: -3.7748e-14 on 0 degrees of freedom
AIC: 16.079
Number of Fisher Scoring iterations: 2
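For reference, the standard errors that summary() reports are the square roots of the diagonal of the coefficient variance-covariance matrix, which glm() exposes via vcov() (reusing the logit_model fitted above):

```r
# Std. Error column of summary() = sqrt of diagonal of vcov(model)
sqrt(diag(vcov(logit_model)))
# (Intercept)     storage
#   0.2632444   0.1741129
```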
How can I manually calculate the Std. Error of the estimates of this logit model?
I tried the Newton-Raphson method:
the log-likelihood function is $$l=\sum_{i=1}^{N}\left[y_i(\beta_1+\beta_2x_i)-n_i\log\left[1+\exp(\beta_1+\beta_2x_i)\right]+\log{n_i \choose y_i}\right]$$ the score matrix is $$U=\left[\begin{matrix} \frac{\partial l}{\partial\beta_1} \\ \frac{\partial l}{\partial\beta_2} \end{matrix}\right]=\left[\begin{matrix} \sum(y_i-n_i\pi_i) \\ \sum x_i(y_i-n_i\pi_i) \end{matrix}\right]$$
$$\pi_i=\frac{\exp(\beta_1+\beta_2x_i)}{1+\exp(\beta_1+\beta_2x_i)}$$
the information matrix is
$$J=\left[\begin{matrix} \sum n_i\pi_i(1-\pi_i) & \sum n_ix_i\pi_i(1-\pi_i) \\ \sum n_ix_i\pi_i(1-\pi_i) & \sum n_ix_i^2\pi_i(1-\pi_i) \end{matrix}\right]$$
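As a quick numerical check (a sketch, reusing the fitted probabilities from the glm above as $\hat\pi_i$), this $J$ can be built at the MLE and inverted directly:

```r
x <- anthers.sum$storage
n <- anthers.sum$n
pi_hat <- fitted(logit_model)          # fitted pi_i at the MLE
w <- n * pi_hat * (1 - pi_hat)         # n_i * pi_i * (1 - pi_i)
J <- matrix(c(sum(w),     sum(w * x),
              sum(w * x), sum(w * x^2)),
            ncol = 2, byrow = TRUE)    # information matrix at the MLE
sqrt(diag(solve(J)))                   # matches summary()'s Std. Error column
```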
For a log-likelihood function of a single parameter $\beta$ , the first three terms of the Taylor series approximation near an estimate $b$ are
$$\begin{align}l(\beta)&\approx l(b)+(\beta-b)U(b)+\frac{1}{2}(\beta-b)^2U'(b)\\ &=l(b)+(\beta-b)U(b)-\frac{1}{2}(\beta-b)^2J(b) \end{align}$$ When $b$ is the maximum likelihood estimate, $U(b)=0$, so $$l(\beta)-l(b)=-\frac{1}{2}(\beta-b)^2J(b)$$
$$\mathbb E[(b-\beta)(b-\beta)^T]=J^{-1}\mathbb E(UU^T)J^{-1}=J^{-1}$$ because $$J=\mathbb E(UU^T)$$ then $$2[l(b)-l(\beta)]=(b-\beta)^2J(b)\sim \chi^2(p)$$
For the one-parameter case, $$b\sim N(\beta, J^{-1})$$
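From this asymptotic normality, the z values and p-values in the summary() table follow directly; a quick check against the fitted model:

```r
est <- coef(summary(logit_model))
z <- est[, "Estimate"] / est[, "Std. Error"]  # Wald z statistics
p <- 2 * pnorm(-abs(z))                       # two-sided p-values, as in Pr(>|z|)
cbind(z, p)
```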
The (Fisher scoring) iteration is:
$$J^{(m-1)} b^{(m)}=J^{(m-1)} b^{(m-1)}+U^{(m-1)}$$
# data vectors used throughout
x <- anthers.sum$storage
y <- anthers.sum$y
n <- anthers.sum$n

X <- matrix(c(1, 1,
              1, 2), ncol = 2, byrow = TRUE)  # design matrix
B <- matrix(c(0, 0), ncol = 1)                # starting values beta = (0, 0)
for (i in 1:6) {
  L <- X %*% B                                # linear predictor
  pii <- exp(L) / (1 + exp(L))                # pi_i
  npi1_pi <- n * pii * (1 - pii)              # n_i * pi_i * (1 - pi_i)
  U <- matrix(c(sum(y - n * pii),
                sum((y - n * pii) * x)),
              ncol = 1)                       # score vector
  J <- matrix(c(sum(npi1_pi),     sum(npi1_pi * x),
                sum(npi1_pi * x), sum(npi1_pi * x^2)),
              ncol = 2, byrow = TRUE)         # information matrix
  J_1 <- solve(J)
  B <- B + J_1 %*% U                          # Fisher scoring update
  print(paste0("iter: ", i))
  print("coefficients:")
  print(B)
  print("variances:")
  print(diag(J_1))
  print("std.error:")
  print(sqrt(diag(J_1)))
  writeLines("\n")
}
[1] "iter: 1"
[1] "coefficients:"
[,1]
[1,] -0.2641668
[2,] 0.3871441
[1] "variances:"
[1] 0.06797427 0.02913932
[1] "std.error:"
[1] 0.2607188 0.1707024
[1] "iter: 2"
[1] "coefficients:"
[,1]
[1,] -0.2753545
[2,] 0.3984872
[1] "variances:"
[1] 0.06924687 0.03026490
[1] "std.error:"
[1] 0.2631480 0.1739681
[1] "iter: 3"
[1] "coefficients:"
[,1]
[1,] -0.2753712
[2,] 0.3985039
[1] "variances:"
[1] 0.06929756 0.03031522
[1] "std.error:"
[1] 0.2632443 0.1741127
[1] "iter: 4"
[1] "coefficients:"
[,1]
[1,] -0.2753712
[2,] 0.3985039
[1] "variances:"
[1] 0.06929763 0.03031529
[1] "std.error:"
[1] 0.2632444 0.1741129
[1] "iter: 5"
[1] "coefficients:"
[,1]
[1,] -0.2753712
[2,] 0.3985039
[1] "variances:"
[1] 0.06929763 0.03031529
[1] "std.error:"
[1] 0.2632444 0.1741129
[1] "iter: 6"
[1] "coefficients:"
[,1]
[1,] -0.2753712
[2,] 0.3985039
[1] "variances:"
[1] 0.06929763 0.03031529
[1] "std.error:"
[1] 0.2632444 0.1741129
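Assuming the loop above has just been run (so B and J_1 hold the converged values), the manual fit can be compared against glm():

```r
# converged coefficients side by side with glm()'s
cbind(manual = as.vector(B), glm = unname(coef(logit_model)))
# standard errors: manual (inverse information) vs. glm()'s vcov()
sqrt(diag(J_1))
sqrt(diag(vcov(logit_model)))
```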
The odds ratios:
x <- anthers.sum$storage
# Fitted odds at storage = 1 (constant level)
exp(coefficients(summary(logit_model))["(Intercept)", 1] +
    coefficients(summary(logit_model))["storage", 1] * x[1])
# Odds ratio, treatment vs. control (one-unit change in storage)
exp(coefficients(summary(logit_model))["storage", 1] * (x[2] - x[1]))
[1] 1.131034
[1] 1.489594
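Note that exponentiating the coefficients themselves gives exp(-0.2754) ≈ 0.759 for the intercept row, i.e. the odds at storage = 0 rather than the fitted odds at storage = 1 computed above; this is what the OR column of questionr::odds.ratio reports:

```r
exp(coef(logit_model))  # intercept ~0.759, storage ~1.490
```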
questionr::odds.ratio(logit_model, level=0.95)
Waiting for profiling to be done...
OR 2.5 % 97.5 % p
(Intercept) 0.75929 0.45311 1.2725 0.29553
storage 1.48959 1.06015 2.0989 0.02209 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
When I calculate the 95% confidence intervals of the odds ratios:
# 95% CI for the constant
exp(coefficients(summary(logit_model))["(Intercept)", 1] +
    qnorm(c(0.025, 0.975)) * coefficients(summary(logit_model))["(Intercept)", 2])
# 95% CI for treatment vs. control
exp(coefficients(summary(logit_model))["storage", 1] +
    qnorm(c(0.025, 0.975)) * coefficients(summary(logit_model))["storage", 2])
[1] 0.4532458 1.2719847
[1] 1.058919 2.095430
They don't match the results of questionr::odds.ratio.
questionr::odds.ratio gives you profile-likelihood-based confidence intervals; see the tag [tag:profile-likelihood]. Also, please try to write your R formulas in a humanly readable form! – kjetil b halvorsen Dec 20 '23 at 17:26
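To make the distinction concrete: confint.default() returns the Wald intervals (estimate ± 1.96 × SE, the same as the manual qnorm() calculation), while confint() on a glm profiles the likelihood (via MASS) and reproduces what questionr::odds.ratio reports:

```r
exp(confint.default(logit_model))  # Wald: matches the manual qnorm() intervals
exp(confint(logit_model))          # profile likelihood: matches questionr
```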