For a simple data set:
anthers.sum <- structure(list(storage = c(1, 2), n = c(309, 247),
y = c(164, 155)), row.names = c(NA, -2L), class = "data.frame")
logit_model <- glm(cbind(y, n-y) ~ storage, data=anthers.sum,
family=binomial(link='logit'))
summary(logit_model)
Call:
glm(formula = cbind(y, n - y) ~ storage,
family = binomial(link = "logit"), data = anthers.sum)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.2754 0.2632 -1.046 0.2955
storage 0.3985 0.1741 2.289 0.0221 *
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5.2790e+00 on 1 degrees of freedom
Residual deviance: -3.7748e-14 on 0 degrees of freedom
AIC: 16.079
Number of Fisher Scoring iterations: 2
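For reference, the standard errors that summary() reports are the square roots of the diagonal of the coefficient variance-covariance matrix, which glm() exposes via vcov() (reusing the logit_model fitted above):

```r
# Std. Error column of summary() = sqrt of diagonal of vcov(model)
sqrt(diag(vcov(logit_model)))
# (Intercept)     storage
#   0.2632444   0.1741129
```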
How can I manually calculate the Std. Error of the estimates of this logit model?
I tried the Newton-Raphson method:
the log-likelihood function is $$l=\sum_{i=1}^{N}\left[y_i(\beta_1+\beta_2x_i)-n_i\log\left[1+\exp(\beta_1+\beta_2x_i)\right]+\log{n_i \choose y_i}\right]$$ the score matrix is $$U=\left[\begin{matrix} \frac{\partial l}{\partial\beta_1} \\ \frac{\partial l}{\partial\beta_2} \end{matrix}\right]=\left[\begin{matrix} \sum(y_i-n_i\pi_i) \\ \sum x_i(y_i-n_i\pi_i) \end{matrix}\right]$$
$$\pi_i=\frac{\exp(\beta_1+\beta_2x_i)}{1+\exp(\beta_1+\beta_2x_i)}$$
the information matrix is
$$J=\left[\begin{matrix} \sum n_i\pi_i(1-\pi_i) & \sum n_ix_i\pi_i(1-\pi_i) \\ \sum n_ix_i\pi_i(1-\pi_i) & \sum n_ix_i^2\pi_i(1-\pi_i) \end{matrix}\right]$$
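As a quick numerical check (a sketch, reusing the fitted probabilities from the glm above as $\hat\pi_i$), this $J$ can be built at the MLE and inverted directly:

```r
x <- anthers.sum$storage
n <- anthers.sum$n
pi_hat <- fitted(logit_model)          # fitted pi_i at the MLE
w <- n * pi_hat * (1 - pi_hat)         # n_i * pi_i * (1 - pi_i)
J <- matrix(c(sum(w),     sum(w * x),
              sum(w * x), sum(w * x^2)),
            ncol = 2, byrow = TRUE)    # information matrix at the MLE
sqrt(diag(solve(J)))                   # matches summary()'s Std. Error column
```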
For a log-likelihood function of a single parameter $\beta$ , the first three terms of the Taylor series approximation near an estimate $b$ are
$$\begin{align}l(\beta)&\approx l(b)+(\beta-b)U(b)+\frac{1}{2}(\beta-b)^2U'(b)\\ &=l(b)+(\beta-b)U(b)-\frac{1}{2}(\beta-b)^2J(b) \end{align}$$ When $b$ is the maximum likelihood estimate, $U(b)=0$, so $$l(\beta)-l(b)=-\frac{1}{2}(\beta-b)^2J(b)$$
$$\mathbb E[(b-\beta)(b-\beta)^T]=J^{-1}\mathbb E(UU^T)J^{-1}=J^{-1}$$ because $$J=\mathbb E(UU^T)$$ then $$2[l(b)-l(\beta)]=(b-\beta)^2J(b)\sim \chi^2(p)$$
For the one-parameter case, $$b\sim N(\beta, J^{-1})$$
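From this asymptotic normality, the z values and p-values in the summary() table follow directly; a quick check against the fitted model:

```r
est <- coef(summary(logit_model))
z <- est[, "Estimate"] / est[, "Std. Error"]  # Wald z statistics
p <- 2 * pnorm(-abs(z))                       # two-sided p-values, as in Pr(>|z|)
cbind(z, p)
```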
The (Fisher scoring) iteration is:
$$J^{(m-1)} b^{(m)}=J^{(m-1)} b^{(m-1)}+U^{(m-1)}$$
# data vectors used throughout
x <- anthers.sum$storage
y <- anthers.sum$y
n <- anthers.sum$n

X <- matrix(c(1, 1,
              1, 2), ncol = 2, byrow = TRUE)  # design matrix
B <- matrix(c(0, 0), ncol = 1)                # starting values beta = (0, 0)
for (i in 1:6) {
  L <- X %*% B                                # linear predictor
  pii <- exp(L) / (1 + exp(L))                # pi_i
  npi1_pi <- n * pii * (1 - pii)              # n_i * pi_i * (1 - pi_i)
  U <- matrix(c(sum(y - n * pii),
                sum((y - n * pii) * x)),
              ncol = 1)                       # score vector
  J <- matrix(c(sum(npi1_pi),     sum(npi1_pi * x),
                sum(npi1_pi * x), sum(npi1_pi * x^2)),
              ncol = 2, byrow = TRUE)         # information matrix
  J_1 <- solve(J)
  B <- B + J_1 %*% U                          # Fisher scoring update
  print(paste0("iter: ", i))
  print("coefficients:")
  print(B)
  print("variances:")
  print(diag(J_1))
  print("std.error:")
  print(sqrt(diag(J_1)))
  writeLines("\n")
}
[1] "iter: 1"
[1] "coefficients:"
[,1]
[1,] -0.2641668
[2,] 0.3871441
[1] "variances:"
[1] 0.06797427 0.02913932
[1] "std.error:"
[1] 0.2607188 0.1707024
[1] "iter: 2"
[1] "coefficients:"
[,1]
[1,] -0.2753545
[2,] 0.3984872
[1] "variances:"
[1] 0.06924687 0.03026490
[1] "std.error:"
[1] 0.2631480 0.1739681
[1] "iter: 3"
[1] "coefficients:"
[,1]
[1,] -0.2753712
[2,] 0.3985039
[1] "variances:"
[1] 0.06929756 0.03031522
[1] "std.error:"
[1] 0.2632443 0.1741127
[1] "iter: 4"
[1] "coefficients:"
[,1]
[1,] -0.2753712
[2,] 0.3985039
[1] "variances:"
[1] 0.06929763 0.03031529
[1] "std.error:"
[1] 0.2632444 0.1741129
[1] "iter: 5"
[1] "coefficients:"
[,1]
[1,] -0.2753712
[2,] 0.3985039
[1] "variances:"
[1] 0.06929763 0.03031529
[1] "std.error:"
[1] 0.2632444 0.1741129
[1] "iter: 6"
[1] "coefficients:"
[,1]
[1,] -0.2753712
[2,] 0.3985039
[1] "variances:"
[1] 0.06929763 0.03031529
[1] "std.error:"
[1] 0.2632444 0.1741129
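Assuming the loop above has just been run (so B and J_1 hold the converged values), the manual fit can be compared against glm():

```r
# converged coefficients side by side with glm()'s
cbind(manual = as.vector(B), glm = unname(coef(logit_model)))
# standard errors: manual (inverse information) vs. glm()'s vcov()
sqrt(diag(J_1))
sqrt(diag(vcov(logit_model)))
```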
The odds ratios:
x <- anthers.sum$storage
# Fitted odds at storage = 1 (constant level)
exp(coefficients(summary(logit_model))["(Intercept)", 1] +
    coefficients(summary(logit_model))["storage", 1] * x[1])
# Odds ratio, treatment vs. control (one-unit change in storage)
exp(coefficients(summary(logit_model))["storage", 1] * (x[2] - x[1]))
[1] 1.131034
[1] 1.489594
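Note that exponentiating the coefficients themselves gives exp(-0.2754) ≈ 0.759 for the intercept row, i.e. the odds at storage = 0 rather than the fitted odds at storage = 1 computed above; this is what the OR column of questionr::odds.ratio reports:

```r
exp(coef(logit_model))  # intercept ~0.759, storage ~1.490
```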
questionr::odds.ratio(logit_model, level=0.95)
Waiting for profiling to be done...
OR 2.5 % 97.5 % p
(Intercept) 0.75929 0.45311 1.2725 0.29553
storage 1.48959 1.06015 2.0989 0.02209 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
When I calculate the 95% confidence intervals of the odds ratios:
# 95% CI for the constant
exp(coefficients(summary(logit_model))["(Intercept)", 1] +
    qnorm(c(0.025, 0.975)) * coefficients(summary(logit_model))["(Intercept)", 2])
# 95% CI for treatment vs. control
exp(coefficients(summary(logit_model))["storage", 1] +
    qnorm(c(0.025, 0.975)) * coefficients(summary(logit_model))["storage", 2])
[1] 0.4532458 1.2719847
[1] 1.058919 2.095430
They don't match the results of questionr::odds.ratio.
questionr::odds.ratio gives you profile-likelihood-based confidence intervals; see the tag [tag:profile-likelihood]. Also, please try to write your R formulas in a humanly readable form! – kjetil b halvorsen Dec 20 '23 at 17:26
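To make the distinction concrete: confint.default() returns the Wald intervals (estimate ± 1.96 × SE, the same as the manual qnorm() calculation), while confint() on a glm profiles the likelihood (via MASS) and reproduces what questionr::odds.ratio reports:

```r
exp(confint.default(logit_model))  # Wald: matches the manual qnorm() intervals
exp(confint(logit_model))          # profile likelihood: matches questionr
```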