I’m wondering how to calculate the standard errors and p-values of interaction effects in logistic models.
For illustration, consider this toy example, where we think some treatment effect will differ between men and women (but not by education):
set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)
m <- glm(dv ~ condition +
gender +
education +
condition:gender,
family = "binomial")
Which gives us this:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.07849 0.11065 -0.709 0.4781
conditiontreat 0.22031 0.12658 1.740 0.0818 .
gendermale -0.04788 0.12558 -0.381 0.7030
educationpost_grad 0.02815 0.10965 0.257 0.7974
educationundergrad 0.04922 0.10981 0.448 0.6540
conditiontreat:gendermale -0.13205 0.17921 -0.737 0.4612
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here, the treatment effect for female (the reference category) is estimated at 0.22031, with an SE of 0.12658 and a p-value of 0.0818.
The treatment effect for male, by contrast, would be
beta <- 0.22031 - 0.13205 # 0.08826
But what is the SE and p-value for that estimate? I believe it's not the ones given on the final row of the output, since they speak to the difference in effect between male and female.
1. Calculating the SE
From this and this, I gather that the SE should instead be calculated as follows:
$\sqrt{var(\hat{\beta_1}) + var(\hat{\beta_5}) + 2*cov(\hat{\beta_1}\hat{\beta_5})}$
Applied to the case above, that would give us the following
beta_se <- sqrt(vcov(m)[2,2] + vcov(m)[6,6] + 2*vcov(m)[2,6]) # 0.1268722
2. Calculating the p-value
Once I have the SE, I gather from this that I would then simply calculate the p-value like this:
pnorm(-abs(beta)/beta_se) * 2 # 0.4866415
Have I understood all of this correctly?
EDIT: Full code in one block:
set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)
m <- glm(dv ~ condition +
gender +
education +
condition:gender,
family = "binomial")
summary(m)
beta <- 0.22031 - 0.13205 # 0.08826
beta_se <- sqrt(vcov(m)[2,2] + vcov(m)[6,6] + 2vcov(m)[2,6]) # 0.1268722
pnorm(-abs(beta)/beta_se) 2 # 0.4866415
library(multcomp)
summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0")))
Which gives:
Simultaneous Tests for General Linear Hypotheses
Fit: glm(formula = dv ~ condition + gender + education + condition:gender,
family = "binomial")
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
conditiontreat + conditiontreat:gendermale == 0 0.08825 0.12687 0.696 0.487
(Adjusted p values reported -- single-step method)
(1) Apologies! There was a typo in the p-value calculation. The code should now replicate.
(2) When I run
summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0")))I don’t get an estimate of-0.1838. Instead I get the numbers I have in my manual calculations: an estimate of0.08825with an SE of0.12687and p-value of0.487.(3) Are you sure it should be -2*Cov()? From the Wikipedia page it looks like it should be +2 for Var(aX + bY). And the +2 calculation seems to yield the ‘correct’ SE, as given by (2) above.
– kh_one Oct 02 '20 at 07:31write.dta(data, "~/Desktop/data.dta"). But at least we're both getting theglhtand manual method to give the same results on our respective ends, which seems to confirm that the way I was calculating the SE and p-value in the original post seems to have been the correct way of doing it. – kh_one Oct 02 '20 at 08:54