Standard errors and p-values for interaction effects with GLM

Question

I’m wondering how to calculate the standard errors and p-values of interaction effects in logistic models.

For illustration, consider this toy example, where we think some treatment effect will differ between men and women (but not by education):

set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)
m <- glm(dv ~ condition +
             gender +
             education +
             condition:gender,
           family = "binomial")

Which gives us this:

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)  
(Intercept)               -0.07849    0.11065  -0.709   0.4781  
conditiontreat             0.22031    0.12658   1.740   0.0818 .
gendermale                -0.04788    0.12558  -0.381   0.7030  
educationpost_grad         0.02815    0.10965   0.257   0.7974  
educationundergrad         0.04922    0.10981   0.448   0.6540  
conditiontreat:gendermale -0.13205    0.17921  -0.737   0.4612  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here, the treatment effect for female (the reference category) is estimated at 0.22031, with an SE of 0.12658 and a p-value of 0.0818.

The treatment effect for male, by contrast, would be

beta <- 0.22031 - 0.13205 # 0.08826

But what is the SE and p-value for that estimate? I believe it's not the ones given on the final row of the output, since they speak to the difference in effect between male and female.

1. Calculating the SE
From this and this, I gather that the SE should instead be calculated as follows:

$\sqrt{var(\hat{\beta_1}) + var(\hat{\beta_5}) + 2*cov(\hat{\beta_1}\hat{\beta_5})}$

Applied to the case above, that would give us the following

beta_se <- sqrt(vcov(m)[2,2] + vcov(m)[6,6] + 2*vcov(m)[2,6]) # 0.1268722

2. Calculating the p-value
Once I have the SE, I gather from this that I would then simply calculate the p-value like this:

pnorm(-abs(beta)/beta_se) * 2 # 0.4866415

Have I understood all of this correctly?

EDIT: Full code in one block:

set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)
m <- glm(dv ~ condition +
           gender +
           education +
           condition:gender,
         family = "binomial")
summary(m)
beta <- 0.22031 - 0.13205 # 0.08826
beta_se <- sqrt(vcov(m)[2,2] + vcov(m)[6,6] + 2vcov(m)[2,6]) # 0.1268722
pnorm(-abs(beta)/beta_se)  2 # 0.4866415
library(multcomp)
summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0")))

Which gives:

     Simultaneous Tests for General Linear Hypotheses
Fit: glm(formula = dv ~ condition + gender + education + condition:gender, 
    family = "binomial")
Linear Hypotheses:
                                                Estimate Std. Error z value Pr(>|z|)
conditiontreat + conditiontreat:gendermale == 0  0.08825    0.12687   0.696    0.487
(Adjusted p values reported -- single-step method)

dimitriy · Accepted Answer · 2020-10-02T08:14:39.093

2

I cannot replicate your estimates with the R code provided, but you can verify your manual calculations by testing the null that linear combination of two coefficients is zero:

set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)
m <- glm(dv ~ condition +
           gender +
           education +
           condition:gender,
           family = "binomial")
write.dta(data, "~/Desktop/data.dta")
summary(m)
names(coef(m)) 
library(multcomp)
summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0")))

The last part yields:

Simultaneous Tests for General Linear Hypotheses
Fit: glm(formula = dv ~ condition + gender + education + condition:gender, 
    family = "binomial")
Linear Hypotheses:
                                                Estimate Std. Error z value Pr(>|z|)
conditiontreat + conditiontreat:gendermale == 0  -0.1838     0.1290  -1.425    0.154
(Adjusted p values reported -- single-step method)

This shows that while the effect of treatment on the log odds is negative for males, it is not statistically distinguishable from zero.

Here's the same calculation from first principles:

> (te <- m$coefficients["conditiontreat"] + m$coefficients["conditiontreat:gendermale"])        
conditiontreat 
    -0.1838191 
> (se_te <- sqrt(vcov(m)["conditiontreat","conditiontreat"] 
+               + vcov(m)["conditiontreat:gendermale","conditiontreat:gendermale"]
+               + 2*vcov(m)["conditiontreat","conditiontreat:gendermale"]))
[1] 0.1290362
> (pval <- pnorm(te/se_te)*2)
conditiontreat 
      0.154286

edited Oct 02 '20 at 08:14

answered Oct 02 '20 at 01:49

dimitriy

35,430

Thank you, Dimitriy! Three things:
(1) Apologies! There was a typo in the p-value calculation. The code should now replicate.

(2) When I run summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0"))) I don’t get an estimate of -0.1838. Instead I get the numbers I have in my manual calculations: an estimate of 0.08825 with an SE of 0.12687 and p-value of 0.487.

(3) Are you sure it should be -2*Cov()? From the Wikipedia page it looks like it should be +2 for Var(aX + bY). And the +2 calculation seems to yield the ‘correct’ SE, as given by (2) above.
– kh_one Oct 02 '20 at 07:31
I don't see any edits to the data simulation part, so I still can't replicate your estimates/coefficients. The change to the manual p-value calculation does not alter that fact and our numbers won't line up without that fix. In your case, $b<1$, so the $Var(aX-bY)$ formula applies. Could certainly be that we are missing something. – dimitriy Oct 02 '20 at 07:47
Can't figure out why it's not replicating. Have added full code block in original post, which should replicate on copy+paste, and yield a positive estimate. – kh_one Oct 02 '20 at 07:53
1

I think you are right about the plus. I made a silly algebra mistake and have edited that out. I still can't get your numbers, but the calculations also work out for me. See the edit above. Not sure what's going wrong with R, but I do get the same estimates in Stata as in R on my end. – dimitriy Oct 02 '20 at 08:18
1

Thanks, Dimitriy. Very strange that we're not getting the same numbers. The only difference is that I'm not running write.dta(data, "~/Desktop/data.dta"). But at least we're both getting the glht and manual method to give the same results on our respective ends, which seems to confirm that the way I was calculating the SE and p-value in the original post seems to have been the correct way of doing it. – kh_one Oct 02 '20 at 08:54

k13 · Answer 2 · 2023-04-06T13:42:04.890

I know this question is over 2 years old, but I wanted to comment for anyone else who finds this post useful (as I did). The calculation of the treatment effect for male in this case should include not only the reference level for treatment, but also the effect of sex. In other words:

$$\beta_{male + treatment} = \beta_{male} + \beta_{treatment} + \beta_{male:treatment}$$ (in OP's original calculation, $\beta_{male}$ was missing).

Or, in terms of odds ratios:

$$OR_{male+treatment} = OR_{male}\times OR_{treatment}\times OR_{male:treatment}$$

This also affects the calculation of the SE, as you should include the variance and covariance of the interaction term in the equation, as below (for generic variables $X_1$, $X_2$, and their interaction term $X_{1:2}$):

$$SE(β_{1+2}) = \sqrt{var(β_1+β_2+β_{1:2})} = \sqrt{var(β_1) + var(β_2) + var(β_{1:2}) + 2[cov(β_1,β_2) + cov(β_1,β_{1:2}) + cov(β_2,β_{1:2})]}$$

In R:

beta_se <- sqrt(vcov(m)[1,1] + vcov(m)[2,2] + vcov(m)[3,3] + 2*(vcov(m)[1,2] + vcov(m)[1,3] + vcov(m)[2,3]))

Construction of 95% confidence intervals follows as:

$$\beta_{1+2} \pm 1.96\times SE $$

which can be converted to confidence intervals on the odds ratio by exponentiating them.

Standard errors and p-values for interaction effects with GLM

2 Answers2

Linked