Imagine I'm trying to model traffic stops with a logistic regression.
race = relevel(as.factor(c(rep("white", 3000), rep("black", 3000))), ref = "white")
income = relevel(as.factor(c(rep("low", 1500),rep("high",1500), rep("low", 2000),rep("high",1000))),ref = "high")
stop = c(rbinom(1500,1,0.5),rbinom(1500,1,0.4),rbinom(2000,1,0.8),rbinom(1000,1,0.5))
mod = glm(stop ~ race + income + race:income, family = "binomial")
summary(mod)
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.38600 0.01212 31.841 < 2e-16 ***
#raceblack 0.10800 0.01917 5.634 1.84e-08 ***
#incomelow 0.14400 0.01714 8.399 < 2e-16 ***
#raceblack:incomelow 0.14250 0.02499 5.702 1.24e-08 ***
#---
# Null deviance: 1469.3 on 5999 degrees of freedom
#Residual deviance: 1321.8 on 5996 degrees of freedom
#AIC: 7960.5
Further imagine I wanted to report the marginal effect of being a low income Black individual with regards to probability of being stopped, something like "compared to a high income white individual, a low income Black individual has a 1.29 OR of being stopped."
If there were no interaction coefficients in the models, I could just add the coefficients:
sum(coef(model)[c("raceblack",'incomelow')])
#[1] 0.252
I also learned from this Q&A that I can use general linear hypothesis testing.
library(multcomp)
glht(model, linfct = "raceblack + incomelow = 0")
# General Linear Hypotheses
#Linear Hypotheses:
# Estimate
#raceblack + incomelow == 0 0.252
My question is, since the model includes an interaction coefficient, should I also include the interaction coefficient when calculating the marginal effect?
sum(coef(model)[c("raceblack",'incomelow','raceblack:incomelow')])
#[1] 0.3945
glht(model, linfct = "raceblack + incomelow + raceblack:incomelow = 0")
# General Linear Hypotheses
#Linear Hypotheses:
# Estimate
#raceblack + incomelow + raceblack:incomelow == 0 0.3945