How to test for specific effects in a linear regression with two categorical covariates and their interaction?

Question

I have a data set with a continuous response variable and two categorical covariates. Let's imagine that I worked at an e-commerce company and was trying to regress the revenue we get from each user as a function of customer type ('A', 'B', 'C'), whether they were exposed to a specific treatment ('Exposed', 'Control'), and the interaction between those two variables.

I would fit the following model, in which a type A customer in the Control group would be used as the reference:

model <- lm(revenue ~ 1 + customer_type * treatment, data = users)

In math: $$ \operatorname{revenue}_i=\beta_0+\beta_1\operatorname{TypeB}_i+\beta_2\operatorname{TypeC}_i+\beta_3\operatorname{Exposed}_i+\beta_4(\operatorname{TypeB}\times\operatorname{Exposed})_i+\beta_5(\operatorname{TypeC}\times\operatorname{Exposed})_i $$

The question I'm trying to answer is: is the revenue of type C customers who are exposed to treatment different from the revenue of type C customers who are not exposed to treatment?

The way I tried to figure this out is by deriving what the models would be for the two types of users. That is, for a type C customer who is exposed to treatment, the model becomes

$$ \operatorname{revenue}=\beta_0+\beta_2+\beta_3+\beta_5 $$

and for a type C customer who is not exposed, the model is

$$ \operatorname{revenue}=\beta_0+\beta_2 $$

Thus, the difference between the two is not a single parameter but the sum of two ($\beta_3+\beta_5$). This means that no single t-statistic and associated p-value in the regression output tests the question I have.

Intuitively, I think this must be possible and I probably have the answer right in front of me. What am I missing? Is the solution to use bespoke contrasts? If so, how would I do that in R?

score 1 · Accepted Answer · answered Nov 02 '22 at 13:43

What you want is to test the hypothesis that the contrast $\beta_3 + \beta_5$ equals 0. Divide the estimate of $\beta_3 + \beta_5$ by the standard error of that sum to get the t-statistic for the test.
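Written out, with hats denoting the fitted coefficients from the model in the question, the test statistic is

$$ t=\frac{\hat\beta_3+\hat\beta_5}{\operatorname{SE}(\hat\beta_3+\hat\beta_5)} $$

Under the usual linear-model assumptions it is compared against a t distribution with the model's residual degrees of freedom.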

You get that standard error from the formula for the variance of a weighted sum of random variables. Here both weights are 1, so the variance is just the sum of the two individual variances plus twice their covariance; the standard error is the square root of that. Those (co)variances can be found with vcov(model). This page has some examples.
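A minimal sketch of that manual calculation follows. The `users` data here is simulated purely for illustration (the values are hypothetical), and the coefficient names assume R's default treatment coding with the factor levels used in the question; check `names(coef(model))` on your own fit.

```r
# Simulated stand-in for the `users` data (hypothetical values, illustration only)
set.seed(1)
users <- expand.grid(
  customer_type = c("A", "B", "C"),
  treatment     = c("Control", "Exposed"),
  rep           = 1:50
)
users$customer_type <- factor(users$customer_type, levels = c("A", "B", "C"))
users$treatment     <- factor(users$treatment, levels = c("Control", "Exposed"))
users$revenue       <- rnorm(nrow(users), mean = 100, sd = 20)

model <- lm(revenue ~ 1 + customer_type * treatment, data = users)

b <- coef(model)   # estimated coefficients
V <- vcov(model)   # their variance-covariance matrix

# Names of beta_3 and beta_5 under default treatment coding
b3 <- "treatmentExposed"
b5 <- "customer_typeC:treatmentExposed"

# Estimate of beta_3 + beta_5 and its standard error:
# Var(X + Y) = Var(X) + Var(Y) + 2 * Cov(X, Y)
estimate <- unname(b[b3] + b[b5])
std_err  <- sqrt(V[b3, b3] + V[b5, b5] + 2 * V[b3, b5])

# t-statistic and two-sided p-value on the residual degrees of freedom
t_stat  <- estimate / std_err
p_value <- 2 * pt(abs(t_stat), df = df.residual(model), lower.tail = FALSE)

c(estimate = estimate, se = std_err, t = t_stat, p = p_value)
```

As a cross-check, refitting the model with type C as the reference level (for example via `relevel(customer_type, ref = "C")`) should report the same t-statistic on the `treatmentExposed` row of `summary()`.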

It's good to know how these calculations are done, but if you are doing a lot of regression work you should learn how to let tools provided by packages like emmeans and car do the calculations for you.
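As a rough sketch of what that can look like, the block below asks emmeans for the treatment comparison within each customer type and asks car to test $\beta_3+\beta_5=0$ directly. The data is again simulated (hypothetical values), both packages are assumed to be installed, and the coefficient names assume default treatment coding.

```r
library(emmeans)
library(car)

# Simulated stand-in for the `users` data (hypothetical values, illustration only)
set.seed(1)
users <- expand.grid(
  customer_type = c("A", "B", "C"),
  treatment     = c("Control", "Exposed"),
  rep           = 1:50
)
users$customer_type <- factor(users$customer_type, levels = c("A", "B", "C"))
users$treatment     <- factor(users$treatment, levels = c("Control", "Exposed"))
users$revenue       <- rnorm(nrow(users), mean = 100, sd = 20)

model <- lm(revenue ~ 1 + customer_type * treatment, data = users)

# emmeans: estimated marginal means of treatment within each customer type,
# then the Exposed - Control difference separately for types A, B and C
emm <- emmeans(model, ~ treatment | customer_type)
simple_effects <- pairs(emm, reverse = TRUE)
simple_effects

# car: build a contrast vector with 1s on beta_3 and beta_5, 0s elsewhere,
# and test the linear hypothesis that the combination equals 0
L <- setNames(rep(0, length(coef(model))), names(coef(model)))
L[c("treatmentExposed", "customer_typeC:treatmentExposed")] <- 1
lh <- linearHypothesis(model, hypothesis.matrix = L, rhs = 0)
lh
```

The row for customer type C in `simple_effects` is the comparison asked about in the question. `linearHypothesis` reports an F test with one numerator degree of freedom, whose F value is the square of the corresponding t-statistic.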

EdM

Thank you, @EdM! I have been able to do it in two ways: (1) deriving the standard error of the sum manually, and (2) using multcomp::glht. Would you mind editing your answer or sharing some resource so I can learn how to do this with emmeans? – Adrià Luz Nov 02 '22 at 17:17
@AdriàLuz there's a vignette on contrasts and comparisons in emmeans, and another vignette specifically about interactions. For the simple task you asked about, emmeans might not provide any advantage over the methods you used. It's very useful when you are looking at several comparisons from a model. It also can make a different type of comparison; see this page. – EdM Nov 02 '22 at 17:36
@AdriàLuz this UCLA web page might also be of interest for how to use emmeans to evaluate models with interactions. – EdM Nov 02 '22 at 17:45
Thanks a lot @EdM. I found this UCLA article an excellent read! – Adrià Luz Nov 02 '22 at 22:11
