
I am running a model testing the interaction of two categorical variables (AKA factors), one with two and the other with three groups. I have done a lot of reading but couldn't find an answer regarding how to interpret the p-values I get, especially since my understanding of them does not match what I see in the prediction plot.

Here is the model and the summary table:

m1 = glmmTMB(A ~ B*C + w1 + w2 + w3 + (1|individual), data = dt)
summary(m1)
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  242.192      8.229   29.43  < 2e-16 ***
Bb2           38.825     10.558    3.68 0.000236 ***
Cc2           62.400     13.640    4.57 4.77e-06 ***
Cc3           48.960     12.452    3.93 8.42e-05 ***
w1            87.620      2.665   32.87  < 2e-16 ***
w2            -6.441      2.666   -2.42 0.015702 *  
w3             8.212      2.725    3.01 0.002582 ** 
Bb2:Cc2      -40.636     16.525   -2.46 0.013928 *  
Bb2:Cc3      -20.067     14.963   -1.34 0.179880    

And the prediction plot (with CI):

[prediction plot with CIs not shown]

Relating to the results above, my questions are:

  1. What does the significance of the interaction terms mean? My intercept is b1c1, but does that mean that Bb2:Cc2 is significantly different from Bb1:Cc1, and that Bb2:Cc3 isn't significantly different from Bb1:Cc1? Looking at the prediction plot, this doesn't make any sense.
  2. How can I get all comparisons? Specifically, how do I get the p-value for the comparison between b1:c1 and b1:c2 and between b1:c1 and b1:c3?
  3. How do I report these results? Should I report the interaction using an ANOVA-type test, and if so, which one? In R, aov() and car::Anova() (which offers two options, type II and type III) all give different results.

In case it is important, here is the code I use to produce the predictions:

newdt = expand.grid(C = c("c1","c2","c3"),
                B = c("b1", "b2"),
                w1 = mean(dt$w1),
                w2 = mean(dt$w2),
                w3 = mean(dt$w3),
                individual = 0)

pred.CapWild.MigNo.Beeline = as.data.frame(
  predict(m1, newdata = newdt, re.form = NA, se.fit = TRUE,
          allow.new.levels = TRUE, type = "response")) %>%
  mutate(CI = 1.96 * se.fit, CI.u = fit + CI, CI.l = fit - CI) %>%
  cbind(., newdt)

utobi

1 Answer


Assuming that you use the default contr.treatment coding for the categorical variables:

  • The estimated model coefficient labeled with Cc2 is a covariate-adjusted estimate of $-\mu_{b_1c_1} + \mu_{b_1c_2}$, with corresponding null hypothesis $-\mu_{b_1c_1} + \mu_{b_1c_2}=0 \iff \mu_{b_1c_1} = \mu_{b_1c_2}$.
  • The estimated model coefficient labeled with Cc3 is a covariate-adjusted estimate of $-\mu_{b_1c_1} + \mu_{b_1c_3}$, with corresponding null hypothesis $-\mu_{b_1c_1} + \mu_{b_1c_3}=0 \iff \mu_{b_1c_1} = \mu_{b_1c_3}$.
  • The estimated model coefficient labeled with Bb2:Cc2 is a covariate-adjusted estimate of $\mu_{b_1c_1} - \mu_{b_2c_1} - \mu_{b_1c_2} + \mu_{b_2c_2}$, with corresponding null hypothesis $\mu_{b_1c_1} - \mu_{b_2c_1} - \mu_{b_1c_2} + \mu_{b_2c_2}=0 \iff \mu_{b_1c_1} - \mu_{b_2c_1} =\mu_{b_1c_2} - \mu_{b_2c_2}$.
  • The estimated model coefficient labeled with Bb2:Cc3 is a covariate-adjusted estimate of $\mu_{b_1c_1} - \mu_{b_2c_1} - \mu_{b_1c_3} + \mu_{b_2c_3}$, with corresponding null hypothesis $\mu_{b_1c_1} - \mu_{b_2c_1} - \mu_{b_1c_3} + \mu_{b_2c_3}=0 \iff \mu_{b_1c_1} - \mu_{b_2c_1} =\mu_{b_1c_3} - \mu_{b_2c_3}$.
# Recover what each coefficient estimates: build the 6 x 6 design matrix
# with one row per B:C cell and invert it
d <- expand.grid(B = paste0("b", 1:2), C = paste0("c", 1:3))
model_mat <- model.matrix(~ B * C, d)
rownames(model_mat) <- d$B:d$C  # label rows by cell
solve(model_mat)  # each row expresses a coefficient as a combination of cell means
#             b1:c1 b2:c1 b1:c2 b2:c2 b1:c3 b2:c3
# (Intercept)     1     0     0     0     0     0
# Bb2            -1     1     0     0     0     0
# Cc2            -1     0     1     0     0     0
# Cc3            -1     0     0     0     1     0
# Bb2:Cc2         1    -1    -1     1     0     0
# Bb2:Cc3         1    -1     0     0    -1     1

One way to get all pairwise comparisons is emmeans::emmeans(m1, ~ B * C, contr = "pairwise").
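As a minimal sketch of what such a comparison computes (toy simulated data and lm() instead of glmmTMB so the example is self-contained; the Wald logic is the same), the b1:c1 vs b1:c2 comparison is just a linear contrast of the model coefficients:

```r
# Toy illustration: mu_{b1c2} - mu_{b1c1} is exactly the Cc2 coefficient,
# and any other cell comparison is a linear combination of coefficients.
set.seed(1)
d <- expand.grid(B = factor(paste0("b", 1:2)), C = factor(paste0("c", 1:3)))
d <- d[rep(1:6, each = 10), ]
d$y <- rnorm(nrow(d), mean = as.numeric(d$B) + 2 * as.numeric(d$C))

m <- lm(y ~ B * C, data = d)

# Contrast vector picking out Cc2, i.e. mu_{b1c2} - mu_{b1c1}
# coefficient order: (Intercept), Bb2, Cc2, Cc3, Bb2:Cc2, Bb2:Cc3
K <- c(0, 0, 1, 0, 0, 0)
est <- sum(K * coef(m))
se  <- sqrt(drop(t(K) %*% vcov(m) %*% K))
z   <- est / se
p   <- 2 * pnorm(-abs(z))  # Wald p-value, as in the glmmTMB summary
```

Note that summary(m1) already reports this particular comparison (it is the Cc2 row); emmeans is convenient because it builds all 15 pairwise contrasts for you and, by default, adjusts the p-values for multiplicity (Tukey).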

Whether you should report the results of a joint test of the interaction contrasts or of individual tests depends on the hypotheses you are examining.
For the type I vs. type II vs. type III debate see, e.g., this and this answer.
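A self-contained sketch (toy unbalanced data, base R only; the car package is not needed to see the issue) of why the ANOVA types disagree: with unbalanced cells the sequential (type I) sum of squares for a factor depends on the order in which terms enter the model:

```r
# With unbalanced data the factors are not orthogonal, so the sum of
# squares attributed to B depends on whether B or C enters first.
set.seed(2)
d <- expand.grid(B = factor(paste0("b", 1:2)), C = factor(paste0("c", 1:3)))
d <- d[rep(1:6, times = c(3, 8, 5, 9, 4, 7)), ]  # unbalanced cell sizes
d$y <- rnorm(nrow(d))

a1 <- anova(lm(y ~ B * C, data = d))  # B entered first
a2 <- anova(lm(y ~ C * B, data = d))  # C entered first

a1["B", "Sum Sq"]  # differs from the value below
a2["B", "Sum Sq"]
```

Type II and type III tests are different (order-invariant) resolutions of this ambiguity, which is why they can all disagree on unbalanced designs; with perfectly balanced data all three coincide.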

statmerkur
  • That's amazing, thanks! I'll try my luck without opening a new question: let's say that I now use a three-way interaction, with one of the continuous weather variables in that interaction (e.g., A ~ B*C*w1 + w2 + w3 + (1|individual)). Now the prediction plot with w1 on the x axis has 4 lines: one for each B*C combination. Is there a way to know which of these lines (I'd say slopes, but they are not linear) is significantly different from the other lines? – Ron Efrat Nov 30 '22 at 10:22
  • @RonEfrat Wouldn't you get 6 lines? And why are they not linear in the model you specified? In any case, I think emmeans::emtrends would provide what you are looking for (see, e.g., here). – statmerkur Nov 30 '22 at 23:44
  • You are right, 6, not 4 lines, it was a typo. And I am not sure why they are not linear, I was also surprised, but that's for another topic… Thanks again – Ron Efrat Dec 01 '22 at 13:37