
I am running a model testing the interaction of two categorical variables (AKA factors), one with two and the other with three groups. I have done a lot of reading but couldn't find an answer regarding how to interpret the p-values I get, especially since my understanding of them does not match what I see in the prediction plot.

Here is the model and the summary table:

m1 = glmmTMB(A ~ B*C + w1 + w2 + w3 + (1|individual), data = dt)
summary(m1)
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  242.192      8.229   29.43  < 2e-16 ***
Bb2           38.825     10.558    3.68 0.000236 ***
Cc2           62.400     13.640    4.57 4.77e-06 ***
Cc3           48.960     12.452    3.93 8.42e-05 ***
w1            87.620      2.665   32.87  < 2e-16 ***
w2            -6.441      2.666   -2.42 0.015702 *  
w3             8.212      2.725    3.01 0.002582 ** 
Bb2:Cc2      -40.636     16.525   -2.46 0.013928 *  
Bb2:Cc3      -20.067     14.963   -1.34 0.179880    

And the prediction plot (with CI):

[prediction plot with CIs not shown]

Relating to the results above, my questions are:

  1. What does the significance of the interaction terms mean? My intercept is b1c1, but does that mean that Bb2:Cc2 is significantly different from Bb1:Cc1, and that Bb2:Cc3 isn't significantly different from Bb1:Cc1? Looking at the prediction plot, this doesn't make any sense.
  2. How can I get all comparisons? Specifically, how do I get the p-value for the comparison between b1:c1 and b1:c2 and between b1:c1 and b1:c3?
  3. How do I report these results? Should I report the interaction using an ANOVA-type test, and if so, which one? In R, aov() and car::Anova() (which offers two options, type II and type III) all give different results.

In case it is important, here is the code I use to produce the predictions:

newdt = expand.grid(C = c("c1","c2","c3"),
                B = c("b1", "b2"),
                w1 = mean(dt$w1),
                w2 = mean(dt$w2),
                w3 = mean(dt$w3),
                individual = 0)

pred.CapWild.MigNo.Beeline = as.data.frame(
  predict(m1, newdata = newdt, re.form = NA, se.fit = TRUE,
          allow.new.levels = TRUE, type = "response")) %>%
  mutate(CI = 1.96 * se.fit, CI.u = fit + CI, CI.l = fit - CI) %>%
  cbind(., newdt)

utobi

1 Answer


Assuming that you use the default contr.treatment coding for the categorical variables:

  • The estimated model coefficient labeled with Cc2 is a covariate-adjusted estimate of $-\mu_{b_1c_1} + \mu_{b_1c_2}$, with corresponding null hypothesis $-\mu_{b_1c_1} + \mu_{b_1c_2}=0 \iff \mu_{b_1c_1} = \mu_{b_1c_2}$.
  • The estimated model coefficient labeled with Cc3 is a covariate-adjusted estimate of $-\mu_{b_1c_1} + \mu_{b_1c_3}$, with corresponding null hypothesis $-\mu_{b_1c_1} + \mu_{b_1c_3}=0 \iff \mu_{b_1c_1} = \mu_{b_1c_3}$.
  • The estimated model coefficient labeled with Bb2:Cc2 is a covariate-adjusted estimate of $\mu_{b_1c_1} - \mu_{b_2c_1} - \mu_{b_1c_2} + \mu_{b_2c_2}$, with corresponding null hypothesis $\mu_{b_1c_1} - \mu_{b_2c_1} - \mu_{b_1c_2} + \mu_{b_2c_2}=0 \iff \mu_{b_1c_1} - \mu_{b_2c_1} =\mu_{b_1c_2} - \mu_{b_2c_2}$.
  • The estimated model coefficient labeled with Bb2:Cc3 is a covariate-adjusted estimate of $\mu_{b_1c_1} - \mu_{b_2c_1} - \mu_{b_1c_3} + \mu_{b_2c_3}$, with corresponding null hypothesis $\mu_{b_1c_1} - \mu_{b_2c_1} - \mu_{b_1c_3} + \mu_{b_2c_3}=0 \iff \mu_{b_1c_1} - \mu_{b_2c_1} =\mu_{b_1c_3} - \mu_{b_2c_3}$.
# Recover what each coefficient estimates: build the 6 x 6 design matrix
# with one row per B:C cell and invert it
d <- expand.grid(B = paste0("b", 1:2), C = paste0("c", 1:3))
model_mat <- model.matrix(~ B * C, d)
rownames(model_mat) <- d$B:d$C  # label rows by cell
solve(model_mat)  # each row expresses a coefficient as a combination of cell means
#             b1:c1 b2:c1 b1:c2 b2:c2 b1:c3 b2:c3
# (Intercept)     1     0     0     0     0     0
# Bb2            -1     1     0     0     0     0
# Cc2            -1     0     1     0     0     0
# Cc3            -1     0     0     0     1     0
# Bb2:Cc2         1    -1    -1     1     0     0
# Bb2:Cc3         1    -1     0     0    -1     1

One way to get all pairwise comparisons is emmeans::emmeans(m1, ~ B * C, contr = "pairwise").
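As a minimal sketch of what such a comparison computes (toy simulated data and lm() instead of glmmTMB so the example is self-contained; the Wald logic is the same), the b1:c1 vs b1:c2 comparison is just a linear contrast of the model coefficients:

```r
# Toy illustration: mu_{b1c2} - mu_{b1c1} is exactly the Cc2 coefficient,
# and any other cell comparison is a linear combination of coefficients.
set.seed(1)
d <- expand.grid(B = factor(paste0("b", 1:2)), C = factor(paste0("c", 1:3)))
d <- d[rep(1:6, each = 10), ]
d$y <- rnorm(nrow(d), mean = as.numeric(d$B) + 2 * as.numeric(d$C))

m <- lm(y ~ B * C, data = d)

# Contrast vector picking out Cc2, i.e. mu_{b1c2} - mu_{b1c1}
# coefficient order: (Intercept), Bb2, Cc2, Cc3, Bb2:Cc2, Bb2:Cc3
K <- c(0, 0, 1, 0, 0, 0)
est <- sum(K * coef(m))
se  <- sqrt(drop(t(K) %*% vcov(m) %*% K))
z   <- est / se
p   <- 2 * pnorm(-abs(z))  # Wald p-value, as in the glmmTMB summary
```

Note that summary(m1) already reports this particular comparison (it is the Cc2 row); emmeans is convenient because it builds all 15 pairwise contrasts for you and, by default, adjusts the p-values for multiplicity (Tukey).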

Whether you should report the results of a joint test of the interaction contrasts or of individual tests depends on the hypotheses you are examining.
For the type I vs. type II vs. type III debate see, e.g., this and this answer.
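A self-contained sketch (toy unbalanced data, base R only; the car package is not needed to see the issue) of why the ANOVA types disagree: with unbalanced cells the sequential (type I) sum of squares for a factor depends on the order in which terms enter the model:

```r
# With unbalanced data the factors are not orthogonal, so the sum of
# squares attributed to B depends on whether B or C enters first.
set.seed(2)
d <- expand.grid(B = factor(paste0("b", 1:2)), C = factor(paste0("c", 1:3)))
d <- d[rep(1:6, times = c(3, 8, 5, 9, 4, 7)), ]  # unbalanced cell sizes
d$y <- rnorm(nrow(d))

a1 <- anova(lm(y ~ B * C, data = d))  # B entered first
a2 <- anova(lm(y ~ C * B, data = d))  # C entered first

a1["B", "Sum Sq"]  # differs from the value below
a2["B", "Sum Sq"]
```

Type II and type III tests are different (order-invariant) resolutions of this ambiguity, which is why they can all disagree on unbalanced designs; with perfectly balanced data all three coincide.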

statmerkur
  • That's amazing, thanks! I'll try my luck without opening a new question: let's say that I now use a three-way interaction, with one of the continuous weather variables in that interaction (e.g., A ~ B*C*w1 + w2 + w3 + (1|individual)). Now the prediction plot with w1 on the x axis has 4 lines: one for each B*C combination. Is there a way to know which of these lines (I'd say slopes, but they are not linear) is significantly different from the other lines? – Ron Efrat Nov 30 '22 at 10:22
  • @RonEfrat Wouldn't you get 6 lines? And why are they not linear in the model you specified? In any case, I think emmeans::emtrends would provide what you are looking for (see, e.g., here). – statmerkur Nov 30 '22 at 23:44
  • You are right, 6, not 4 lines, it was a typo. And I am not sure why they are not linear, I was also surprised, but that's for another topic… Thanks again – Ron Efrat Dec 01 '22 at 13:37