How do you determine what interaction terms to include in your linear mixed effects model?

Question

I am currently trying to fit a linear mixed effects model and am unsure which interaction terms to include.

For example, in the following model, species richness (richness; measured per time point and mesocosm) is the response variable; the sampling method (data.type), sampling time point (time.point), mesocosm pH (pH), and all of their interactions are fixed effects; and individual mesocosm (mesocosm) is a random intercept.

model <- lmer(richness ~ 0 + data.type + time.point + pH + data.type:time.point + pH:time.point + pH:data.type + pH:time.point:data.type + (1|mesocosm), data)
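A side note on notation rather than on which terms to keep: in R's formula syntax, `a * b * c` expands to all main effects plus every two- and three-way interaction, so the fixed-effects part of the model above can be written more compactly. The base-R sketch below checks that the compact and long-hand formulas describe the same set of terms. No data or lme4 are needed, because `terms()` only parses the formula; the helper `term_set` is a hypothetical name introduced just for this comparison.

```r
# Fixed-effects part of the question's model, written with `*`.
# terms() only parses the formula, so no data are required.
compact <- richness ~ 0 + data.type * time.point * pH
compact_terms <- attr(terms(compact), "term.labels")
print(compact_terms)

# The long-hand formula from the question (random effect omitted,
# because terms() would treat `1 | mesocosm` as an ordinary term).
longhand <- richness ~ 0 + data.type + time.point + pH +
  data.type:time.point + pH:time.point + pH:data.type +
  pH:time.point:data.type

# Hypothetical helper: sort the variables inside each term so that,
# e.g., "pH:time.point" and "time.point:pH" compare as equal.
term_set <- function(f) {
  labs <- attr(terms(f), "term.labels")
  sort(vapply(strsplit(labs, ":", fixed = TRUE),
              function(v) paste(sort(v), collapse = ":"),
              character(1)))
}

same_terms <- identical(term_set(compact), term_set(longhand))
print(same_terms)
```

If the check holds, `lmer(richness ~ 0 + data.type * time.point * pH + (1|mesocosm), data)` specifies the same model as the long-hand version; the shorthand changes only readability, not which interactions are fitted.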

After fitting this model, should I keep only the interaction terms that are significant according to the Pr(>|t|) column (i.e., $p < 0.05$)? Would this be the correct way to go about things?

Shawn Hemelstrand · Answer 1 · score 3 · 2024-03-09T16:58:58.137

First and foremost, your regression is a mathematical model that seeks, to some degree, to validate a theoretical model. So the core part of fitting an interaction is not the result it produces but the rationale for including it. Interactions can be complicated, so it is important to consider why one should be present at all.

Given this, one point in your question sticks out: why do you believe that pH, time, and sampling method interact? I would not expect this to be a strong interaction, particularly since it is a three-way interaction and nothing about it suggests a large effect. But perhaps I am missing some pieces of the puzzle here.

To be clear, you shouldn't select terms based on statistical significance alone. I very recently explained here why this is a problematic practice. This is especially true given the controversies surrounding $p$ values in mixed models, such as which $p$ values to use and why (see here for simulations showing that they are generally anti-conservative, along with some of the rationale behind the different $p$ values used in mixed models).

answered Mar 09 '24 at 16:51 · edited Mar 09 '24 at 16:58

Thanks for your response! To elaborate on the study design (for context): I am looking at the number of species detected using different methods (data.type) under different pH conditions (pH) across time (time.point). In this case, would an "interaction" exist if I believe one variable varies with, or affects, another in some way? For example, pH should determine species detection, so I would expect pH and data.type to interact. – ramateur Mar 09 '24 at 18:07

That part I understand. My pressing question is what you think generates an association among the three. Do you believe that the measurement methods differ over time? Does this difference influence the pH conditions? You need a reason for fitting this interaction that has some explanatory power; otherwise it should be omitted. – Shawn Hemelstrand Mar 09 '24 at 18:10
Yes, the results from the measurement methods do differ over time. They also differ depending on pH. In that case, I would include the interactions pH:data.type and data.type:time.point. However, the pH conditions do not change over time, so I would no longer include pH:time.point. In this sense, I suppose an interaction among all three could also exist. – ramateur Mar 09 '24 at 18:14

To be clear, I meant your a priori hypotheses about these effects, not what the data says after sampling. Removing effects after testing in general isn't a great idea unless there is a very good statistical reason for doing so. – Shawn Hemelstrand Mar 09 '24 at 18:16
I am slightly confused. Here, data.type refers to different methods of species detection (e.g. molecular versus traditional methods). Is it valid to say for example that the data type is influenced by the pH? I am asking because I am not sure if the interaction here would refer only to the data type specifically (molecular or traditional), OR what is detected USING each data type (e.g., the richness). If the latter, then the pH would indeed affect what is detected, and I would keep the interaction term. But is this what the interaction is referring to? – ramateur Mar 09 '24 at 19:32
Is it valid to say that the data type is influenced by the pH? I have no idea how your field works, so I can't say with certainty whether there is a relationship at all. You have to use your own subject-matter expertise to answer that question. The interaction simply tests whether the relationship between the independent variable (IV) and the dependent variable (DV) changes depending on another covariate, above and beyond the main effects by themselves. A visual example can be found here. – Shawn Hemelstrand Mar 09 '24 at 19:35
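The point about what an interaction tests can be illustrated with a small simulation sketch in base R. The data below are entirely made up (two hypothetical detection methods, `traditional` and `molecular`, and a continuous pH), and the example uses ordinary `lm()` without the mesocosm random effect so that it stays self-contained. The pH slope is built to differ between methods, and the interaction coefficient estimates that difference in slopes.

```r
# Entirely simulated, hypothetical data: two detection methods and a
# continuous pH. Plain lm() is used (no mesocosm random effect) so the
# example runs with base R only.
set.seed(1)
n <- 200
sim <- data.frame(
  method = factor(rep(c("traditional", "molecular"), each = n / 2),
                  levels = c("traditional", "molecular")),
  pH = runif(n, min = 5, max = 8)
)
is_mol <- as.numeric(sim$method == "molecular")

# True pH slope: 2 for "traditional", 2 + 3 = 5 for "molecular".
sim$richness <- 10 + 4 * is_mol + 2 * sim$pH + 3 * is_mol * sim$pH +
  rnorm(n, sd = 1)

# Main effects plus their interaction
fit <- lm(richness ~ method * pH, data = sim)
print(coef(fit))
interaction_est <- coef(fit)[["methodmolecular:pH"]]

# Separate pH slopes fitted within each method
slopes <- sapply(split(sim, sim$method),
                 function(d) coef(lm(richness ~ pH, data = d))[["pH"]])
print(slopes)

# The interaction coefficient equals the difference between the two slopes
print(slopes[["molecular"]] - slopes[["traditional"]])
```

In the question's mixed model the same reading applies to the fixed effects: `pH:data.type` asks whether the pH slope differs between sampling methods, and the three-way term asks whether that difference itself changes across time points.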

score 1 · Answer 2 · answered Mar 09 '24 at 16:16

You should have reasons for including interactions. Statistical significance is one reason (although I prefer looking at the effect size), but there are others. Maybe the interaction is part of your theory. Maybe it is part of other people's theories. Maybe it is of interest for other reasons.
