
I have data in which 400 participants each rated a set of 8 scenarios on the scales "valence" and "emotional arousal". The scenarios followed a 2 (setting: Europe vs. Asia) x 2 (complexity: low vs. high) x 2 (density: low vs. high) factorial design (experimental manipulation).

I would like to model the fixed effects of complexity and density on valence. Would it be wrong to include random intercepts for scenario if the only remaining variance between scenarios is known to be due to the different levels of setting?

lmer(valence ~ complexity * density + (1|participant) + (1|scenario), data = df)

Should I instead include random intercepts for different levels of setting?

lmer(valence ~ complexity * density + (1|participant) + (1|setting), data = df)

This seems wrong to me, since setting has only two levels (which are moreover part of an experimental manipulation - I vaguely remember that such variables should not be used as grouping factors in mixed models, since their levels are not a random sample). It also raises the question of why I do not include setting as a further fixed effect, which would make random intercepts for scenario obsolete; however, I do not want to include fixed effects for setting, since it is not the focus of this analysis.

In my data, the AIC of the second version, with (1|setting), indicates worse fit than the version with (1|scenario). However, in the (1|scenario) model the fixed effects of complexity and density completely disappear (same estimates, but high p-values). When I use (1|setting), or no scenario-level random intercept at all, all p-values are < .0001.
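
For reference, this is roughly how I compared the two models (both share the same fixed effects, so the default REML fits should give comparable AICs):

library(lme4)

m_scenario <- lmer(valence ~ complexity * density + (1 | participant) + (1 | scenario), data = df)
m_setting  <- lmer(valence ~ complexity * density + (1 | participant) + (1 | setting),  data = df)

AIC(m_scenario, m_setting)  # lower AIC indicates better fit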

How do I specify random intercepts correctly in this case?

  • Why not instead include setting as a fixed effect? – mkt Mar 01 '23 at 15:59
  • I have preregistered a model with only density and complexity as fixed effects - which makes sense since I am not interested in the effect of setting at all. I was only wondering whether I need to consider setting as a random effect. Basically: I want to know the average effect of density and complexity across both levels of setting. – mkks Mar 01 '23 at 16:02
  • Preregistration is generally good, but I wonder if it should necessarily limit you to a suboptimal model. Given your experimental design, it seems obvious to me that setting should be a fixed effect. But given that the dataset is balanced and you seem to be ignoring interactions, including it or not should not affect your inferences about the other two parameters. Personally, I would analyse it both ways and present it that way. But I'd also be concerned about interactions and would consider a multiverse analysis examining possible interactions. – mkt Mar 01 '23 at 16:09
  • I agree, I will also try models including fixed effects for setting and its interactions. Do you agree then that scenario or setting should not play any role as random intercepts in these models (regardless of whether they include setting as a fixed effect or not)? – mkks Mar 01 '23 at 16:14
  • Yes, I would not use any random effects other than participant. – mkt Mar 01 '23 at 19:12
  • Thank you very much! Maybe just one further clarification: if I had a formula such as valence ~ emo_arousal + (1|participant), where the experimental manipulations are not part of the equation, would that require random effects for 'scenario', or should I also reduce this to random effects for participant? I have seen that people sometimes also include (1|time), which would be the equivalent of (1|scenario), right? Or is that only done if there are repeated observations for the same trial? – mkks Mar 02 '23 at 08:23
  • I think you're overcomplicating this and would suggest reading a bit more about random effects and mixed models. Common advice is to use a fixed effect if you have fewer than 6 levels per factor (sometimes you will hear higher numbers, like 15 or 20). For 2 levels, it doesn't make sense, and it's not clear what you're trying to achieve. For 8 levels, it's borderline, but you're making some assumptions that may be a little unreasonable. Fixed effects would be entirely valid and reasonable, so why not use them? – mkt Mar 02 '23 at 08:35
  • I will stick to fixed effects for scenario then, thanks again :) – mkks Mar 02 '23 at 08:36

1 Answer


To summarise my answer in the comments:

This is a 2 x 2 x 2 factorial experiment with multiple measurements on each of the 400 participants. A logical place to start analysing it would be a mixed model with all three factors (setting, complexity, and density) as fixed effects and a random intercept for participant. From your code, it looks like you'd like to include all possible interactions. That may be reasonable, but I would also consider whether random slopes might be important here. It's unclear to me whether 400 participants provide enough data to support that level of model complexity, but they might.
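
As a minimal sketch (assuming the data frame and variable names from your code), that starting point might look like the following; lmerTest is optional and only adds p-values:

library(lme4)
library(lmerTest)  # optional: Satterthwaite p-values in summary()

m_full <- lmer(valence ~ setting * complexity * density + (1 | participant), data = df)

# A possible random-slope extension, if the data support it:
# m_slopes <- lmer(valence ~ setting * complexity * density +
#                    (1 + complexity * density | participant), data = df)

summary(m_full)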

Why not use random effects for setting, or even for all the different 'scenarios'? Common advice is that random effects don't work well when you have fewer than 6 levels per factor (sometimes you will hear higher numbers, like 15 or 20). With 2 levels, setting is clearly out. With 8 levels (i.e. if you treat all scenarios as equally distinct rather than as 2 x 2 x 2), it's borderline, but it smacks of trying to fit a square peg into a round hole. The simpler option works, so I would stick with it.

The fact that you've preregistered a different model is a wrinkle, but I'd be inclined to fit the more appropriate model and defend the change from the preregistered plan. Or to do a multiverse analysis exploring the consequences of different model structures.
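
If you go the multiverse route, a small sketch (names assumed as above) could fit a handful of defensible specifications and check how stable the complexity and density estimates are across them:

library(lme4)

forms <- list(
  prereg   = valence ~ complexity * density + (1 | participant),
  additive = valence ~ setting + complexity * density + (1 | participant),
  full     = valence ~ setting * complexity * density + (1 | participant)
)
fits <- lapply(forms, function(f) lmer(f, data = df))
lapply(fits, fixef)  # compare the complexity and density coefficients across models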

Useful reading:

https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#should-i-treat-factor-xxx-as-fixed-or-random

What is the minimum recommended number of groups for a random effects factor?
