Random Intercept vs. Random Slope Linear Mixed Effects Model with Nested Data

Question

I have read a lot about random intercepts vs. random slopes in linear mixed effects (LME) models, but I am confused about how to think about them when it comes to nested data. I have looked at other posts (e.g., see here), but have struggled to find an example that matches the sort of nesting present in my data. This is similar to a previous question I had here, but I wanted to make it a separate question since this one is more focused on the nested aspect of the data.

For context, I have a study with 15 subjects. Each subject (on a different day) conducts a task with a different pair of shoes on. I have 5 different pairs of shoes (treated as a categorical variable A,B,C,D,E). They conduct the task 10 times in the morning and 10 times in the afternoon. This means I have a total of 15 subjects * 5 shoes * 2 times of day * 10 measurements/session = 1500 observations. I want to understand how the pair of shoes and time of day ultimately impact their performance on the task.

The data in this study is inherently nested at multiple levels: 1) shoe is nested within subject and 2) time of day is nested within shoe. From reading online, it seems like one common way to account for this nesting is to have a nested random intercept term in the LME model. In my case, I think that would look as follows:

Y ~ Shoe + TimeOfDay + (1|Subject/Shoe/TimeOfDay)

However, from my reading about random slopes, I am unclear whether this intercept-only model would account for correlations between the different random effects. For example, I think it is reasonable to assume that the effect of the shoe might vary from subject to subject. Similarly, the effect of time of day might vary from subject to subject. Therefore, an alternative random slope model I have considered is as follows:

Y ~ Shoe + TimeOfDay + (Shoe + TimeOfDay | Subject)

One thing to note here is that both Shoe and TimeOfDay are categorical variables, which in itself confuses me about using a random slope.

My questions are as follows:

1. What is fundamentally different between the first and second model in terms of how they approach the nested nature of the data?
2. Is one generally preferred over the other? What factors should one consider when making this decision?
3. Does anything change because Shoe and TimeOfDay are both categorical variables? In most cases I have seen a random slope used for a continuous variable, so I am not sure whether it still makes sense when they are categorical.

I agree with the posted answers that there doesn't appear to be much nesting here. This is a useful thread to look at about crossed vs. nested: https://stats.stackexchange.com/a/228814/121522 — mkt, Jan 25 '24 at 08:11

Sointu · Answer 1 · 2024-01-25T07:39:05.270

It actually sounds like all your random effects are crossed, not nested. All subjects go through all shoe types A-E, right? And all subjects are measured in the morning and afternoon? In that case the random effects of subject, shoe and time of day are independent of each other, and thus should be modeled as crossed. In a random-intercept-only model this would be

Y ~ (1|Subject) + (1|Shoe) + (1|TimeOfDay)
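To make the crossed vs. nested distinction concrete, here is a minimal sketch in plain Python (not R) that rebuilds the design described in the question and checks it. The subject labels and the helper names `is_crossed` and `is_nested` are made up for illustration; the level counts come from the question.

```python
from itertools import product

# Design from the question: 15 subjects x 5 shoes x 2 times of day x 10 trials.
# Subject labels are hypothetical placeholders.
subjects = [f"S{i:02d}" for i in range(1, 16)]
shoes = list("ABCDE")
times = ["morning", "afternoon"]
trials = range(1, 11)

# One row per observation: (subject, shoe, time_of_day, trial)
rows = list(product(subjects, shoes, times, trials))

def is_crossed(rows, i, j):
    """True when every level of column i occurs with every level of column j."""
    levels_i = {r[i] for r in rows}
    levels_j = {r[j] for r in rows}
    pairs = {(r[i], r[j]) for r in rows}
    return len(pairs) == len(levels_i) * len(levels_j)

def is_nested(rows, inner, outer):
    """True when each level of column `inner` occurs under exactly one level of `outer`."""
    seen = {}
    for r in rows:
        seen.setdefault(r[inner], set()).add(r[outer])
    return all(len(outer_levels) == 1 for outer_levels in seen.values())

print(len(rows))                 # number of observations
print(is_crossed(rows, 0, 1))    # Subject x Shoe fully crossed?
print(is_crossed(rows, 0, 2))    # Subject x TimeOfDay fully crossed?
print(is_nested(rows, 1, 0))     # Is Shoe nested within Subject?
```

Under these assumptions the factor labels themselves are crossed: shoe A is the same shoe for every subject, so it does not belong to a single subject the way a pupil belongs to a single classroom.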

Your first model will not work, because you are modeling Shoe and TimeOfDay both as fixed categorical predictors and as random-effect grouping variables. These terms use the same variance, so you are entering them twice. You need to decide which way you want to model them.
You can put in a random "slope" for a categorical predictor. What you get is a set of contrast estimates for that predictor for each subject (if subject is the random effect grouping factor). If your categorical predictor has many levels this can be difficult to estimate, but you can do it.
Your second model seems OK to me, but as mentioned, it may be difficult to estimate because of the many random terms.
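A rough count shows why the second model can be hard to estimate. The sketch below is plain Python and assumes treatment-style coding (a factor with k levels contributes k - 1 columns) and an unstructured covariance matrix for the random effects within one grouping factor; the function names are illustrative only.

```python
def n_random_columns(factor_levels, intercept=True):
    """Random-effect columns per subject: an optional intercept plus
    k - 1 contrast columns for each factor with k levels."""
    return int(intercept) + sum(k - 1 for k in factor_levels)

def n_covariance_params(q):
    """Free parameters in an unstructured q x q covariance matrix:
    q variances plus q * (q - 1) / 2 covariances."""
    return q * (q + 1) // 2

# (Shoe + TimeOfDay | Subject): Shoe has 5 levels, TimeOfDay has 2.
q_slopes = n_random_columns([5, 2])
print(q_slopes, n_covariance_params(q_slopes))

# (1 | Subject/Shoe/TimeOfDay): three separate intercept terms,
# each with a single variance and no correlations between them.
print(3 * n_covariance_params(1))
```

With only 15 subjects, the 21 variance and covariance parameters of the random slope model are estimated from 15 subject-level vectors, which is where convergence problems and singular fits tend to come from.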

Roland · Answer 2 · 2024-01-25T07:48:30.793

I would not model Shoe and TimeOfDay as random effects and I'm not convinced that the proposed nesting exists. If you have exactly 5 pairs of shoes, these would be crossed with the subjects, i.e., Y ~ TimeOfDay + (TimeOfDay | Subject) + (TimeOfDay | Shoe) could be reasonable.

Your second proposed model looks very reasonable and best suited to answer your research question because you are interested in the effect of specific shoes, which hints that they should be fixed effects. I would investigate interactions between Shoe and TimeOfDay. You may need to remove random slopes if you see convergence issues or singular fit warnings.

Does anything change because Shoe and TimeOfDay are both categorical variables? In most cases I have seen a random slope used for a continuous variable, so not sure if it still makes sense when they are categorical.

No, nothing changes (except that you might have more parameters than your data can support). By default, R applies treatment contrasts to unordered factors, so a categorical variable enters the model as a set of dummy columns. Study the design matrix: help("model.matrix") and help("model.matrix.merMod")
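As an illustration of what treatment contrasts do to a categorical variable, here is a small plain-Python sketch (not the R implementation; `treatment_row` is a made-up helper). It assumes the first level, A, is the reference level.

```python
shoe_levels = ["A", "B", "C", "D", "E"]  # "A" is taken as the reference level

def treatment_row(value, levels):
    """One design-matrix row: an intercept column followed by one
    0/1 indicator for every non-reference level."""
    return [1] + [int(value == level) for level in levels[1:]]

columns = ["(Intercept)"] + [f"Shoe{level}" for level in shoe_levels[1:]]
print(columns)
for shoe in shoe_levels:
    print(shoe, treatment_row(shoe, shoe_levels))
```

Each non-reference column is a difference from shoe A, so a random "slope" for Shoe by subject means that each of these four differences is allowed to vary between subjects.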
