I have read a lot about random intercepts vs. random slopes in linear mixed effects (LME) models, but I am confused about how to think about them when it comes to nested data. I have looked at other posts (e.g., see here), but have struggled to find an example that matches the sort of nesting present in my data. This is similar to a previous question I had here, but I wanted to ask it separately since this one focuses on the nested aspect of the data.
For context, I have a study with 15 subjects. Each subject performs a task in each of 5 different pairs of shoes (treated as a categorical variable with levels A, B, C, D, E), one pair per day. On each day, they perform the task 10 times in the morning and 10 times in the afternoon. This means I have a total of 15 subjects * 5 shoes * 2 times of day * 10 measurements/session = 1500 observations. I want to understand how the pair of shoes and the time of day affect performance on the task.
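To make the layout concrete, here is a minimal sketch of the design grid in plain Python. The subject labels (S01 to S15), the level names "morning" and "afternoon", and the Trial column are placeholders I made up for illustration; only the counts come from my study.

```python
from itertools import product

# Hypothetical labels; only the counts (15, 5, 2, 10) come from the study.
subjects = [f"S{i:02d}" for i in range(1, 16)]   # 15 subjects
shoes = ["A", "B", "C", "D", "E"]                # 5 pairs of shoes
times_of_day = ["morning", "afternoon"]          # 2 sessions per day
trials = range(1, 11)                            # 10 measurements per session

# One row per measurement: every subject sees every shoe at both times of day.
rows = [
    {"Subject": s, "Shoe": sh, "TimeOfDay": t, "Trial": k}
    for s, sh, t, k in product(subjects, shoes, times_of_day, trials)
]

print(len(rows))  # 1500
```

Every subject appears with every shoe and both times of day, so the design is fully balanced.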
The data in this study is inherently nested at multiple levels: 1) shoe is nested within subject and 2) time of day is nested within shoe. From reading online, it seems like one common way to account for this nesting is to have a nested random intercept term in the LME model. In my case, I think that would look as follows:
Y ~ Shoe + TimeOfDay + (1|Subject/Shoe/TimeOfDay)
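As far as I understand the lme4 formula syntax, the nested term (1|Subject/Shoe/TimeOfDay) is shorthand for (1|Subject) + (1|Subject:Shoe) + (1|Subject:Shoe:TimeOfDay), i.e., one random intercept per subject, per subject-by-shoe day, and per subject-by-shoe-by-time session. The sketch below just counts how many groups each of those terms would have in my design; the level names are placeholders.

```python
from itertools import product

# Placeholder level names; only the counts come from the study.
subjects = range(15)
shoes = ["A", "B", "C", "D", "E"]
times_of_day = ["morning", "afternoon"]

# Grouping factors implied by (1|Subject/Shoe/TimeOfDay)
subject_groups = set(subjects)                                     # (1|Subject)
subject_shoe_groups = set(product(subjects, shoes))                # (1|Subject:Shoe)
session_groups = set(product(subjects, shoes, times_of_day))       # (1|Subject:Shoe:TimeOfDay)

n_obs = 1500
print(len(subject_groups))               # 15
print(len(subject_shoe_groups))          # 75
print(len(session_groups))               # 150
print(n_obs // len(session_groups))      # 10 measurements per session
```

So this model estimates three random-intercept variances (plus the residual variance), with 10 repeated measurements inside each of the 150 lowest-level groups.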
However, from my reading about random slopes, I am unclear whether this intercept-only model would account for correlations between the different random effects. For example, I think it is reasonable to assume that the effect of the shoe might vary from subject to subject. Similarly, the effect of time of day might vary from subject to subject. Therefore, an alternative random slope model I have considered is as follows:
Y ~ Shoe + TimeOfDay + (Shoe + TimeOfDay | Subject)
One thing to note here is that both Shoe and TimeOfDay are categorical variables, which in itself confuses me about using a random slope.
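To see what the random slope term would ask of the data, here is a rough parameter count. It assumes the default treatment (dummy) coding, where a factor with k levels contributes k - 1 columns, and an unstructured covariance matrix for the per-subject random effects; the function name is my own.

```python
def n_covariance_params(q: int) -> int:
    """Number of free parameters in an unstructured q x q covariance matrix."""
    return q * (q + 1) // 2

n_shoe_levels = 5
n_time_levels = 2

# Columns of the per-subject random-effects design under treatment coding:
# intercept + (5 - 1) shoe contrasts + (2 - 1) time-of-day contrast
q = 1 + (n_shoe_levels - 1) + (n_time_levels - 1)

print(q)                        # 6 random effects per subject
print(n_covariance_params(q))   # 21 = 6 variances + 15 covariances
```

Under those assumptions, (Shoe + TimeOfDay | Subject) would estimate 21 variance-covariance parameters from only 15 subjects, compared with 3 random-intercept variances in the nested intercept model.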
My questions are as follows:
- What is fundamentally different between the first and second model in terms of how they approach the nested nature of the data?
- Is one generally preferred vs. the other? What factors would one think about to make this decision?
- Does anything change because Shoe and TimeOfDay are both categorical variables? In most cases I have seen a random slope used for a continuous variable, so I am not sure whether it still makes sense when they are categorical.