Visualizing the folly of fitting random slopes for variables that don't vary within groups

Question

Sometime I encounter wrong mixed-models that run without warning. By wrong, I mean logically nearly impossible. Think of a cross-level interaction that is set by the software syntax to AGAIN vary across the levels of a grouping variable.

Is there a visual (i.e., plot) to demonstrate the folly of fitting random slopes for variables that don't vary within grouping variables?

# R code for 2 wrongly defined mixed-models that run fine:
library(lme4)
hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
m1 <- lmer(math ~ ses+sector + (ses:sector | sch.id), data = hsb)
m2 <- lmer(math ~ ses +  (sector | sch.id), data = hsb)

Try plotting the raw data first for the non-varying variables by cluster. You will get a single line. That should be enough to prevent you from estimating a nonsensical model. — Erik Ruzek, Oct 07 '20 at 21:05
@ErikRuzek, you mean like `hsb10 <- subset(hsb, sch.id %in% unique(sch.id)[1:10]);
ggplot(hsb10) + aes(x=size, y = math) + geom_path()+ facet_wrap(~sch.id)` — rnorouzian, Oct 07 '20 at 21:44

Robert Long · Accepted Answer · 2020-10-08T18:54:04.483

I think it makes sense here to step back and simplify things. For the purpose of this answer we can think about this model:

Y ~ X + (X | G)

...in two scenarios: where X varies at the individual / unit level, and where X varies at the group level.

The motivation for fitting random slopes often arises out of the following. We have a study where we measures individuals, and we are interested in some fixed effect, ie slope of a variable. It could be the same variable measured over time, or it could be the response to different treatment levels of a variable, for example. If we had only one individual we would simply take measurements and think about a plot such as this:

set.seed(1)
X <- 1:20
Y <- 3 + X + rnorm(20, 0, 3)
ggplot(data.frame(Y, X), aes(y = Y, x = X)) + geom_point() + geom_smooth(method = 'lm', se = FALSE)

Our interest would then be in the slope of the fitted line, from the model:

> lm(Y ~ X) %>% coef()
(Intercept)           X 
   3.062716    1.067789

Now, when we have multiple individuals, we don't want to fit seperate models for each individual, as discussed here: Difference between t-test on betas from individual regressions vs linear mixed modeling

So we want random intercepts, where each individual will have the same fixed effect (slope) for X, but a different intercept. Moreover, we naturally would expect each individual to have their own slope, so we want random slopes for X:

set.seed(1)
n.group <- 10
dt <- expand.grid(G = 1:n.group, X = 1:20)
dt$Y = 1
X <- model.matrix(~ X, dt)
myFormula <- "Y ~ X + (X | G)"
foo <- lFormula(eval(myFormula), dt)
Z <- t(as.matrix(foo$reTrms$Zt))
betas <- c(3, 1)

b1 <- rnorm(n.group, 0, 3)   # random intercepts
b2 <- rnorm(n.group, 0, 0.5)   # random slopes
b <- c(rbind(b1, b2))
dt$Y <- X %% betas + Z %% b + rnorm(nrow(dt), 1)
dt$G <- as.factor(dt$G)
ggplot(dt, aes(y = Y, x = X, colour = G)) + geom_point() + geom_smooth(method = 'lm', formula= y ~ x, se = FALSE)

All is good. This is a classical plot to illlustrate random slopes and intercepts. Each line represents one individual / group and has it's own intercept and slope. Note that this is not plotted from the output of a mixed model, but rather from the data itself. We fit a mixed model in order to estimate the parameters, in the case of the random effects, the variance and covariance of the random intercepts and slopes.

Now, if we let X be a group-level predictor:

dt$X <- as.numeric(dt$G) / 4
X <- model.matrix(~ X, dt)
dt$Y <- X %% betas + Z %% b + rnorm(nrow(dt), 1)
ggplot(dt, aes(y = Y, x = X, colour = G)) + geom_point() + geom_smooth(method = 'lm', formula= y ~ x, se = FALSE)

We can immediately see that each group is a vertical accumulation of points for each X value. So there is no slope for each group / individual.

This is why it does not make sense to fit random slopes for a variable that only varies at the group level. If we try to fit a model with random slopes to such data, it will almost certainly not converge, or converge to a singular fit. I say almost certainly, because as noted in the OP, we do sometimes see such model that do converge. This is why it is necessary for analysts to think about what they are doing. Plotting the data is a very good first step in many analysis tasks and can help in avoiding mistakes, and generally guide the analysis in the right direction.

Thank you Rob! By the way, if m <- lmer(math ~ ses*sector + (ses:sector | sch.id), data = hsb) is a legitimate model? what is meaning of its random part? (Recall ses is a level-1 predictor and sector a level-2 predictor making their combinations a cross-level interaction) — rnorouzian, Oct 08 '20 at 14:06
Provided that the interaction varies by school, which it should since it includes a variable that varies by school, then I don't see a problem with that, other than, as I mentioned in another answer, allowing the interaction to vary by school, but not ses does not make much sense because they are linked (you can't change an interaction independenty of the main effects and vice versa), so if you insist on having random slopes for the interaction I think you should also include random slopes for ses. Of course whether the data supports either model, is another matter entirely. — Robert Long, Oct 08 '20 at 14:09
Thanks, Rob, my problem is conceptual. See, ses:sector is a cross-level interaction. Therefore, ses as a level-1 effect has to vary across schools but sector itself is a level-2 predictor and I can't understand how we can have sector vary as well?! In other words, a cross-level interaction cannot be taken to be random in a 2-level model. — rnorouzian, Oct 08 '20 at 14:16
... You say provided that ses:sector varies by school (I think it can't). Because in each school sector is a constant. So, it is not until we get the math ~ ses relation across all schools that we can be talking about their cross-level interaction. — rnorouzian, Oct 08 '20 at 14:19
I know. But the interaction is literally the multiplication of the two variables, so it too will vary by school. Say you have sch <- c(1,1,2,2) ; ses <- c(1,2,3,4) ; sec <- c(1,1,2,2), then the interaction ses:sec is literally ses*sec which will be 1 2 6 8, which varies by school. This isn't really part of the question and answer above so we really shouldn't be discussing it in these comments. Feel free to ask another question if it's not clear. — Robert Long, Oct 08 '20 at 14:26

Visualizing the folly of fitting random slopes for variables that don't vary within groups

1 Answers1

Linked