3 Levels Linear Models in R with random slopes and intercepts with hierarchy level variables

Question

I am trying to write a 3 level multilevel linear model in R, using lme4.

I've decided to use the data from this question as a guide: Crossed vs nested random effects: how do they differ and how are they specified correctly in lme4?

library("lme4")
dt <-read.table("http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module9/lmm.data.txt",
                 header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
> head(dt)
  id    extro     open    agree    social class school
1  1 63.69356 43.43306 38.02668  75.05811     d     IV
2  2 69.48244 46.86979 31.48957  98.12560     a     VI
3  3 79.74006 32.27013 40.20866 116.33897     d     VI
4  4 62.96674 44.40790 30.50866  90.46888     c     IV
5  5 64.24582 36.86337 37.43949  98.51873     d     IV
6  6 50.97107 46.25627 38.83196  75.21992     d      I

But I would also like to introduce School-level variables.

dfSchool = data.frame(School = sort(unique(dt$school)),SchoolCat1.F = factor(c("S1","S2","S3","S2","S3","S1")),SchoolVar = (rnorm(6)))
> dfSchool
  School SchoolCat1.F   SchoolVar
1      I           S1 -0.08933959
2     II           S2  0.72704675
3    III           S3 -2.42612923
4     IV           S2 -0.78280022
5      V           S3 -0.23886568
6     VI           S1 -1.53925113

Using the data above as an example, define y := extro, x1 := open, x2 := agree, x3 := social, x4 := SchoolCat1.F, and x5 := SchoolVar, with j indexing classes and k indexing schools.

I want to write the following model:

How would it be represented in lme4?

I believe m1 <- lmer(extro ~ open + agree + social + (1 | school/class), data = dt) means that only the intercept varies across classes and schools.

Therefore m2 <- lmer(extro ~ (open|school/class) + (agree|school/class) + (social|school/class) + (1 | school/class), data = dt) would be a good start.

However, I do not know how to include School-level variables.

score 1 · Accepted Answer · answered Dec 10 '22 at 20:09

The model you propose has random slopes at the classroom and school levels for all student-level predictors. This may be a valid model for some data, but it would not be advisable for this particular data set, for reasons I outline below. That said, to estimate the model you showed, you first need to merge the school-level variable you created (SchoolVar) into the dt data frame:

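library("lme4")
# Student-level data: one row per student
dt <- read.table("http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module9/lmm.data.txt",
                 header = TRUE, sep = ",", na.strings = "NA", dec = ".", strip.white = TRUE)
# Unique class identifier, because class labels repeat across schools
dt$classID <- paste(dt$school, dt$class, sep = ".")
# School-level data: one row per school; the key is named "school" to match dt
dfSchool <- data.frame(school = sort(unique(dt$school)),
                       SchoolCat1.F = factor(c("S1", "S2", "S3", "S2", "S3", "S1")),
                       SchoolVar = rnorm(6))
# Attach the school-level variables to every student row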
dtS <- merge(dt,dfSchool,by="school")
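
After a merge like the one above, it can be worth confirming that no student rows were lost or duplicated and that the school-level variable really is constant within each school. A minimal sketch, using small made-up stand-ins (dt_toy, dfSchool_toy and their values are hypothetical, not the real data):

```r
# Toy stand-ins for dt and dfSchool (hypothetical values for illustration)
set.seed(1)
dt_toy <- data.frame(
  id     = 1:12,
  school = rep(c("I", "II", "III"), each = 4),
  extro  = rnorm(12, mean = 60, sd = 5)
)
dfSchool_toy <- data.frame(
  school    = c("I", "II", "III"),
  SchoolVar = c(-0.5, 0.2, 1.1)
)

# Left join: every student row receives the value of its school
dtS_toy <- merge(dt_toy, dfSchool_toy, by = "school", all.x = TRUE)

# The merge should neither drop nor duplicate student rows
stopifnot(nrow(dtS_toy) == nrow(dt_toy))

# No student should be left without a school-level value
stopifnot(!anyNA(dtS_toy$SchoolVar))

# SchoolVar should take exactly one value within each school
n_values <- tapply(dtS_toy$SchoolVar, dtS_toy$school,
                   function(v) length(unique(v)))
stopifnot(all(n_values == 1))
```

The same three checks can be run on dtS, with dt as the reference for the row count.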

Then, you can estimate the maximal model with the SchoolVar predictor as follows:

m3 <- lmer(extro ~  1 + open + agree + social + SchoolVar + (1 + open + agree + social|classID) + 
               (1 + open + agree + social|school), data = dtS)
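
A note on the grouping terms in m3: because classID pastes school and class together, (1 | classID) should identify the same groups as the school:class term that the (1 | school/class) shorthand expands to. The following is a minimal sketch on simulated data that compares the two specifications. The data frame sim and all of its columns are made up for illustration, and only random intercepts are used to keep the fit quick:

```r
library(lme4)

set.seed(3)
# Simulated design: 6 schools, 4 class labels reused in every school,
# 15 students per class
sim <- expand.grid(student = 1:15,
                   class   = letters[1:4],
                   school  = paste0("S", 1:6))
sim$classID <- paste(sim$school, sim$class, sep = ".")

# Simulated random intercepts for schools and for classes within schools
school_eff <- rnorm(6,  sd = 2)
class_eff  <- rnorm(24, sd = 1)

sim$open <- rnorm(nrow(sim))
sim$y <- 50 + 0.5 * sim$open +
  school_eff[as.integer(sim$school)] +
  class_eff[as.integer(factor(sim$classID))] +
  rnorm(nrow(sim), sd = 3)

# Nesting shorthand versus an explicit, globally unique class ID
fit_nested   <- lmer(y ~ open + (1 | school/class), data = sim)
fit_explicit <- lmer(y ~ open + (1 | school) + (1 | classID), data = sim)

ll_nested   <- as.numeric(logLik(fit_nested))
ll_explicit <- as.numeric(logLik(fit_explicit))

# TRUE if the two specifications describe the same model
isTRUE(all.equal(ll_nested, ll_explicit, tolerance = 1e-5))
```

If the two log-likelihoods agree, choosing between the shorthand and an explicit ID column is a matter of convenience. The explicit column also guards against accidentally treating class a in one school and class a in another as the same class.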

If this were a real analysis, you would not treat school as a random effect, given that there are only six schools in the data. Instead, you would likely include school as a fixed effect. Doing so, however, would not allow you to estimate the slope for SchoolVar, because SchoolVar is constant within each school and is therefore collinear with as.factor(school):

m4 <- lmer(extro ~  1 + open + agree + social + as.factor(school) + (1 |classID), data = dtS)
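
The collinearity mentioned above can be seen directly in the fixed-effects design matrix. Here is a minimal base-R sketch on made-up data (toy, school_values and the numbers in them are hypothetical): adding a variable that is constant within school to a model that already contains school dummies adds a column but does not raise the rank of the matrix.

```r
set.seed(2)
# Toy data: 4 schools, 5 students each (hypothetical values)
toy <- data.frame(
  school = factor(rep(c("I", "II", "III", "IV"), each = 5)),
  open   = rnorm(20)
)

# A school-level variable: one value per school, repeated for its students
school_values <- c(I = -0.3, II = 0.8, III = 1.5, IV = -1.1)
toy$SchoolVar <- unname(school_values[as.character(toy$school)])

# Fixed-effects design matrices without and with SchoolVar
X_without <- model.matrix(~ open + school, data = toy)
X_with    <- model.matrix(~ open + school + SchoolVar, data = toy)

rank_without <- qr(X_without)$rank
rank_with    <- qr(X_with)$rank

# One more column, but no additional independent information
cat("columns:", ncol(X_without), "->", ncol(X_with), "\n")
cat("rank:   ", rank_without, "->", rank_with, "\n")
```

When the rank is smaller than the number of columns, at least one coefficient cannot be estimated separately; lmer typically reports this by dropping a column from a rank-deficient fixed-effects matrix.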

Note that instead of as.factor(school), you could use the SchoolCat1.F variable you created in the dfSchool data frame. But it makes little to no sense to include that variable in m3, above, because you already have random intercepts for schools. See the answer to this recent question.

Thank you. That makes sense. I did not realize I could just merge it into the data, since it only varies when the school changes.
My real data is different and a bit hard to share, so I used this data set instead. I am trying to predict lap time in a Formula 1 race; my first level is lap, the second is driver, and the third is circuit.

So, I noticed you did not specify which level comes first with (1 | School/classID). Isn't it needed? — Luan Vieira, Dec 10 '22 at 22:17
I think your data is crossed because I assume that drivers appear on multiple circuits. If so, then you want the separate specification (1|driver) + (1|circuit). — Erik Ruzek, Dec 10 '22 at 23:01
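
As a rough illustration of the crossed specification suggested in the comment above, here is a minimal sketch on simulated lap times. The data frame laps, its columns (laptime, driver, circuit, lap) and all effect sizes are invented for the example and are not taken from any real Formula 1 data:

```r
library(lme4)

set.seed(4)
# Hypothetical design: every driver races at every circuit, 10 laps each
laps <- expand.grid(lap     = 1:10,
                    driver  = paste0("D", 1:12),
                    circuit = paste0("C", 1:8))

# Simulated driver and circuit effects on lap time (seconds)
driver_eff  <- rnorm(12, sd = 0.4)
circuit_eff <- rnorm(8,  sd = 5)
laps$laptime <- 90 +
  driver_eff[as.integer(laps$driver)] +
  circuit_eff[as.integer(laps$circuit)] +
  rnorm(nrow(laps), sd = 0.8)

# Drivers appear at more than one circuit, so they are not nested in circuits
driver_nested <- isNested(laps$driver, laps$circuit)
driver_nested

# Crossed random intercepts: one term per grouping factor, no "/" operator
fit_crossed <- lmer(laptime ~ 1 + (1 | driver) + (1 | circuit), data = laps)
VarCorr(fit_crossed)
```

With nested data, isNested() would return TRUE and the (1 | circuit/driver) form would be the natural choice; with crossed data like this, the two separate terms let each driver keep a single effect across all circuits.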
