How can I specify nested and crossed random effects with lme4

Question

I'm trying to model regional implicit bias using multilevel regression with poststratification as described in Hoover, J., & Dehghani, M. (2019). The big, the bad, and the ugly: Geographic estimation with flawed psychological data. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000240.

The authors model Implicit Bias (imp_bias) as a function of six fixed effect county-level (contextual) variables (vote_prop.std + ... + latino_prop.std) and various random effects. Three of these random effects are geographical units: Counties nested within states nested within divisions. Though they are not explicitly specified as nested factors (as (1 | division/state_fips/county_fips) would do), they should be handled as nested factors because every fips-code is unique to the specific geographical unit. The other three are demographic variables: education, sex, & age. In the paper the authors write that the demographic effects were estimated as random intercepts crossed with division and that they did cross them with division because otherwise the model didn't converge.

I am confused as to whether the code they report does achieve this:

iat.imp.mrp.2 <- lmer(imp_bias ~ 1 +
                            vote_prop.std + 
                            ...
                            latino_prop.std +
                   (1 | county_fips) + 
                   (1 | state_fips) +
                   (1 | division:educ_3lvl) +
                   (1 | division:sex:age_4lvl),
                   data = iat.estimation.df, verbose=T, 
                   control=lmerControl(optimizer="bobyqa",
                                 optCtrl=list(maxfun=2e5)))

To my understanding (1 | division:educ_3lvl) indicates that education is nested within division rather than crossed with division. And how does this model differ from the following?

iat.imp.mrp.2 <- lmer(imp_bias ~ 1 +
                            vote_prop.std + 
                            ...
                            latino_prop.std +
                   (1 | county_fips) + 
                   (1 | state_fips) +
                   (1 | division) +
                   (1 | educ_3lvl) +
                   (1 | sex) +
                   (1 | age_4lvl),
                   data = iat.estimation.df, verbose=T, 
                   control=lmerControl(optimizer="bobyqa",
                                 optCtrl=list(maxfun=2e5)))

To summarize, I don't really understand what the : does in the authors' code, and I would like to know how to specify which of the three geographical levels the demographic variables are crossed with. Any help is much appreciated!

In the random effects specification of lmer models, the : asks for the random effect to be estimated across all levels of two (or more) factors. The authors might have used this because they did not have enough levels in one factor and when they combined it with another sensible factor, it gave them more levels. — Erik Ruzek, Mar 13 '20 at 20:13

score 3 · Answer 1 · answered Mar 15 '20 at 07:54

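In your first example, (1 | division:educ_3lvl) gives one random intercept per combination of division and education level, which is the term used to nest educ_3lvl within division, as you already wrote. In the second example, every factor gets its own random intercept, so you have a cross-classified (or fully crossed), not nested, design.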
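To make the distinction concrete, here is a minimal sketch in base R. The data frame d and its values are made up for illustration; only the column names are borrowed from the question. It shows the grouping factor that a division:educ_3lvl term is built on, and one way to check whether two factors are crossed or nested in the data.

```r
# Toy data (hypothetical): 3 divisions, 3 education levels,
# every combination observed twice.
d <- expand.grid(division  = factor(c("D1", "D2", "D3")),
                 educ_3lvl = factor(c("low", "mid", "high")),
                 rep       = 1:2)

# The grouping factor behind (1 | division:educ_3lvl):
# one level per observed division-by-education combination.
cell <- interaction(d$division, d$educ_3lvl, drop = TRUE)
n_cells <- nlevels(cell)

# Cross-tabulate the two factors. If an education level occurs in more
# than one division, the factors are crossed in the data, not nested.
tab <- table(d$educ_3lvl, d$division)
divisions_per_educ <- rowSums(tab > 0)

print(n_cells)
print(divisions_per_educ)
```

In this toy data every education level appears in every division, so the two factors are crossed in the data. The : term itself only adds one intercept per cell; it does not add a separate intercept for division or for education.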

However, a nested model would usually be written as (1 | division/educ_3lvl), which expands to (1 | division:educ_3lvl) + (1 | division). So in your example, the (1 | division) term is missing. I'm not sure how to interpret a formula that contains only (1 | division:educ_3lvl).
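The expansion of the / shorthand can be checked on simulated data. This is only a sketch: the data, effect sizes, and object names (d, m_slash, m_long, m_cell) are hypothetical, and it assumes lme4 is installed.

```r
library(lme4)

set.seed(1)
# Hypothetical toy data: 6 divisions x 3 education levels, 20 rows per cell.
d <- expand.grid(division  = factor(paste0("D", 1:6)),
                 educ_3lvl = factor(c("low", "mid", "high")),
                 rep       = 1:20)
div_eff  <- rnorm(6, sd = 1)      # simulated division intercepts
cell_eff <- rnorm(18, sd = 0.5)   # simulated division-by-education intercepts
cell_id  <- interaction(d$division, d$educ_3lvl)
d$y <- div_eff[as.integer(d$division)] +
  cell_eff[as.integer(cell_id)] +
  rnorm(nrow(d))

# Shorthand and its written-out expansion.
m_slash <- lmer(y ~ 1 + (1 | division/educ_3lvl), data = d)
m_long  <- lmer(y ~ 1 + (1 | division) + (1 | division:educ_3lvl), data = d)

# The authors' form: cell intercepts only, no separate division term.
m_cell  <- lmer(y ~ 1 + (1 | division:educ_3lvl), data = d)

# The first two should agree up to optimizer tolerance.
print(c(slash = as.numeric(logLik(m_slash)),
        long  = as.numeric(logLik(m_long))))

# Only one variance component is estimated for the grouping structure.
print(VarCorr(m_cell))
```

With only the : term, the model estimates a single variance for the division-by-education cells. Any variation shared by all cells of one division then has no component of its own, which is a plausible reading of what the formula in the question does, though the paper would have to confirm the authors' intent.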

For MRP, it is not uncommon to model a crossed design, and even to include classical "fixed" effects as random effects, because you need this information for poststratification. There is a nice example here (the authors also used Stan to avoid convergence issues).

See a short explanation with further links here.
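As a rough illustration of why the demographic cells need to be in the model, the poststratification step can be sketched in base R. All numbers below are made up; in practice the predictions would come from the fitted model and the counts from census data.

```r
# Hypothetical cell-level predictions for one geographic unit and
# hypothetical population counts for the same demographic cells.
cells <- expand.grid(educ_3lvl = c("low", "mid", "high"),
                     sex       = c("f", "m"))
cells$pred <- c(0.30, 0.35, 0.40, 0.32, 0.38, 0.44)  # made-up predictions
cells$n    <- c(100, 200, 100, 150, 150, 300)        # made-up population counts

# Poststratified estimate: population-weighted mean of the cell predictions.
post_est <- sum(cells$pred * cells$n) / sum(cells$n)
print(post_est)
```

Each demographic cell needs its own model-based prediction before it can be weighted, which is why the demographic variables enter the model even when their effects are not of direct interest.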
