I recently read a paper on trimming random-effects structure by Bates, Kliegl, Vasishth and Baayen (2015). My understanding is that the parsimonious mixed model they propose mainly follows the principle of progressively excluding random slopes that account for almost no variability (i.e., whose proportion of variance is close to 0). I want to follow the parsimonious mixed model approach to prune the random-effects structure in a paper I am currently writing for publication, and I want to summarise the principles I use to trim the random-effects structure accurately and precisely. Any correction of my misunderstanding would be appreciated.
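To check my understanding, here is a minimal sketch of the diagnostic I have in mind, using lme4 with simulated data (all variable names are hypothetical placeholders; rePCA() ships with recent versions of lme4 and originated in the RePsychLing package that accompanied the paper):

    library(lme4)

    # Simulated stand-in data: subjects, items, a centred two-level condition
    set.seed(1)
    df <- expand.grid(subj = factor(1:20), item = factor(1:10), cond = c(-0.5, 0.5))
    df$dv <- rnorm(nrow(df))

    # Fit the maximal model first
    m_max <- lmer(dv ~ cond + (1 + cond | subj) + (1 + cond | item), data = df)

    # Variance components: slopes whose variance is essentially zero
    # are candidates for removal
    VarCorr(m_max)

    # Bates et al. (2015) also use a PCA of the random-effects
    # covariance matrices to count dimensions that carry real variance
    summary(rePCA(m_max))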
1 Answer
Yes, your understanding is correct, but it is probably a good idea to understand the background of that paper. Following the publication of the "Keep it Maximal" paper by Barr et al. (2013), which Bates et al. reference substantially, practitioners were increasingly confronted with models that converged to a singular fit due to a hopelessly over-parameterised random-effects structure. Just see the number of posts on this site about singular fits as some evidence of that.
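To see what that looks like in practice, here is a minimal sketch with simulated data (hypothetical names) where the random slope has no variance to explain:

    library(lme4)

    # Simulated data with no real by-subject variation at all
    set.seed(42)
    d <- expand.grid(subj = factor(1:15), rep = 1:10)
    d$x <- rnorm(nrow(d))
    d$y <- 2 + 0.5 * d$x + rnorm(nrow(d))

    # A by-subject random slope for x is over-parameterised here,
    # so the fit lands on the boundary of the parameter space
    m <- lmer(y ~ x + (1 + x | subj), data = d)
    isSingular(m)  # typically TRUE: at least one variance is ~0
    VarCorr(m)     # shows which component collapsed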
Bates et al. (2015) were specifically attempting to address this problem, and I wrote an answer based on their recommendations here:
How to simplify a singular random structure when reported correlations are not near +1/-1
However, I don't think it is correct to say that Bates et al. recommend starting with a maximal model and simplifying. That is their recommendation for people who think a maximal model is a good idea in the first place. It clearly isn't when the number of estimated variance components becomes close to the number of observations, but it might be when this is not the case. For example, in many observational studies it is perfectly reasonable to allow the main exposure(s) to vary by subject, but the same cannot as easily be said for competing exposures and confounders. It may very well be that models with random slopes for these fit the data better than models without them, but starting out with a fully maximal model and pruning it according to p-value thresholds of likelihood ratio tests is, in my opinion, the wrong thing to do. I would start with a parsimonious model including only random slopes that I believe a priori should be allowed to vary by subject, based on domain knowledge and theory - this would not normally include confounders and competing exposures. If that model had a singular fit, I would use the approach outlined in my answer above; if it didn't, I would not seek to make the random structure any more complex.
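A minimal sketch of that workflow, with hypothetical variable names (exposure, confounder) standing in for a real observational study:

    library(lme4)

    # Simulated stand-in data with a genuine random intercept only
    set.seed(7)
    d <- expand.grid(subj = factor(1:30), rep = 1:8)
    d$exposure   <- rnorm(nrow(d))
    d$confounder <- rnorm(nrow(d))
    d$y <- 1 + 0.8 * d$exposure + 0.3 * d$confounder +
           rnorm(nlevels(d$subj))[d$subj] + rnorm(nrow(d))

    # Parsimonious start: a random slope only for the main exposure,
    # which theory says should vary by subject; the confounder enters
    # as a fixed effect only
    m1 <- lmer(y ~ exposure + confounder + (1 + exposure | subj), data = d)

    if (isSingular(m1)) {
      # Only if the fit is singular, simplify rather than expand:
      # drop the slope-intercept correlation first (lme4's "||"
      # syntax, which works as intended for numeric covariates)
      m1 <- lmer(y ~ exposure + confounder + (1 + exposure || subj), data = d)
    }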
References:
Bates, D., Kliegl, R., Vasishth, S. and Baayen, H., 2015. Parsimonious mixed models. arXiv preprint arXiv:1506.04967.
https://arxiv.org/pdf/1506.04967.pdf
Barr, D.J., Levy, R., Scheepers, C. and Tily, H.J., 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), pp.255-278.
http://idiom.ucsd.edu/~rlevy/papers/barr-etal-2013-jml.pdf
Comment: lmer2 <- lmer(log(dv) ~ cond_vd*cond_aud + (cond_vd+cond_aud|pp) + (1|item), data = df) – Chloe Aug 11 '20 at 08:40