
This may be a simple/naive question, but I have an lmer() model that does not converge properly because its random-effects covariance matrix is singular.

What is a minimal prior specification in blmer() that would get this singular model to converge?

library(lme4)
library(blme)

hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')

m1 <- lmer(math ~ ses*sector + (ses | sch.id), data = hsb)

m2 <- blmer(math ~ ses*sector + (ses | sch.id), data = hsb, cov.prior = ???) ## A possible covariance prior
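For reference, the blme documentation lists wishart (the package default), invwishart, gamma, and invgamma as the allowable cov.prior families (or NULL for no prior at all), so I assume the call would look something like one of these; I'm just unsure which, if any, is the minimal sensible choice:

m2 <- blmer(math ~ ses*sector + (ses | sch.id), data = hsb, cov.prior = wishart) ## blme's default family
m2 <- blmer(math ~ ses*sector + (ses | sch.id), data = hsb, cov.prior = gamma)   ## another documented family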

rnorouzian
  • Is this the same data as before? If so then I thought we had established that random slopes over schools for a variable that is constant within schools doesn't make sense. – Robert Long Oct 03 '20 at 17:40
  • @RobertLong, no Rob, this is a completely different (and real) dataset. Here ses is a level-1 predictor. sector is a level-2 predictor. – rnorouzian Oct 03 '20 at 18:59
  • @RobertLong, can you please check this one out? – rnorouzian Oct 03 '20 at 22:12
  • OK there is something a bit strange with your data, which may have something to do with the problem. if you convert sch.id to factor and fit an lm model with sch.id as a fixed effect, the model matrix is singular. I would fix this problem first. – Robert Long Oct 04 '20 at 07:29
  • @RobertLong, good catch! But I see no change after turning sch.id into a factor. – rnorouzian Oct 04 '20 at 07:35
  • You have to fit lm(math ~ ses*sector + sch.id, data = hsb) %>% summary() with sch.id as a factor to see the problem. – Robert Long Oct 04 '20 at 07:39
  • @RobertLong, OR you mean here we have collinearity issue? – rnorouzian Oct 04 '20 at 07:51
  • Maybe yes, or maybe some combinations of the factors don't exist. I don't know what the exact problem is, but you need to fix that singular model matrix for fixed effects, because that is basically what is used when you fit random slopes for the fixed effects, so that might be the cause of the singular VCV of random effects. So once you can fit the lm() model with sch.id as a fixed effect you can move on to the mixed model. – Robert Long Oct 04 '20 at 08:05
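A minimal sketch of the check Robert Long describes above (fit_fe is just a placeholder name; NA or aliased coefficients would flag a rank-deficient model matrix):

hsb$sch.id <- factor(hsb$sch.id)
fit_fe <- lm(math ~ ses*sector + sch.id, data = hsb) # schools as fixed effects
sum(is.na(coef(fit_fe)))                             # any NA coefficients mean a singular model matrix
alias(fit_fe)                                        # lists the linearly dependent (aliased) terms, if any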

1 Answer


I think the problem here is that, even once you convert sch.id into a factor variable:

library(dplyr)
hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
hsb <- hsb %>% mutate(sch.id=as.factor(sch.id))

and check that the lm model that Robert suggested actually runs (it does!), the original model is still singular, whereas a slight simplification of the model (removing the interaction between ses and sector) converges fully:

m0 <- lmer(math ~ ses + sector + (ses | sch.id), data = hsb)
summary(m0)

REML criterion at convergence: 46601.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1373 -0.7296  0.0225  0.7568  2.8920 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 sch.id   (Intercept)  3.9646  1.991        
          ses          0.4343  0.659    0.55
 Residual             36.8008  6.066        
Number of obs: 7185, groups:  sch.id, 160

Fixed effects:
            Estimate Std. Error t value
(Intercept)  11.4729     0.2315  49.568
ses           2.3854     0.1179  20.238
sector        2.5408     0.3445   7.375

Correlation of Fixed Effects:
       (Intr) ses   
ses     0.228       
sector -0.655 -0.079

The random-effects variance for the ses slope is not particularly large, and the ses variable itself is uncentered (it is on its original metric). Once you add the sector*ses interaction, the random slope variance decreases to about 0.08, so according to this model the interaction reduces the slope variance to nearly 0.
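A quick way to see this (just a sketch; m_int is my own object name) is to refit the interaction model from the question and inspect its variance components:

m_int <- lmer(math ~ ses*sector + (ses | sch.id), data = hsb) # the original interaction model
isSingular(m_int) # flags the boundary (singular) fit reported in the question
VarCorr(m_int)    # the ses slope variance is about 0.08, i.e. nearly zero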

In multilevel models you are often better off centering level-1 variables about their group mean when estimating a random slope:

hsb <- hsb %>%
  group_by(sch.id) %>%
  mutate(smn_ses = mean(ses)) %>%  # school mean of ses
  ungroup() %>%
  mutate(cwc_ses = ses - smn_ses)  # ses centered within cluster (school)

Then re-run your models. Below I show results from a cwc_ses model without the interaction and then from a model with the interaction. The interaction model no longer has convergence issues:
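For reference, the two fits and the comparison table can be reproduced roughly as follows (a sketch; the object names and the texreg::screenreg() call are my choices, not necessarily what produced the table):

m_cwc1 <- lmer(math ~ cwc_ses + sector + (cwc_ses | sch.id), data = hsb) # Model 1: no interaction
m_cwc2 <- lmer(math ~ cwc_ses*sector + (cwc_ses | sch.id), data = hsb)   # Model 2: with the interaction
texreg::screenreg(list(m_cwc1, m_cwc2))                                  # side-by-side summary like the table below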

=============================================================
                                 Model 1        Model 2      
-------------------------------------------------------------
(Intercept)                          11.11 ***      11.39 ***
                                     (0.29)         (0.29)   
cwc_ses                               2.21 ***       2.80 ***
                                     (0.13)         (0.15)   
sector                                3.46 ***       2.81 ***
                                     (0.42)         (0.44)   
cwc_ses:sector                                      -1.34 ***
                                                    (0.23)   
-------------------------------------------------------------
AIC                               46681.31       46654.60    
BIC                               46729.47       46709.64    
Log Likelihood                   -23333.66      -23319.30    
Num. obs.                          7185           7185       
Num. groups: sch.id                 160            160       
Var: sch.id (Intercept)               6.84           6.74    
Var: sch.id cwc_ses                   0.70           0.27    
Cov: sch.id (Intercept) cwc_ses       1.27           1.05    
Var: Residual                        36.71          36.71    
=============================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

The nice thing about centering in this way is that the cwc_ses coefficient is unambiguously a within-school comparison between students who differ on ses by 1 unit, and the random slope indicates how much between-school variation there is in this within-school relationship. When you leave ses uncentered, you have a funky variable that contains information about both the within- and between-school association between ses and math achievement. A very useful reading on this topic is Enders and Tofighi (2007).
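If you also want the between-school part of the ses–math relationship explicitly in the model, a minimal sketch (using the school mean smn_ses already computed above; see Enders and Tofighi for whether and how to center it) would be:

m_wb <- lmer(math ~ cwc_ses + smn_ses + sector + (cwc_ses | sch.id), data = hsb)
summary(m_wb) # cwc_ses estimates the within-school slope; smn_ses the between-school slope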

Erik Ruzek