
The aim is to determine the level at which to group the data (i.e. at the individual, plot, or regional level) by comparing the variance components of a random-effects model.

mod <- lmer(val ~ (1|individual) + (1|plot) + (1|region))
confint(mod)
                2.5 %    97.5 %
.sig01      0.2514234 0.2826764
.sig02      0.2443112 0.3292156
.sig03      0.1548823 0.3659306
.sigma      0.8462618 0.8548915
(Intercept) 1.1728222 1.4402827

How should the variance components be compared so as to decide which level(s) of aggregation to retain in subsequent models?

  • is this helpful? https://stats.stackexchange.com/questions/56150/how-can-i-test-whether-a-random-effect-is-significant – rep_ho Oct 04 '22 at 16:36
  • @rep_ho Unfortunately not. I'd like to compare different random effects that all have a variance > 0. –  Oct 05 '22 at 06:08
  • But if they all have a variance >0 then you should keep them all, no? – rep_ho Oct 05 '22 at 10:10
  • One way might be to compare models with and without each random effect via model fit. – Sointu Oct 05 '22 at 12:37
  • @Sointu that is the same as testing whether the effect is significant, which is discussed in the link I shared – rep_ho Oct 05 '22 at 18:21
  • Ah, yeah sorry. I don't know a formal way of testing whether one random effect is larger than another. Maybe just using the confidence intervals would be the way - in that case your three random effects would not significantly differ from each other as the CIs are overlapping. But if the (ultimate) goal is to figure out which random effects to include into a model, I do think you should include all three as they all clearly have cluster-specific variance re: your outcome. – Sointu Oct 06 '22 at 07:48

1 Answer


Assuming that this is part of the model-development process, I would recommend choosing the grouping variable that yields the lowest AIC (similar to what you would do in a stepwise AIC procedure).

require(lme4)

dataf <- data.frame(
  individual = factor(rep(1:100, times = 2)),
  plot = factor(rep(1:10, 20)),
  region = factor(rep(1:4, 50))
)

# make the true dependence on the plot grouping

set.seed(1933)
dataf$val <- 7 + as.numeric(dataf$plot)/10 + rnorm(200, 0, 1)

mod_i <- lme4::lmer(val ~ (1|individual), data = dataf)
mod_p <- lme4::lmer(val ~ (1|plot), data = dataf)
mod_r <- lme4::lmer(val ~ (1|region), data = dataf)
boundary (singular) fit: see help('isSingular')

> extractAIC(mod_i)
[1]   3.000 617.753
> extractAIC(mod_p)
[1]   3.0000 608.7551
> extractAIC(mod_r)
[1]   3.0000 621.5314

mod_pi <- lme4::lmer(val ~ (1|plot) + (1|individual), data = dataf)
mod_pr <- lme4::lmer(val ~ (1|plot) + (1|region), data = dataf)

> extractAIC(mod_pi)
[1]   4.0000 610.0804
> extractAIC(mod_pr)
[1]   4.0000 610.7551

The final model is mod_p, which has the lowest AIC.

# make the true dependence on the individual grouping

set.seed(1933)
dataf$val <- 7 + as.numeric(dataf$individual)/100 + rnorm(200, 0, 1)

mod_i <- lme4::lmer(val ~ (1|individual), data = dataf)
mod_p <- lme4::lmer(val ~ (1|plot), data = dataf)
mod_r <- lme4::lmer(val ~ (1|region), data = dataf)

> extractAIC(mod_i)
[1]   3.000 603.226
> extractAIC(mod_p)
[1]   3.0000 604.1367
> extractAIC(mod_r)
[1]   3.0000 604.7195

mod_ip <- lme4::lmer(val ~ (1|individual) + (1|plot), data = dataf)
mod_ir <- lme4::lmer(val ~ (1|individual) + (1|region), data = dataf)

> extractAIC(mod_ip)
[1]   4.0000 604.9881
> extractAIC(mod_ir)
[1]   4.000 605.226

The final model is mod_i, which has the lowest AIC.

If the ultimate question is whether the grouping variable is statistically significant in the final model, then I also recommend the post that @rep_ho linked in the comments on the original question. Specifically, take a look at the RLRsim package.
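As a sketch of that approach (assuming the RLRsim package is installed, and reusing dataf from the simulation above), RLRsim::exactRLRT performs an exact restricted likelihood ratio test for a variance component. To test one random effect in a model that contains several, it takes the model containing only the tested effect (m), the full model (mA), and the model without the tested effect (m0):

# test whether the plot random effect is needed alongside individual
library(lme4)
library(RLRsim)

mod_pi <- lmer(val ~ (1|plot) + (1|individual), data = dataf)  # full model
mod_i  <- lmer(val ~ (1|individual), data = dataf)             # without plot
mod_p  <- lmer(val ~ (1|plot), data = dataf)                   # only the tested effect

# exact RLRT of H0: var(plot) = 0
exactRLRT(m = mod_p, mA = mod_pi, m0 = mod_i)

A small p-value supports keeping the plot grouping; this avoids the boundary problem that makes the usual chi-squared likelihood ratio test conservative for variance components.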

R Carnell