I have a question about nested mixed effect model. For example I have species A with different populations; these populations belong to two kinds of habitat types (with or without predators). So I have population nested within habitat type, and I measured the body size of species A. I want to know if habitat type have a significant effect on body size or not. Thus, I got one model like this:
m1<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type/population))
m2<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type)+(1|habitat_type:population))
From http://conjugateprior.org/2013/01/formulae-in-r-anova/, I think m1 equals m2. But I was wodering why we regard habitat_type as both fixed effect and random effect? Does this make sense?
Or I should just use the following model (remove habitat_type as random effect):
m3<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type:population))
outputs for models:
> summary(m2)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type/population)
Data: exuv
REML criterion at convergence: 4219.3
Scaled residuals:
Min 1Q Median 3Q Max
-3.5214 -0.6482 0.0054 0.6369 3.2242
Random effects:
Groups Name Variance Std.Dev.
population:habitat_type (Intercept) 0.20850 0.4566
year (Intercept) 0.06942 0.2635
habitat_type (Intercept) 0.08963 0.2994
Residual 0.73031 0.8546
Number of obs: 1625, groups:
population:habitat_type, 50; year, 19; habitat_type, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.3916 0.3275 1.196
habitat_typeinv -0.6325 0.4509 -1.403
Correlation of Fixed Effects:
(Intr)
habtt_typnv -0.699
> summary(m3)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type:population)
Data: exuv
REML criterion at convergence: 4219.3
Scaled residuals:
Min 1Q Median 3Q Max
-3.5214 -0.6482 0.0054 0.6369 3.2242
Random effects:
Groups Name Variance Std.Dev.
habitat_type:population (Intercept) 0.20850 0.4566
year (Intercept) 0.06942 0.2635
Residual 0.73031 0.8546
Number of obs: 1625, groups: habitat_type:population, 50; year, 19
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.3916 0.1329 2.947
habitat_typeinv -0.6325 0.1549 -4.082
Correlation of Fixed Effects:
(Intr)
habtt_typnv -0.661
> summary(m4)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | population)
Data: exuv
REML criterion at convergence: 4219.3
Scaled residuals:
Min 1Q Median 3Q Max
-3.5214 -0.6482 0.0054 0.6369 3.2242
Random effects:
Groups Name Variance Std.Dev.
population (Intercept) 0.20850 0.4566
year (Intercept) 0.06942 0.2635
Residual 0.73031 0.8546
Number of obs: 1625, groups: population, 50; year, 19
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.3916 0.1329 2.947
habitat_typeinv -0.6325 0.1549 -4.082
Correlation of Fixed Effects:
(Intr)
habtt_typnv -0.661
> summary(m2ml)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type/population)
Data: exuv
AIC BIC logLik deviance df.resid
4226.6 4259.0 -2107.3 4214.6 1619
Scaled residuals:
Min 1Q Median 3Q Max
-3.5215 -0.6493 0.0047 0.6360 3.2245
Random effects:
Groups Name Variance Std.Dev.
population:habitat_type (Intercept) 1.993e-01 4.464e-01
year (Intercept) 6.363e-02 2.523e-01
habitat_type (Intercept) 4.365e-18 2.089e-09
Residual 7.305e-01 8.547e-01
Number of obs: 1625, groups:
population:habitat_type, 50; year, 19; habitat_type, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.3889 0.1297 3.000
habitat_typeinv -0.6299 0.1519 -4.147
Correlation of Fixed Effects:
(Intr)
habtt_typnv -0.664
(1|habitat_type), then it's equal to m3. That's what I would expect (because you are right: if there is fixed term forhabitat_typethen there is nothing left for the random term to capture). – amoeba Apr 12 '17 at 15:09populationis coded such that different habitats have different population ids, then you can simply write(1|population)instead of(1|habitat_type:population). – amoeba Apr 12 '17 at 15:12(1|population),m4<-lmer(body_size~habitat_type+(1|population))). I found they almost generated the same output, especially between m3 and m4. In m2 model, habitat_type as random effect still has 0.08963 (variance) and 0.2994 (std. dev.). Thus, in my situation, I think I can use either(1|population)or(1|habitat_type:population), because the results are exactly the same. – Bin Apr 18 '17 at 17:00(1|population)and(1|habitat_type:population)are mathematically equivalent if your population id codes are distinct in different habitats (as I guess they are). Bottomline: usem4. – amoeba Apr 18 '17 at 18:08m2, I don't know whyhabitat_typecomes out with non-zero variance. To my understanding, it should be exactly zero. That's strange. But you should usem4in any case. – amoeba Apr 18 '17 at 19:22REML = FALSEforlmer? Can you provide the complete output summary of all the models you tried? – Randel Apr 18 '17 at 21:45REML=F. Then for m2 the variance=4.3e-18, Std.Dev.=2.1e-09. Almost zero @amoeba. This is amazing. But this further makes me confused about how to choose between REML and ML. I checked some information about REML and ML differences. In my mind, usually REML is preferred. – Bin Apr 19 '17 at 13:16habitat_typewhereas using REML yields some non-zero value. – amoeba Apr 19 '17 at 14:09yearis also a random effect in my model. I also tried model without random effect(1|year). In this situation, thehabitat_typeas random effect variance equals to 0, just like you mentioned. But when there is(1|year), thenhabitat_typeas random effect variance equals to 0.08963. Maybe the complexity of random effects is a reason. – Bin Apr 19 '17 at 14:11