Can a factor be regarded as both random and fixed effect?

Question

I have a question about nested mixed effect model. For example I have species A with different populations; these populations belong to two kinds of habitat types (with or without predators). So I have population nested within habitat type, and I measured the body size of species A. I want to know if habitat type have a significant effect on body size or not. Thus, I got one model like this:

m1<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type/population))
m2<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type)+(1|habitat_type:population))

From http://conjugateprior.org/2013/01/formulae-in-r-anova/, I think m1 equals m2. But I was wodering why we regard habitat_type as both fixed effect and random effect? Does this make sense? Or I should just use the following model (remove habitat_type as random effect):

m3<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type:population))

outputs for models:

   > summary(m2)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type/population)
   Data: exuv

REML criterion at convergence: 4219.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5214 -0.6482  0.0054  0.6369  3.2242 

Random effects:
 Groups                  Name        Variance Std.Dev.
 population:habitat_type (Intercept) 0.20850  0.4566  
 year                    (Intercept) 0.06942  0.2635  
 habitat_type            (Intercept) 0.08963  0.2994  
 Residual                            0.73031  0.8546  
Number of obs: 1625, groups:  
population:habitat_type, 50; year, 19; habitat_type, 2

Fixed effects:
                Estimate Std. Error t value
(Intercept)       0.3916     0.3275   1.196
habitat_typeinv  -0.6325     0.4509  -1.403

Correlation of Fixed Effects:
            (Intr)
habtt_typnv -0.699  

    > summary(m3)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type:population)
   Data: exuv

REML criterion at convergence: 4219.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5214 -0.6482  0.0054  0.6369  3.2242 

Random effects:
 Groups                  Name        Variance Std.Dev.
 habitat_type:population (Intercept) 0.20850  0.4566  
 year                    (Intercept) 0.06942  0.2635  
 Residual                            0.73031  0.8546  
Number of obs: 1625, groups:  habitat_type:population, 50; year, 19

Fixed effects:
                Estimate Std. Error t value
(Intercept)       0.3916     0.1329   2.947
habitat_typeinv  -0.6325     0.1549  -4.082

Correlation of Fixed Effects:
            (Intr)
habtt_typnv -0.661


    > summary(m4)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | population)
   Data: exuv

REML criterion at convergence: 4219.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5214 -0.6482  0.0054  0.6369  3.2242 

Random effects:
 Groups     Name        Variance Std.Dev.
 population (Intercept) 0.20850  0.4566  
 year       (Intercept) 0.06942  0.2635  
 Residual               0.73031  0.8546  
Number of obs: 1625, groups:  population, 50; year, 19

Fixed effects:
                Estimate Std. Error t value
(Intercept)       0.3916     0.1329   2.947
habitat_typeinv  -0.6325     0.1549  -4.082

Correlation of Fixed Effects:
            (Intr)
habtt_typnv -0.661

    > summary(m2ml)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type/population)
   Data: exuv

     AIC      BIC   logLik deviance df.resid 
  4226.6   4259.0  -2107.3   4214.6     1619 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5215 -0.6493  0.0047  0.6360  3.2245 

Random effects:
 Groups                  Name        Variance  Std.Dev. 
 population:habitat_type (Intercept) 1.993e-01 4.464e-01
 year                    (Intercept) 6.363e-02 2.523e-01
 habitat_type            (Intercept) 4.365e-18 2.089e-09
 Residual                            7.305e-01 8.547e-01
Number of obs: 1625, groups:  
population:habitat_type, 50; year, 19; habitat_type, 2

Fixed effects:
                Estimate Std. Error t value
(Intercept)       0.3889     0.1297   3.000
habitat_typeinv  -0.6299     0.1519  -4.147

Correlation of Fixed Effects:
            (Intr)
habtt_typnv -0.664

I found some answers with similar questions. It is very helpful. See http://stats.stackexchange.com/questions/70556/have-i-correctly-specified-my-model-in-lmer. — Bin, Apr 12 '17 at 12:53
So what is the output when you fit these models? If m2 estimates zero variance for (1|habitat_type), then it's equal to m3. That's what I would expect (because you are right: if there is fixed term for habitat_type then there is nothing left for the random term to capture). — amoeba, Apr 12 '17 at 15:09
Also, if your population is coded such that different habitats have different population ids, then you can simply write (1|population) instead of (1|habitat_type:population). — amoeba, Apr 12 '17 at 15:12
You @amoeba got a good point. I tried with m2, m3 and another new model m4 (with random factor (1|population), m4<-lmer(body_size~habitat_type+(1|population))). I found they almost generated the same output, especially between m3 and m4. In m2 model, habitat_type as random effect still has 0.08963 (variance) and 0.2994 (std. dev.). Thus, in my situation, I think I can use either (1|population) or (1|habitat_type:population), because the results are exactly the same. — Bin, Apr 18 '17 at 17:00
You should definitely not use habitat_type as a random effect if you include it as a fixed one. And (1|population) and (1|habitat_type:population) are mathematically equivalent if your population id codes are distinct in different habitats (as I guess they are). Bottomline: use m4. — amoeba, Apr 18 '17 at 18:08
Regarding m2, I don't know why habitat_type comes out with non-zero variance. To my understanding, it should be exactly zero. That's strange. But you should use m4 in any case. — amoeba, Apr 18 '17 at 19:22
How about using maximum likelihood, i.e., setting REML = FALSE for lmer? Can you provide the complete output summary of all the models you tried? — Randel, Apr 18 '17 at 21:45
Thank you very much @JiebiaoWang. I followed your advice and use the setting REML=F. Then for m2 the variance=4.3e-18, Std.Dev.=2.1e-09. Almost zero @amoeba. This is amazing. But this further makes me confused about how to choose between REML and ML. I checked some information about REML and ML differences. In my mind, usually REML is preferred. — Bin, Apr 19 '17 at 13:16
@JiebiaoWang Cool. I would also like to know why using ML yields zero (up to machine precision) variance of habitat_type whereas using REML yields some non-zero value. — amoeba, Apr 19 '17 at 14:09
@amoeba Sorry for forgetting to mention year is also a random effect in my model. I also tried model without random effect (1|year). In this situation, the habitat_type as random effect variance equals to 0, just like you mentioned. But when there is (1|year), then habitat_type as random effect variance equals to 0.08963. Maybe the complexity of random effects is a reason. — Bin, Apr 19 '17 at 14:11
As @amoeba pointed out, here is a simplified but more extreme example. In my understanding, REML and ML should not differ so much in this case. There may be some numerical issues for REML. Given that the estimates rely on iterations, I will think more about this when I have more time. Any comments are welcome. — Randel, Apr 19 '17 at 19:36

Can a factor be regarded as both random and fixed effect?

0 Answers0

Linked