
Are the following two models really the same?

> library(lme4)
> library(lmerTest)
> lmod = lmer(Reaction ~ Days + (Days|Subject), data=sleepstudy)    
> summary(lmod)
Linear mixed model fit by REML 
t-tests use  Satterthwaite approximations to degrees of freedom ['merModLmerTest']
Formula: Reaction ~ Days + (Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9536 -0.4634  0.0231  0.4634  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.09   24.740       
          Days         35.07    5.922   0.07
 Residual             654.94   25.592       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error      df t value             Pr(>|t|)    
(Intercept)  251.405      6.825  17.000  36.838 < 0.0000000000000002 ***
Days          10.467      1.546  17.000   6.771           0.00000326 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
     (Intr)
Days -0.138

And:

> laov = aov(Reaction ~ Days + Error(Subject/Days), data=sleepstudy)   
> summary(laov)

Error: Subject
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 17 250618   14742               

Error: Subject:Days
          Df Sum Sq Mean Sq F value     Pr(>F)      
Days       1 162703  162703   45.85 0.00000326 ***  
Residuals 17  60322    3548                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 144  94312   654.9               

Both show essentially the same p-value for the Days variable. What is the difference between the two methods?

amoeba
rnso

1 Answer


That the two models give identical $p$ values, and that $\sqrt{F}=\sqrt{45.85}=6.77=t$, is strong a priori evidence that the models are indeed the same. Both end up fitting the model

$$ \begin{split} \mu_{it} & = \beta_0 + b_{0,i} + (\beta_1 + b_{1,i}) t \\ Y_{it} & \sim \textrm{Normal}(\mu_{it}, \sigma^2) \\ \boldsymbol{b}_i & \sim \textrm{MVN}(\boldsymbol{0},\Sigma) \end{split} $$

In general, aov() is simpler and faster, but it only works for a more restricted class of models (essentially balanced designs with classical error strata).
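The equivalence claimed above can be checked programmatically. The sketch below (assuming the lme4 and lmerTest packages are installed; sleepstudy ships with lme4) extracts the $t$ statistic for Days from the lmer fit and the $F$ statistic from the Subject:Days stratum of the aov fit, and compares $\sqrt{F}$ with $t$:

```r
library(lme4)
library(lmerTest)

lmod <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
laov <- aov(Reaction ~ Days + Error(Subject/Days), data = sleepstudy)

## t statistic for Days from the lmerTest summary table
t_days <- coef(summary(lmod))["Days", "t value"]

## F statistic for Days from the Subject:Days error stratum
## (first row of that stratum's ANOVA table)
F_days <- summary(laov)[["Error: Subject:Days"]][[1]][["F value"]][1]

## should agree up to numerical tolerance
all.equal(sqrt(F_days), t_days, tolerance = 1e-6)
```

This is only a numerical spot-check on this one balanced data set, not a proof of model equivalence, but it confirms the $\sqrt{F}=t$ relationship quoted above.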

Note that, although nlme::lme can also fit this model, and gets the same $F$ statistic, its heuristic for guessing the correct denominator degrees of freedom is incorrect in this case (it gets 161 rather than 17):

library(nlme)
anova(lme(Reaction ~ Days,
                random = ~Days|Subject, sleepstudy))
##             numDF denDF   F-value p-value
## (Intercept)     1   161 1454.0766  <.0001
## Days            1   161   45.8534  <.0001

@amoeba notes correctly that

a really important caveat is that these two models only coincide because Days is treated as numerical variable. If it were a categorical factor, then aov with Error(Subject/Days) and lmer with (Days|Subject) would be two different models. To get [the] lmer equivalent of aov [in this case] one would probably need to use (1|Subject)+(1|Subject:Days) [or equivalently (1|Subject/Days)]
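A sketch of the categorical analogue (assumption: Days recoded as a factor, here called fDays, a name introduced only for illustration). Note that in sleepstudy there is exactly one observation per Subject-by-Day cell, so the Subject:fDays random intercept is confounded with the residual and lmer refuses to fit it by default; the lmer line is therefore shown as a comment to illustrate the correspondence rather than as a recommended fit:

```r
library(lme4)

## treat Days as categorical
sleepstudy$fDays <- factor(sleepstudy$Days)

## aov with the nested error stratum Subject/fDays
aov_cat <- aov(Reaction ~ fDays + Error(Subject/fDays), data = sleepstudy)
summary(aov_cat)

## the lmer counterpart would be (needs replicate observations per
## Subject x Day cell to be identifiable, which sleepstudy lacks):
## lmer(Reaction ~ fDays + (1 | Subject) + (1 | Subject:fDays),
##      data = sleepstudy)
```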

Ben Bolker
  • For aov, how is the term "Error(Subject)" different from "Error(Subject/Days)" here? I think the corresponding term for lmer would be (1|Subject). – rnso Jun 03 '15 at 17:04