
I am trying to test the interaction term of level1 * question. I have two ways of doing this, using anova(lm) and anova(lm, lm2). Why are their results so different?

Code

pacman::p_load(nlme)  # install/load nlme if needed
data_long = read.csv("https://raw.githubusercontent.com/slfan2013/Jen-Falbe---menu-labeling---Dec-2020/main/STATA/data_long.csv?token=ADGNCRVI2JVHC72URESWLFS765ALK")[,-1]  # drop the row-index column
lm = lme(pme ~ level1 * question, random = ~1 | subject, data = data_long)   # with interaction
lm2 = lme(pme ~ level1 + question, random = ~1 | subject, data = data_long)  # main effects only

Results

> anova(lm)
                numDF denDF   F-value p-value
(Intercept)         1  2648 21598.329  <.0001
level1              2  1324     8.981  0.0001
question            2  2648    17.354  <.0001
level1:question     4  2648     0.857  0.4888

> anova(lm, lm2)
    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lm      1 11 11853.99 11923.15 -5915.995
lm2     2  7 11838.61 11882.62 -5912.303 1 vs 2 7.383951  0.1169

I don't understand why the interaction p-value is 0.4888 from the first method but 0.1169 from the second; the difference seems too large.

WCMC
  • Why would you assume them to be the same? They measure different things. – Tim Dec 31 '20 at 18:08
  • I thought anova(lm, lm2) is a likelihood ratio test for the interaction term and anova(lm) is a Wald test for the interaction term? – WCMC Dec 31 '20 at 18:10
  • anova(lm, lm2) is a likelihood ratio test that compares two models; the second gives you tests for the individual effects. – Tim Dec 31 '20 at 18:43
  • But both interpretations should focus on the interaction term, and they should not differ that much, especially given the not-small sample size in the data. – WCMC Dec 31 '20 at 18:45
  • You could do a simulation study to see what happens when (1) level1 and question are independent, and (2) they are (highly) correlated. From that, I guess you will get some insight into your question. – TrungDung Dec 31 '20 at 19:14
  • That is too complicated. I expect there to be a simple explanation for such a difference, e.g. a coding error or a missing argument in some R function. By the way, level1 and question ARE independent. – WCMC Dec 31 '20 at 19:16
  • @KarolisKoncevičius No. Based on that question, I should get exactly the same result. – WCMC Dec 31 '20 at 21:39

1 Answer


The default estimation method of lme is restricted maximum likelihood (REML). REML log-likelihoods are not comparable between models with different fixed effects, so they cannot be used in a likelihood ratio test. You need to refit your models with maximum likelihood, method = "ML", before comparing them on the basis of a likelihood ratio test. Doing so, we see much closer agreement with the F test.

lm <- lme(pme ~ level1 * question, random = ~1 | subject, data = data_long, method = "ML")
lm2 <- lme(pme ~ level1 + question, random = ~1 | subject, data = data_long, method = "ML")

anova(lm, lm2)

The output is

    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lm      1 11 11822.01 11891.20 -5900.007                        
lm2     2  7 11817.45 11861.47 -5901.725 1 vs 2 3.435362  0.4878
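To see where these numbers come from, the likelihood ratio statistic can be reproduced by hand from the logLik column above: twice the difference in log-likelihoods, referred to a chi-squared distribution with degrees of freedom equal to the difference in parameter counts (11 - 7 = 4, the four interaction coefficients):

lrt <- 2 * (-5900.007 - (-5901.725))          # about 3.44, matching L.Ratio above
pchisq(lrt, df = 11 - 7, lower.tail = FALSE)  # about 0.4878, matching the p-value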

If you do not use maximum likelihood to fit your mixed effects models, anova(lm, lm2) will complain with a warning that the comparison is not meaningful.
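For illustration, a minimal sketch of what that looks like with the default REML fits from the question (the object names lm_reml and lm2_reml are my own, and the exact warning wording may vary across nlme versions):

lm_reml <- lme(pme ~ level1 * question, random = ~1 | subject, data = data_long)   # default REML
lm2_reml <- lme(pme ~ level1 + question, random = ~1 | subject, data = data_long)
anova(lm_reml, lm2_reml)
# Warning: fitted objects with different fixed effects; REML comparisons are not meaningful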

JTH