TL;DR, in case this is too much to sort through: depending on the fitting method (lmer(), lme(), or aov()) and the output method (summary(), car::Anova(), stats::anova(), or lmerTest::anova()), I get different results for my variables, even when the model is the same. My main confusion is: why all the different methods, and which should I trust? I have reproduced the methods and outputs for each below, hence all the space used up here.
To start, here's some example data that mimics the set I'm working with, just with fewer observations. I'll refer to it as "data".
| Y | A | B | C |
|:----|:----:|:--:|--:|
|67.1 | 2.33 | g | t |
|77.9 | 2.31 | g | t |
|69.1 | 2.31 | g | t |
|78.4 | 2.31 | g | t |
|64.9 | 2.35 | g | t |
|71.4 | 2.35 | g | t |
|78.3 | 2.31 | g | t |
|71.1 | 2.35 | g | t |
|67.9 | 2.35 | g | t |
|59.2 | 2.35 | g | t |
|65.2 | 2.35 | g | t |
|84.5 | 2.83 | s | t |
|96.6 | 2.758| s | t |
|74 | 2.517| s | t |
|68.7 | 2.517| s | t |
|86.1 | 2.517| s | t |
|80.6 | 2.517| s | t |
|95.1 | 2.982| s | t |
|68.2 | 3.027| s | t |
|73.7 | 2.982| s | t |
|58.1 | 2.982| s | t |
|83.1 | 2.855| s | t |
|73.5 | 3.027| s | t |
|60 | 2.76 | s | t |
|80.1 | 3.267| s | t |
|88.3 | 2.862| s | t |
|80.7 | 2.687| s | t |
|86.3 | 3.775| s | t |
|82.4 | 3.027| s | t |
|83.4 | 3.027| s | t |
|86.3 | 3.775| s | t |
|75.5 | 3.027| s | t |
|74 | 3.027| s | t |
|74.8 | 3.267| s | t |
|74.4 | 2.897| g | m |
|78.7 | 2.897| g | m |
|73.6 | 2.897| g | m |
|86.9 | 3.215| g | m |
|76.1 | 2.897| g | m |
|77 | 2.897| g | m |
|75.7 | 2.897| g | m |
|56.2 | 2.897| g | m |
|65.9 | 2.897| g | m |
|75.8 | 3.215| g | m |
|85.1 | 3.215| g | m |
|68.9 | 3.215| g | m |
|86.9 | 3.215| g | m |
|78.4 | 3.215| g | m |
|77.2 | 3.227| s | m |
|69.8 | 3.227| s | m |
|76.5 | 3.127| s | m |
|90 | 3.185| s | m |
|83.5 | 3.155| s | m |
|70.2 | 3.185| s | m |
|81.1 | 3.185| s | m |
|73.7 | 3.155| s | m |
|84.1 | 3.185| s | m |
|74.1 | 3.155| s | m |
|58.5 | 2.5 | s | m |
|41.1 | 2.5 | s | m |
|63.9 | 2.5 | s | m |
|61.1 | 2.5 | s | m |
|81.1 | 2.875| s | m |
|73.8 | 2.942| s | m |
|84.8 | 2.867| s | m |
|76.5 | 2.942| s | m |
|52.5 | 2.942| s | m |
As the title states, I have a dataset with one continuous DV (Y), one continuous IV (A), one categorical IV (B), and a blocking factor (C). I want to test whether the relationship between Y and A differs across the categories of B, taking the blocking factor C into account. I'm not sure what that exact test is, but I know I could disaggregate my data by group B and run a separate model for each group, comparing Y ~ A for B = g against Y ~ A for B = s. My guess is that an ANOVA/ANCOVA would let me avoid disaggregating the data and losing sample size. (I'm also quite confused about how the interaction effect should be used, and which variable should enter the model first.)
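To make the two approaches concrete, here is a sketch using the `data` above (ignoring the block C for the moment, just to show the structure of each approach):

```r
# Option 1: disaggregate — fit a separate slope for each level of B.
# This halves the sample per fit and gives no direct test of whether
# the two slopes differ.
fit_g <- lm(Y ~ A, data = subset(data, B == "g"))
fit_s <- lm(Y ~ A, data = subset(data, B == "s"))

# Option 2: one pooled model, where the B:A interaction term is the
# direct test of whether the slope of A differs between levels of B.
fit_both <- lm(Y ~ A * B, data = data)
```
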
I've found a few methods for running ANOVAs with blocks. Treating the blocking factor as random decreases both AIC and BIC, so ideally the method should accommodate a random factor:
https://rcompanion.org/handbook/I_09.html a comprehensive walkthrough of an ANOVA using lmer
https://www.datanovia.com/en/lessons/ancova-in-r/ a good guide using lm, but no blocking term help
https://www.statology.org/ancova-in-r/ simple, but no blocking term help
https://www.r-bloggers.com/2013/01/formulae-in-r-anova-and-other-models-mixed-and-fixed/ blocking term accounting using the aov function
So I have a few methods here. I'll write the code and show the outputs for each.
#lmer method
library(lme4)
library(lmerTest)
library(car)
#model with no interaction
>model1 = lmer(Y ~ B + A +(1|C), REML = T, data = data)
#alternate model with interaction effect
>model2 = lmer(Y ~ B*A +(1|C), REML = T, data = data)
>summary(model1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Y ~ B + A + (1 | C)
Data: data
REML criterion at convergence: 475.6
Scaled residuals:
Min 1Q Median 3Q Max
-2.77136 -0.51061 0.00536 0.64664 1.99431
Random effects:
Groups Name Variance Std.Dev.
C (Intercept) 39.00 6.245
Residual 84.08 9.170
Number of obs: 67, groups: C, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.336 18.305 43.324 0.291 0.772068
Bs -10.466 4.092 62.851 -2.557 0.012970 *
A 25.839 6.716 61.567 3.848 0.000286 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Bs
Bs 0.748
A -0.965 -0.822
>summary(model2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Y ~ B * A + (1 | C)
Data: data
REML criterion at convergence: 468.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.74515 -0.51858 0.01236 0.63517 1.98255
Random effects:
Groups Name Variance Std.Dev.
C (Intercept) 38.74 6.224
Residual 85.44 9.243
Number of obs: 67, groups: C, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.3453 23.0048 49.3777 0.276 0.78383
Bs -12.6877 33.1317 62.3193 -0.383 0.70306
A 25.4562 8.5438 60.1365 2.979 0.00416 **
Bs:A 0.7725 11.3505 62.5953 0.068 0.94596
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Bs A
Bs -0.519
A -0.978 0.525
Bs:A 0.598 -0.992 -0.611
From the output of model 1, I can see significance in both A and B, while model 2 shows significance only in A.
Here's where confusion #1 starts:
I can run either anova() or Anova() on both models. lmerTest::anova() gives me a Type III ANOVA table, while car::Anova() gives a Type II analysis of deviance table.
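(As a side note, both functions accept a `type` argument, so in principle they can be asked for the same kind of table; a sketch, not shown in the outputs below:)

```r
anova(model1, type = 2)    # lmerTest: Type II F tests, Satterthwaite df
Anova(model1, type = 3)    # car: Type III Wald chi-square tests
```
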
>anova(model1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
B 549.96 549.96 1 62.851 6.5406 0.0129696 *
A 1244.85 1244.85 1 61.567 14.8047 0.0002858 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>Anova(model1)
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: Y
Chisq Df Pr(>Chisq)
B 6.5406 1 0.0105442 *
A 14.8047 1 0.0001192 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>anova(model2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
B 12.53 12.53 1 62.319 0.1466 0.7030619
A 1240.50 1240.50 1 60.892 14.5188 0.0003256 ***
B:A 0.40 0.40 1 62.595 0.0046 0.9459552
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>Anova(model2)
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: Y
Chisq Df Pr(>Chisq)
B 6.4223 1 0.0112698 *
A 14.5504 1 0.0001365 ***
B:A 0.0046 1 0.9457383
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Why the difference in significance for B in model 2 (the interaction model) between summary(), car::Anova(), and lmerTest::anova()? Which do I trust? To further complicate things, let's look at the stats::aov() method.
model1 = aov(Y ~ B + A + Error(C), data = data)
model2 = aov(Y ~ B*A + Error(C), data = data)
>summary(model1)
Error: C
Df Sum Sq Mean Sq
B 1 87.39 87.39
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
B 1 95 94.6 1.125 0.292987
A 1 1325 1324.7 15.755 0.000187 ***
Residuals 63 5297 84.1
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>summary(model2)
Error: C
Df Sum Sq Mean Sq
B 1 87.39 87.39
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
B 1 95 94.6 1.107 0.296881
A 1 1325 1324.7 15.505 0.000211 ***
B:A 1 0 0.0 0.000 0.993461
Residuals 62 5297 85.4
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So here the two models agree on the significance of B, with nearly identical p-values. The issue now is that neither stats::anova() nor car::Anova() will handle these models (the approach used in the Statology link); adding the block as an Error term makes that unviable. The error message for both is:
Error in UseMethod("vcov") :
no applicable method for 'vcov' applied to an object of class "c('aovlist', 'listof')"
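(From what I can tell, aov() with an Error() term returns an "aovlist" — a list of per-stratum fits rather than a single fitted model — which is why anything that calls vcov() on the whole object fails. The per-stratum fits can still be inspected individually; a sketch:)

```r
model1 <- aov(Y ~ B + A + Error(C), data = data)
class(model1)                # c("aovlist", "listof"), matching the error
names(model1)                # one element per error stratum
summary(model1[["Within"]])  # the within-block stratum on its own
```
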
Now for the last method, lme():
library(nlme)
>model1 = lme(Y ~ B + A, random = ~1|C, method = "REML", data = data)
>model2 = lme(Y ~ B*A, random = ~1|C, method = "REML", data = data)
>summary(model1)
Linear mixed-effects model fit by REML
Data: data
AIC BIC logLik
485.6355 496.4299 -237.8178
Random effects:
Formula: ~1 | C
(Intercept) Residual
StdDev: 6.245355 9.169755
Fixed effects: Y ~ B + A
Value Std.Error DF t-value p-value
(Intercept) 5.33563 18.304636 63 0.291491 0.7716
Bs -10.46557 4.092189 63 -2.557451 0.0130
A 25.83920 6.715509 63 3.847691 0.0003
Correlation:
(Intr) Bs
Bs 0.748
A -0.965 -0.822
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.771357518 -0.510609057 0.005356615 0.646641593 1.994309873
Number of Observations: 67
Number of Groups: 2
>summary(model2)
Linear mixed-effects model fit by REML
Data: data
AIC BIC logLik
480.9425 493.8013 -234.4713
Random effects:
Formula: ~1 | C
(Intercept) Residual
StdDev: 6.224159 9.243403
Fixed effects: Y ~ B * A
Value Std.Error DF t-value p-value
(Intercept) 6.345383 23.00476 62 0.2758291 0.7836
Bs -12.687705 33.13166 62 -0.3829481 0.7031
A 25.456188 8.54384 62 2.9794790 0.0041
Bs:A 0.772521 11.35050 62 0.0680606 0.9460
Correlation:
(Intr) Bs A
Bs -0.519
A -0.978 0.525
Bs:A 0.598 -0.992 -0.611
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.74514990 -0.51857689 0.01236403 0.63517280 1.98254600
Number of Observations: 67
Number of Groups: 2
> anova(model1)
numDF denDF F-value p-value
(Intercept) 1 63 270.07347 <.0001
B 1 63 1.13754 0.2902
A 1 63 14.80473 0.0003
> anova(model2)
numDF denDF F-value p-value
(Intercept) 1 62 271.53755 <.0001
B 1 62 1.11976 0.2941
A 1 62 14.55037 0.0003
B:A 1 62 0.00463 0.9460
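(One thing I did find while digging: nlme's anova.lme() is sequential, i.e. Type I, by default, which would explain why B, tested before A here, looks different from the marginal tables above. It also takes a `type` argument; a sketch:)

```r
anova(model1, type = "marginal")   # marginal ("Type III"-like) tests
anova(model2, type = "marginal")
```
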
> Anova(model1)
Analysis of Deviance Table (Type II tests)
Response: Y
Chisq Df Pr(>Chisq)
B 6.5406 1 0.0105442 *
A 14.8047 1 0.0001192 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Anova(model2)
Analysis of Deviance Table (Type II tests)
Response: Y
Chisq Df Pr(>Chisq)
B 6.4223 1 0.0112698 *
A 14.5504 1 0.0001365 ***
B:A 0.0046 1 0.9457374
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here, the anova() and Anova() tables disagree with each other and with the summary() outputs. Calling anova.lme() directly on both models matches the anova() output (as expected, since anova() dispatches to anova.lme() for these objects).
Also, Anova() prints an analysis of deviance table rather than an analysis of variance table, and I can't find a concise answer on the difference between the two.
This is just the tip of the iceberg with these methods, and my reading of it is limited by my understanding of both R and statistics. From what I understand, many of these methods differ in whether they compute Type I, Type II, or Type III tests, as well as in how lmer() and lme() operate (see, e.g., the existing questions "Completely different results from lmer() and lme()" and "lme() and lmer() giving conflicting results").
Any help as to which method to pick would be appreciated.