TL;DR, in case this is too much to sort through: depending on the fitting method (lmer(), lme(), or aov()) and the output method (summary(), car::Anova(), stats::anova(), or lmerTest::anova()), I get different results for my variables, even when the model is the same. My main confusion is: why all the different methods, and which should I trust? I have reproduced the methods and outputs for each below, hence all the space used up here.
To start, here's some example data that mimics the set I'm working with, just with fewer observations. I'll refer to it as "data".
| Y | A | B | C |
|:----|:----:|:--:|--:|
|67.1 | 2.33 | g | t |
|77.9 | 2.31 | g | t |
|69.1 | 2.31 | g | t |
|78.4 | 2.31 | g | t |
|64.9 | 2.35 | g | t |
|71.4 | 2.35 | g | t |
|78.3 | 2.31 | g | t |
|71.1 | 2.35 | g | t |
|67.9 | 2.35 | g | t |
|59.2 | 2.35 | g | t |
|65.2 | 2.35 | g | t |
|84.5 | 2.83 | s | t |
|96.6 | 2.758| s | t |
|74 | 2.517| s | t |
|68.7 | 2.517| s | t |
|86.1 | 2.517| s | t |
|80.6 | 2.517| s | t |
|95.1 | 2.982| s | t |
|68.2 | 3.027| s | t |
|73.7 | 2.982| s | t |
|58.1 | 2.982| s | t |
|83.1 | 2.855| s | t |
|73.5 | 3.027| s | t |
|60 | 2.76 | s | t |
|80.1 | 3.267| s | t |
|88.3 | 2.862| s | t |
|80.7 | 2.687| s | t |
|86.3 | 3.775| s | t |
|82.4 | 3.027| s | t |
|83.4 | 3.027| s | t |
|86.3 | 3.775| s | t |
|75.5 | 3.027| s | t |
|74 | 3.027| s | t |
|74.8 | 3.267| s | t |
|74.4 | 2.897| g | m |
|78.7 | 2.897| g | m |
|73.6 | 2.897| g | m |
|86.9 | 3.215| g | m |
|76.1 | 2.897| g | m |
|77 | 2.897| g | m |
|75.7 | 2.897| g | m |
|56.2 | 2.897| g | m |
|65.9 | 2.897| g | m |
|75.8 | 3.215| g | m |
|85.1 | 3.215| g | m |
|68.9 | 3.215| g | m |
|86.9 | 3.215| g | m |
|78.4 | 3.215| g | m |
|77.2 | 3.227| s | m |
|69.8 | 3.227| s | m |
|76.5 | 3.127| s | m |
|90 | 3.185| s | m |
|83.5 | 3.155| s | m |
|70.2 | 3.185| s | m |
|81.1 | 3.185| s | m |
|73.7 | 3.155| s | m |
|84.1 | 3.185| s | m |
|74.1 | 3.155| s | m |
|58.5 | 2.5 | s | m |
|41.1 | 2.5 | s | m |
|63.9 | 2.5 | s | m |
|61.1 | 2.5 | s | m |
|81.1 | 2.875| s | m |
|73.8 | 2.942| s | m |
|84.8 | 2.867| s | m |
|76.5 | 2.942| s | m |
|52.5 | 2.942| s | m |
As the title states, I have a dataset with one continuous DV (Y), one continuous IV (A), one categorical IV (B), and a blocking factor (C). I want to test whether the relationship between Y and A differs across the categories of B, taking the blocking factor C into account. I'm not sure what that exact test is, but I know I could disaggregate my data by group B and run a separate model for each group, comparing Y ~ A for B = g against Y ~ A for B = s. My guess is that an ANOVA/ANCOVA would let me avoid disaggregating the data and losing sample size. (I'm also quite confused about how the interaction effect should be used, and which variable should enter the model first.)
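To make the two approaches concrete, here is a sketch using the `data` above (ignoring the block C for the moment, just to show the structure of each approach):

```r
# Option 1: disaggregate — fit a separate slope for each level of B.
# This halves the sample per fit and gives no direct test of whether
# the two slopes differ.
fit_g <- lm(Y ~ A, data = subset(data, B == "g"))
fit_s <- lm(Y ~ A, data = subset(data, B == "s"))

# Option 2: one pooled model, where the B:A interaction term is the
# direct test of whether the slope of A differs between levels of B.
fit_both <- lm(Y ~ A * B, data = data)
```
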
I've found a few methods for running ANOVAs with blocks. Treating the blocking factor as random decreases both AIC and BIC, so ideally the method should accommodate a random factor:
https://rcompanion.org/handbook/I_09.html a comprehensive walkthrough of an ANOVA using lmer
https://www.datanovia.com/en/lessons/ancova-in-r/ a good guide using lm, but no blocking term help
https://www.statology.org/ancova-in-r/ simple, but no blocking term help
https://www.r-bloggers.com/2013/01/formulae-in-r-anova-and-other-models-mixed-and-fixed/ blocking term accounting using the aov function
So I have a few methods here. I'll write the code and show the outputs for each.
#lmer method
library(lme4)
library(lmerTest)
library(car)
#model with no interaction
>model1 = lmer(Y ~ B + A +(1|C), REML = T, data = data)
#alternate model with interaction effect
>model2 = lmer(Y ~ B*A +(1|C), REML = T, data = data)
>summary(model1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Y ~ B + A + (1 | C)
Data: data
REML criterion at convergence: 475.6
Scaled residuals:
Min 1Q Median 3Q Max
-2.77136 -0.51061 0.00536 0.64664 1.99431
Random effects:
Groups Name Variance Std.Dev.
C (Intercept) 39.00 6.245
Residual 84.08 9.170
Number of obs: 67, groups: C, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.336 18.305 43.324 0.291 0.772068
Bs -10.466 4.092 62.851 -2.557 0.012970 *
A 25.839 6.716 61.567 3.848 0.000286 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Bs
Bs 0.748
A -0.965 -0.822
>summary(model2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Y ~ B * A + (1 | C)
Data: data
REML criterion at convergence: 468.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.74515 -0.51858 0.01236 0.63517 1.98255
Random effects:
Groups Name Variance Std.Dev.
C (Intercept) 38.74 6.224
Residual 85.44 9.243
Number of obs: 67, groups: C, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.3453 23.0048 49.3777 0.276 0.78383
Bs -12.6877 33.1317 62.3193 -0.383 0.70306
A 25.4562 8.5438 60.1365 2.979 0.00416 **
Bs:A 0.7725 11.3505 62.5953 0.068 0.94596
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Bs A
Bs -0.519
A -0.978 0.525
Bs:A 0.598 -0.992 -0.611
From the output of model 1, I can see significance in both A and B, while model 2 shows significance only in A.
Here's where confusion #1 starts:
I can run either anova() or Anova() on both models. lmerTest::anova() gives me a Type III ANOVA table, while car::Anova() gives a Type II analysis of deviance table.
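(As a side note, both functions accept a `type` argument, so in principle they can be asked for the same kind of table; a sketch, not shown in the outputs below:)

```r
anova(model1, type = 2)    # lmerTest: Type II F tests, Satterthwaite df
Anova(model1, type = 3)    # car: Type III Wald chi-square tests
```
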
>anova(model1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
B 549.96 549.96 1 62.851 6.5406 0.0129696 *
A 1244.85 1244.85 1 61.567 14.8047 0.0002858 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>Anova(model1)
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: Y
Chisq Df Pr(>Chisq)
B 6.5406 1 0.0105442 *
A 14.8047 1 0.0001192 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>anova(model2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
B 12.53 12.53 1 62.319 0.1466 0.7030619
A 1240.50 1240.50 1 60.892 14.5188 0.0003256 ***
B:A 0.40 0.40 1 62.595 0.0046 0.9459552
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>Anova(model2)
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: Y
Chisq Df Pr(>Chisq)
B 6.4223 1 0.0112698 *
A 14.5504 1 0.0001365 ***
B:A 0.0046 1 0.9457383
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Why the difference in significance for B in model 2 (the interaction model) between summary(), car::Anova(), and lmerTest::anova()? Which do I trust? To further complicate things, let's look at the stats::aov() method.
model1 = aov(Y ~ B + A + Error(C), data = data)
model2 = aov(Y ~ B*A + Error(C), data = data)
>summary(model1)
Error: C
Df Sum Sq Mean Sq
B 1 87.39 87.39
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
B 1 95 94.6 1.125 0.292987
A 1 1325 1324.7 15.755 0.000187 ***
Residuals 63 5297 84.1
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>summary(model2)
Error: C
Df Sum Sq Mean Sq
B 1 87.39 87.39
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
B 1 95 94.6 1.107 0.296881
A 1 1325 1324.7 15.505 0.000211 ***
B:A 1 0 0.0 0.000 0.993461
Residuals 62 5297 85.4
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So here the two models agree on the significance of B, with nearly identical p-values. The issue now is that neither stats::anova() nor car::Anova() will handle these models (the approach used in the Statology link); adding the block as an Error term makes that unviable. The error message for both is:
Error in UseMethod("vcov") :
no applicable method for 'vcov' applied to an object of class "c('aovlist', 'listof')"
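(From what I can tell, aov() with an Error() term returns an "aovlist" — a list of per-stratum fits rather than a single fitted model — which is why anything that calls vcov() on the whole object fails. The per-stratum fits can still be inspected individually; a sketch:)

```r
model1 <- aov(Y ~ B + A + Error(C), data = data)
class(model1)                # c("aovlist", "listof"), matching the error
names(model1)                # one element per error stratum
summary(model1[["Within"]])  # the within-block stratum on its own
```
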
Now for the last method, lme():
library(nlme)
>model1 = lme(Y ~ B + A, random = ~1|C, method = "REML", data = data)
>model2 = lme(Y ~ B*A, random = ~1|C, method = "REML", data = data)
>summary(model1)
Linear mixed-effects model fit by REML
Data: data
AIC BIC logLik
485.6355 496.4299 -237.8178
Random effects:
Formula: ~1 | C
(Intercept) Residual
StdDev: 6.245355 9.169755
Fixed effects: Y ~ B + A
Value Std.Error DF t-value p-value
(Intercept) 5.33563 18.304636 63 0.291491 0.7716
Bs -10.46557 4.092189 63 -2.557451 0.0130
A 25.83920 6.715509 63 3.847691 0.0003
Correlation:
(Intr) Bs
Bs 0.748
A -0.965 -0.822
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.771357518 -0.510609057 0.005356615 0.646641593 1.994309873
Number of Observations: 67
Number of Groups: 2
>summary(model2)
Linear mixed-effects model fit by REML
Data: data
AIC BIC logLik
480.9425 493.8013 -234.4713
Random effects:
Formula: ~1 | C
(Intercept) Residual
StdDev: 6.224159 9.243403
Fixed effects: Y ~ B * A
Value Std.Error DF t-value p-value
(Intercept) 6.345383 23.00476 62 0.2758291 0.7836
Bs -12.687705 33.13166 62 -0.3829481 0.7031
A 25.456188 8.54384 62 2.9794790 0.0041
Bs:A 0.772521 11.35050 62 0.0680606 0.9460
Correlation:
(Intr) Bs A
Bs -0.519
A -0.978 0.525
Bs:A 0.598 -0.992 -0.611
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.74514990 -0.51857689 0.01236403 0.63517280 1.98254600
Number of Observations: 67
Number of Groups: 2
> anova(model1)
numDF denDF F-value p-value
(Intercept) 1 63 270.07347 <.0001
B 1 63 1.13754 0.2902
A 1 63 14.80473 0.0003
> anova(model2)
numDF denDF F-value p-value
(Intercept) 1 62 271.53755 <.0001
B 1 62 1.11976 0.2941
A 1 62 14.55037 0.0003
B:A 1 62 0.00463 0.9460
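(One thing I did find while digging: nlme's anova.lme() is sequential, i.e. Type I, by default, which would explain why B, tested before A here, looks different from the marginal tables above. It also takes a `type` argument; a sketch:)

```r
anova(model1, type = "marginal")   # marginal ("Type III"-like) tests
anova(model2, type = "marginal")
```
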
> Anova(model1)
Analysis of Deviance Table (Type II tests)
Response: Y
Chisq Df Pr(>Chisq)
B 6.5406 1 0.0105442 *
A 14.8047 1 0.0001192 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Anova(model2)
Analysis of Deviance Table (Type II tests)
Response: Y
Chisq Df Pr(>Chisq)
B 6.4223 1 0.0112698 *
A 14.5504 1 0.0001365 ***
B:A 0.0046 1 0.9457374
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here, the anova() and Anova() tables disagree with each other and with the summary() outputs. Calling anova.lme() directly on both models matches the anova() output (as expected, since anova() dispatches to anova.lme() for these objects).
Also, Anova() prints an analysis of deviance table rather than an analysis of variance table, and I can't find a concise answer on the difference between the two.
This is just the tip of the iceberg with these methods, and my reading of it is limited by my understanding of both R and statistics. From what I understand, many of these methods differ in whether they compute Type I, Type II, or Type III tests, as well as in how lmer() and lme() operate (see, e.g., the existing questions "Completely different results from lmer() and lme()" and "lme() and lmer() giving conflicting results").
Any help as to which method to pick would be appreciated.