Why do anova(lm( )) and summary(lm( )) (in R) yield different p-values for a factor f with two levels?

The model is y ~ f + c, where f has two levels (with unequal sample sizes) and c is numeric.

```
set.seed(123)

df <- data.frame(
  y = rnorm(1000),
  f = sample(c("a", "b"), 1000, replace = TRUE, prob = c(.2, .8)),
  c = rnorm(1000)
)

fit <- lm(y ~ f + c, data = df)

anova(fit)   # p = 0.54553
summary(fit) # p = 0.5191

library(emmeans)
emmeans(fit, pairwise ~ f) # p = 0.5191
```

Edit: in addition...

```
library(car)
Anova(fit) # p = 0.5191
```

arb
  • If you use library(car); Anova(fit) instead of anova(fit), you'll get the same p-value as in the other methods. – Sal Mangiafico Dec 18 '22 at 19:07
  • @SalMangiafico But why is that the case? – Dave Dec 18 '22 at 19:24
  • Try fit2 <- lm(y ~ c + f, data = df); you'll get the same results for the f p-value as with car::Anova() or emmeans() or summary(). That for c is now different. With unbalanced designs the Type I ANOVA done by anova() isn't to be trusted. See this page. – EdM Dec 18 '22 at 20:31
  • Oh my, I thought it defaulted to Type II – arb Dec 18 '22 at 22:01
  • You're clearly not the only one who thought that! Type II tests are default for car::Anova(). – EdM Dec 18 '22 at 22:07
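
The point made in the comments can be checked with a minimal sketch in base R, using the question's own simulated data. The names `fit2`, `p_seq_first`, `p_seq_last`, `p_t`, and `p_drop` are illustrative, and `drop1()` is used here as a base-R stand-in for a marginal test (it is not mentioned in the thread). Because the model has no interaction, dropping `f` from the full model should give the same test for `f` as `car::Anova()` and `summary()`.

```r
set.seed(123)

df <- data.frame(
  y = rnorm(1000),
  f = sample(c("a", "b"), 1000, replace = TRUE, prob = c(.2, .8)),
  c = rnorm(1000)
)

fit  <- lm(y ~ f + c, data = df)  # f entered first
fit2 <- lm(y ~ c + f, data = df)  # same model, f entered last

# Sequential (Type I) test with f first: f is NOT adjusted for c
p_seq_first <- anova(fit)["f", "Pr(>F)"]

# Sequential test with f last: f is adjusted for c
p_seq_last <- anova(fit2)["f", "Pr(>F)"]

# Coefficient t-test from summary(); "fb" is the dummy for level "b"
p_t <- summary(fit)$coefficients["fb", "Pr(>|t|)"]

# Marginal F-test: f dropped from the full model (assumed stand-in
# for car::Anova() in this no-interaction model)
p_drop <- drop1(fit, test = "F")["f", "Pr(>F)"]

print(c(seq_first = p_seq_first, seq_last = p_seq_last,
        t_test = p_t, drop1 = p_drop))
```

If the sketch behaves as expected, the last three p-values agree with one another and only `seq_first` differs, which is the discrepancy reported in the question. The exact numbers may depend on the R version, since the default behaviour of `sample()` changed in R 3.6.0.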