Why do anova(lm( )) and summary(lm( )) yield different p-values for a factor with two levels?

Asked Dec 18 '22 at 13:28

Active Dec 18 '22 at 20:17

Viewed 49 times

Why do anova(lm( )) and summary(lm( )) (in R) yield different p-values for a factor f with two levels?

The model is y ~ f + c, where f has two levels (with unequal sample sizes) and c is numeric.

```
set.seed(123)
df <- data.frame(
  y = rnorm(1000),
  f = sample(c("a", "b"), 1000, replace = TRUE, prob = c(.2, .8)),
  c = rnorm(1000)
)
fit <- lm(y ~ f + c, data = df)
anova(fit)                 # p = 0.54553
summary(fit)               # p = 0.5191
library(emmeans)
emmeans(fit, pairwise ~ f) # p = 0.5191
# edit: in addition...
library(car)
Anova(fit)                 # p = 0.5191
```

edited Dec 18 '22 at 20:17

asked Dec 18 '22 at 13:28

arb

If you use library(car); Anova(fit) instead of anova(fit), you'll get the same p-value as in the other methods. – Sal Mangiafico Dec 18 '22 at 19:07

@SalMangiafico But why is that the case? – Dave Dec 18 '22 at 19:24
Try fit2 <- lm(y ~ c + f, data = df); you'll get the same results for the f p-value as with car::Anova() or emmeans() or summary(). That for c is now different. With unbalanced designs the Type I ANOVA done by anova() isn't to be trusted. See this page. – EdM Dec 18 '22 at 20:31
Oh my, I thought it defaulted to Type II – arb Dec 18 '22 at 22:01

You're clearly not the only one who thought that! Type II tests are default for car::Anova(). – EdM Dec 18 '22 at 22:07
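
To make the point in the comments concrete, here is a minimal sketch in base R, assuming the same simulated data as in the question. The names p_seq_first, p_seq_last, p_coef and p_drop are introduced only for illustration. It compares the sequential (Type I) test of f when f is listed first against the tests that adjust f for c.

```r
# Same simulated data as in the question
set.seed(123)
df <- data.frame(
  y = rnorm(1000),
  f = sample(c("a", "b"), 1000, replace = TRUE, prob = c(.2, .8)),
  c = rnorm(1000)
)

fit  <- lm(y ~ f + c, data = df)  # f entered first
fit2 <- lm(y ~ c + f, data = df)  # f entered last

# anova() gives sequential (Type I) tests: each term is adjusted
# only for the terms listed before it in the formula.
p_seq_first <- anova(fit)["f", "Pr(>F)"]   # f not adjusted for c
p_seq_last  <- anova(fit2)["f", "Pr(>F)"]  # f adjusted for c

# summary() tests each coefficient adjusted for all other terms.
p_coef <- coef(summary(fit))["fb", "Pr(>|t|)"]

# drop1() compares the full model with the model lacking each term,
# so every term is adjusted for all the others.
p_drop <- drop1(fit, test = "F")["f", "Pr(>F)"]

c(seq_first = p_seq_first, seq_last = p_seq_last,
  coef = p_coef, drop1 = p_drop)
```

Because f has a single degree of freedom, the F test for f entered last is the square of the t test in summary(), so the last three p-values should agree. Only the first, where f is not adjusted for c, should differ. In a perfectly balanced, orthogonal design the order would not matter.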
