Why do anova(lm( )) and summary(lm( )) (in R) yield different p-values for a factor f with two levels?
The model is y ~ f + c, where f has two levels (with unequal sample sizes) and c is numeric.
```
set.seed(123)
df <- data.frame(
  y = rnorm(1000),
  f = sample(c("a", "b"), 1000, replace = TRUE, prob = c(.2, .8)),
  c = rnorm(1000)
)
fit <- lm(y ~ f + c, data = df)
anova(fit)    # p = 0.54553
summary(fit)  # p = 0.5191
library(emmeans)
emmeans(fit, pairwise ~ f)  # p = 0.5191
```

Edit: in addition...

```
library(car)
Anova(fit)  # p = 0.5191
```
Comments:

If you use `library(car); Anova(fit)` instead of `anova(fit)`, you'll get the same p-value as in the other methods. – Sal Mangiafico Dec 18 '22 at 19:07

If you use `fit2 <- lm(y ~ c + f, data = df)`, you'll get the same results for the `f` p-value as with `car::Anova()` or `emmeans()` or `summary()`. That for `c` is now different. With unbalanced designs the Type I ANOVA done by `anova()` isn't to be trusted. See this page. – EdM Dec 18 '22 at 20:31

... `car::Anova()`. – EdM Dec 18 '22 at 22:07
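The point made in the comments can be checked with base R alone. What follows is a minimal sketch, assuming the same simulated data as in the question; the names `fit_fc`, `fit_cf`, and the `p_*` variables are illustrative and not from the original post. `anova()` uses sequential (Type I) sums of squares, so a term entered first is tested without adjusting for the terms that follow it. The t-test in `summary()` adjusts for every other term in the model. Entering `f` last, or using `drop1()` for marginal tests, should therefore reproduce the `summary()` p-value.

```r
set.seed(123)
df <- data.frame(
  y = rnorm(1000),
  f = sample(c("a", "b"), 1000, replace = TRUE, prob = c(.2, .8)),
  c = rnorm(1000)
)

fit_fc <- lm(y ~ f + c, data = df)  # f entered first
fit_cf <- lm(y ~ c + f, data = df)  # f entered last

# Sequential (Type I) test of f, not adjusted for c
p_seq_first <- anova(fit_fc)["f", "Pr(>F)"]

# Sequential test of f when it enters last, so it is adjusted for c
p_seq_last <- anova(fit_cf)["f", "Pr(>F)"]

# t-test of the f coefficient (level "b" vs. reference level "a")
p_t <- summary(fit_fc)$coefficients["fb", "Pr(>|t|)"]

# Marginal F-test: drop f from the full model, keep c
p_drop1 <- drop1(fit_fc, test = "F")["f", "Pr(>F)"]

print(c(seq_first = p_seq_first, seq_last = p_seq_last,
        t_test = p_t, drop1 = p_drop1))
```

In this sketch, `seq_last`, `t_test`, and `drop1` should agree with each other, and only `seq_first` should differ. For a model with no interaction, the marginal tests from `drop1()` test the same hypotheses as the default (Type II) tests from `car::Anova()`.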