I'm trying to figure out why the anova function in R gives me the same p-value regardless of the order in which I pass the models.
> anova(lm.fit, lm.fit2)
Analysis of Variance Table
Model 1: medv ~ lstat
Model 2: medv ~ lstat + I(lstat^2)
  Res.Df   RSS Df Sum of Sq     F    Pr(>F)
1    504 19472
2    503 15347  1    4125.1 135.2 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(lm.fit2,lm.fit)
Analysis of Variance Table
Model 1: medv ~ lstat + I(lstat^2)
Model 2: medv ~ lstat
  Res.Df   RSS Df Sum of Sq     F    Pr(>F)
1    503 15347
2    504 19472 -1   -4125.1 135.2 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I can't understand why the p-value is so low in both cases. As I understand it, a very low p-value means that model 2 is better than model 1, but here I get exactly the same p-value no matter which model comes first.
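For two nested linear models, anova() computes an F statistic as (Sum of Sq / Df) divided by the residual variance (RSS / Res.Df) of the larger model. Swapping the order flips the sign of both Sum of Sq and Df, so their ratio, and therefore F and the p-value, stay the same; the denominator always comes from the larger model. A minimal sketch reproducing the F value from the rounded numbers printed in the two tables (so it matches only to the displayed precision):

```r
# RSS and residual df as printed in the anova() tables (rounded values).
rss_small <- 19472  # medv ~ lstat
df_small  <- 504
rss_big   <- 15347  # medv ~ lstat + I(lstat^2)
df_big    <- 503

# Denominator: residual variance of the larger model, used whatever the order.
scale <- rss_big / df_big

# anova(lm.fit, lm.fit2): Sum of Sq = 4125, Df = 1
F1 <- ((rss_small - rss_big) / (df_small - df_big)) / scale
# anova(lm.fit2, lm.fit): Sum of Sq = -4125, Df = -1, so both signs flip
F2 <- ((rss_big - rss_small) / (df_big - df_small)) / scale

# Upper-tail p-value with |Df| numerator df and the larger model's residual df
p1 <- pf(F1, df1 = abs(df_small - df_big), df2 = df_big, lower.tail = FALSE)

print(c(F1 = F1, F2 = F2))  # identical, about 135.2
print(p1)                   # far below 2.2e-16
```

In either order, the low p-value says that adding I(lstat^2) reduces the RSS by much more than chance would explain, i.e. the quadratic model fits significantly better than the linear one.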
I've been reading ?anova to work out what all this means, but the help page is very terse. Is there other documentation that explains, for instance, what the Df column means?
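In these tables, Res.Df is each model's residual degrees of freedom (number of observations minus number of estimated coefficients), and Df is the change in Res.Df from row 1 to row 2, i.e. the number of extra coefficients, signed according to the order. A minimal sketch checking this, assuming the models were fit on the Boston data from the MASS package (the variable names medv and lstat suggest so, but the question does not say it explicitly):

```r
library(MASS)  # assumption: the models were fit on MASS::Boston (506 rows)

lm.fit  <- lm(medv ~ lstat, data = Boston)
lm.fit2 <- lm(medv ~ lstat + I(lstat^2), data = Boston)

n <- nrow(Boston)
# Res.Df = number of observations minus number of estimated coefficients
res_df <- c(df.residual(lm.fit), df.residual(lm.fit2))
n_coef <- c(length(coef(lm.fit)), length(coef(lm.fit2)))
print(cbind(n_coef, res_df))  # 2 coefficients -> 504, 3 coefficients -> 503

# Df (and Sum of Sq) in row 2 = row-1 value minus row-2 value of Res.Df (and RSS)
a <- anova(lm.fit, lm.fit2)
print(a$Df)              # NA for the first row, then 1
print(-diff(a$Res.Df))   # same value as a$Df[2]
print(a[["Sum of Sq"]])  # NA, then RSS(lm.fit) - RSS(lm.fit2)
```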
anova is expected to order the arguments in a meaningful way. – whuber Jul 02 '15 at 22:08