Why is the significance of a linear regression coefficient different from ANOVA?

Question

I am doing model selection and am confused about why the significance of a coefficient differs from what ANOVA reports. The coefficient is significant in the output of summary(), but when I pass the same model to anova(), it is shown as not significant. Which one should I follow? Here is my code:

model1 = lm(Income~Num_grad+Region+Degree+Program, data=train)
summary(model1)
anova(model1)

Here are the results of these two functions:

You might see the following question: stats.stackexchange.com/questions/599422/why-do-anovalm-and-summarylm-yield-different-p-values-for-a-factor-wit — Sal Mangiafico, Dec 18 '22 at 19:11

score 1 · Answer 1 · answered Dec 22 '22 at 16:42

There are a couple of things to consider here.

One is the issue in the question linked by Sal Mangiafico in a comment. With an unbalanced design, the standard R anova() function can be very misleading, as it uses Type I (sequential) sums of squares, whose results depend on the order of the variables in the formula. Because Num_grad is the first term in your formula, anova() essentially evaluated the importance of Num_grad without considering the information provided by any of the other predictors. You are better off using the Type II Wald-type test provided by default by the Anova() function of the R car package.
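The order dependence described above can be seen in a small simulation. This is only a sketch on made-up data: the data frame `d` and its values are hypothetical and are not the asker's `train` data. It uses base R's drop1() to test each term after all the others, which should match a Type II test in a model without interactions, and calls car::Anova() only if the car package happens to be installed.

```r
# Simulated, hypothetical data (NOT the asker's `train` data set).
set.seed(1)
n <- 200
Region <- factor(sample(c("East", "West", "North"), n, replace = TRUE,
                        prob = c(0.6, 0.3, 0.1)))   # unbalanced groups
# Num_grad is correlated with Region, so the two predictors share information.
Num_grad <- 50 + 20 * (Region == "West") + rnorm(n, sd = 10)
Income <- 30 + 0.1 * Num_grad + 5 * (Region == "West") + rnorm(n, sd = 5)
d <- data.frame(Income, Num_grad, Region)

# Same model, two different term orders.
fit_a <- lm(Income ~ Num_grad + Region, data = d)
fit_b <- lm(Income ~ Region + Num_grad, data = d)

# Type I (sequential) tests: the first term is tested ignoring later terms,
# so the Num_grad row depends on where Num_grad sits in the formula.
print(anova(fit_a))
print(anova(fit_b))

# Each term tested after all the others; the term order no longer matters.
print(drop1(fit_a, test = "F"))

# t-test for the Num_grad coefficient, as reported by summary().
print(summary(fit_a)$coefficients["Num_grad", ])

# Type II tests from car, run only if the package is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit_a))
}
```

In this setup, the Num_grad row of anova(fit_b), where Num_grad is entered last, should agree with both drop1() and the coefficient's t-test, while the Num_grad row of anova(fit_a) generally will not.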

Second is the interpretation of the Num_grad coefficient in the list of coefficients reported by summary(). In this particular case, with no interactions among predictors, its displayed significance should be the same as that of the Wald test. In general, however, when there are interactions, a single-predictor coefficient only represents the situation in which all of its interacting predictors are at their own reference levels. That leads to a large amount of confusion if you aren't aware of the problem.
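To illustrate the point about interactions, here is a minimal sketch on simulated data. The data, the two-level Region factor, and the slopes are all assumptions made for illustration. The same interaction model is fitted twice with different reference levels, so the row labelled Num_grad describes the slope in a different region each time.

```r
# Simulated, hypothetical data (NOT the asker's `train` data set).
set.seed(2)
n <- 150
Region <- factor(sample(c("East", "West"), n, replace = TRUE))
Num_grad <- rnorm(n, mean = 50, sd = 10)
# Assumed for illustration: slope of Num_grad is 0 in East and 0.5 in West.
Income <- 30 + ifelse(Region == "West", 0.5, 0) * Num_grad + rnorm(n, sd = 5)
d <- data.frame(Income, Num_grad, Region)

# Reference level "East" (the alphabetical default).
fit_east <- lm(Income ~ Num_grad * Region, data = d)

# Same model with "West" as the reference level.
d$Region2 <- relevel(d$Region, ref = "West")
fit_west <- lm(Income ~ Num_grad * Region2, data = d)

# Identical fitted values, but the "Num_grad" row is the slope within
# whichever region is the reference level, so its estimate and p-value differ.
print(coef(summary(fit_east))["Num_grad", ])
print(coef(summary(fit_west))["Num_grad", ])
```

Neither Num_grad row is a test of the overall importance of Num_grad; each is the slope for one region only.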
