Calculating MSE: why are these two ways giving different results?

Question

I have a question about calculating the MSE in R.

I have tried two different ways and I am getting two different results. I would like to know which one is the correct way of finding the MSE.

First:

model1 <- lm(data=d, x ~ y)
rmse_model1 <- mean((d - predict(model1))^2)

Second:

mean(model1$residuals^2)

Thanks! Yes it indeed gives the same result. – Julius Knafl, Apr 02 '17 at 02:55

score 3 · Accepted Answer · edited May 23 '17 at 12:25

In principle, they should give you the same result. But in the first option, you should use d$x rather than d. If you use the whole data frame d, R's recycling rule reuses predict(model1) for each column (d has two columns), so the computation also involves d$y.
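As a quick check of the point above, here is a minimal sketch on simulated data. The data frame d and its values are made up for illustration (the question does not show the real data); only the column names x and y and the model formula follow the question.

```r
# Simulated stand-in for the asker's data frame (values are an assumption)
set.seed(1)
d <- data.frame(y = rnorm(20))
d$x <- 2 * d$y + rnorm(20)

model1 <- lm(x ~ y, data = d)

# First way, using only the response column d$x
mse_first <- mean((d$x - predict(model1))^2)

# Second way, from the stored residuals
mse_second <- mean(model1$residuals^2)

# Should be TRUE: both are the mean of the squared residuals
all.equal(mse_first, mse_second)
```

With no missing values, predict(model1) returns the fitted values for the same rows as d$x, so the two expressions compute the same quantity.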

Note that in the first option it is recommended to pass na.rm = TRUE to mean and newdata = d to predict. This makes your code robust to missing values in your data. In the second option, on the other hand, you don't need to worry about NA, because lm automatically drops incomplete cases. You may want to look at this thread for the potential effect of this feature: Aligning Data frame with missing values.
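To illustrate the missing-value case, here is a small sketch, again on simulated data. The data and the positions of the NA values are assumptions for illustration, and it relies on the default setting options(na.action = "na.omit").

```r
# Simulated data with one missing response and one missing predictor
set.seed(2)
d <- data.frame(y = rnorm(20))
d$x <- 2 * d$y + rnorm(20)
d$x[3] <- NA
d$y[7] <- NA

# With the default na.action (na.omit), lm drops rows 3 and 7
model1 <- lm(x ~ y, data = d)

# Without newdata, predict() only covers the rows used in the fit,
# so its length no longer matches nrow(d)
length(predict(model1))

# With newdata = d there is one prediction per row of d (NA where y is missing)
pred <- predict(model1, newdata = d)

# First way, made robust to missing values
mse_first <- mean((d$x - pred)^2, na.rm = TRUE)

# Second way: residuals exist only for the complete cases
mse_second <- mean(model1$residuals^2)

all.equal(mse_first, mse_second)
```

Passing newdata = d keeps the predictions aligned row by row with d$x, and na.rm = TRUE then discards exactly the rows that lm dropped, so both ways average over the same complete cases.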
