I have a question about calculating the mean squared error (MSE) in R.

I tried two different approaches and got two different results. Which one is the correct way to compute the MSE?

First:

model1 <- lm(data=d, x ~ y)
rmse_model1 <- mean((d - predict(model1))^2)

Second:

mean(model1$residuals^2)
Zheyuan Li
Julius Knafl

1 Answer

In principle, both should give the same result, but in the first option you should use d$x rather than d. If you use the whole data frame d, R's recycling rule repeats predict(model1) twice (d has two columns), so the computation also involves d$y.
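
As a minimal sketch of this point, the snippet below compares the two approaches. The question does not show d, so the data here are simulated and purely hypothetical; the names mse_fitted, mse_resid and mse_wrong are introduced only for illustration.

```r
# Hypothetical data: a two-column data frame with no missing values.
set.seed(1)
d <- data.frame(y = rnorm(50))
d$x <- 3 + 2 * d$y + rnorm(50, sd = 0.5)

model1 <- lm(x ~ y, data = d)

# Option 1, corrected: response column minus fitted values
mse_fitted <- mean((d$x - predict(model1))^2)

# Option 2: stored residuals
mse_resid <- mean(model1$residuals^2)

# The two agree up to floating-point error
print(all.equal(mse_fitted, mse_resid))

# Subtracting from the whole data frame recycles the fitted values over
# both columns, so the y column leaks into the result. as.matrix() is
# used so that mean() receives a numeric object.
mse_wrong <- mean(as.matrix(d - predict(model1))^2)
print(c(correct = mse_resid, whole_data_frame = mse_wrong))
```

With complete data, the corrected first option and the residual-based option coincide, while the whole-data-frame version gives a different number.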

It is also advisable to pass na.rm = TRUE to mean and newdata = d to predict in the first option, which makes the code robust to missing values in your data. In the second option you don't need to worry about NA, because lm automatically drops incomplete cases. For the potential effects of this behavior, see this thread: Aligning Data frame with missing values.
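
The effect of missing values can be sketched as follows. The data are again simulated and hypothetical, and the example assumes R's default option na.action = "na.omit", under which lm drops incomplete rows.

```r
# Hypothetical data with one missing predictor and one missing response.
set.seed(2)
d <- data.frame(y = rnorm(20))
d$x <- 1 + 2 * d$y + rnorm(20, sd = 0.5)
d$y[3] <- NA   # missing predictor
d$x[7] <- NA   # missing response

# Assuming the default na.action, rows 3 and 7 are dropped from the fit.
model1 <- lm(x ~ y, data = d)

# Without newdata, predictions cover only the rows used in the fit;
# with newdata = d, there is one prediction (possibly NA) per row of d.
n_fit <- length(predict(model1))               # 18
n_all <- length(predict(model1, newdata = d))  # 20

# Option 1 made robust: lengths match d$x, and NA terms are dropped
mse_fitted <- mean((d$x - predict(model1, newdata = d))^2, na.rm = TRUE)

# Option 2: residuals exist only for the complete cases
mse_resid <- mean(model1$residuals^2)

print(all.equal(mse_fitted, mse_resid))
```

Without newdata = d, d$x (20 values) and predict(model1) (18 values) would have different lengths and would no longer line up row by row.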

Community
Zheyuan Li