I have a dataset with approximately 2500 observations and 50 variables. The response variable is numerical, so my objective is to build a regression model. I have built one penalized linear regression model and one xgboost regressor model.

The linear model obtained an MSE of 7.31 and an R2 score of 0.62. The xgboost model obtained an MSE of 8.19 and an R2 of 0.66.

So one model has the smaller MSE but the other has the larger R2. Which one is better? I have read that some metrics are called "proper", meaning that it is mathematically proven that the better the metric, the better the model. I was wondering whether either MSE or R2 is proper.

  • You have to pay attention to the exact $R^2$ calculation in order to interpret its meaning and if it should be meaningful to you. Do you know what calculation is performed, even if it is just a software function? – Dave Jun 12 '23 at 15:33
  • Well, I am using scikit-learn's function r2_score, which receives a vector of true response values and a vector of predictions and from what I have seen, it computes the value as 1 - (residual sum of squares / total sum of squares) – Alberto Perez Martinez Jun 12 '23 at 16:50
  • That’s the exact function that I speculate in the linked question has no statistical motivation and is not meaningful beyond being an approximation to the formula I give at the end of that question. My stance now has support in the statistics literature, with the citation included in that linked question. – Dave Jun 12 '23 at 17:05
  • What do you input into the r2_score function for each model? – Dave Mar 28 '24 at 11:07

1 Answer

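Either there is an error in your calculation, or the two models were evaluated on different samples. On a single evaluation sample, $R^2 = 1 - \text{MSE}/\operatorname{Var}(y)$, where $\operatorname{Var}(y)$ is the variance of the observed responses in that sample (computed with divisor $n$), so the model with the lower MSE must also have the higher $R^2$.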

See the formula for MSE and $R^2$: What is the mathematical relationship between R2 and MSE?
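
As a rough illustration of that relationship, here is a minimal sketch in plain Python. It assumes $R^2$ is computed as 1 - (residual sum of squares / total sum of squares), which is how the asker describes scikit-learn's `r2_score`. The data is synthetic, and the helper names `mse`, `r2`, `pred_a` and `pred_b` are hypothetical, not taken from the question.

```python
import random

def mse(y_true, y_pred):
    # Mean squared error: residual sum of squares divided by n.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # R^2 as 1 - RSS / TSS, with TSS taken around the mean of y_true.
    mean_y = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - rss / tss

random.seed(0)  # arbitrary seed, only for reproducibility of the sketch

# Synthetic responses and two hypothetical sets of predictions,
# both scored against the SAME y_true.
y_true = [random.gauss(10.0, 4.0) for _ in range(500)]
pred_a = [t + random.gauss(0.0, 2.5) for t in y_true]
pred_b = [t + random.gauss(0.0, 3.0) for t in y_true]

# Variance of the observed responses with divisor n (equals TSS / n).
mean_y = sum(y_true) / len(y_true)
var_y = sum((t - mean_y) ** 2 for t in y_true) / len(y_true)

mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)
r2_a, r2_b = r2(y_true, pred_a), r2(y_true, pred_b)

# R^2 is a decreasing linear function of MSE on a fixed sample.
print("A:", mse_a, r2_a, 1.0 - mse_a / var_y)
print("B:", mse_b, r2_b, 1.0 - mse_b / var_y)
print("lower MSE has higher R2:", (mse_a < mse_b) == (r2_a > r2_b))
```

Because `var_y` depends only on the observed responses, the two metrics can only disagree about the ranking if they were computed on different samples (so that `var_y` differs) or if one of the numbers was computed incorrectly.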

chrishmorris