Does a higher R-squared always imply a reduction in MAE and RMSE?

Question

I applied two different machine learning models to my data, a Multiple Linear Regression and a Random Forest. The results are below:

Why are the MAE and RMSE higher for the model with the higher R-squared? Both models were tested on the same test set, but with different input variables.

I second the question from @utobi. Not everyone agrees about how to calculate $R^2$ in every situation. // MAE is a separate metric, and there is no expectation that it should move with measures of square loss. — Dave, Oct 15 '22 at 17:36
@Dave r2_score(test_labels, predictions), where test_labels are my true values and predictions are the model's predictions — Alice Silva, Oct 15 '22 at 17:42
Are you calculating all of MAE, RMSE, $R^2$ on the same data? — Stephan Kolassa, Oct 15 '22 at 18:47
@RichardHardy r2_score(test_labels, predictions), where test_labels are my true values and predictions are the model's predictions — Alice Silva, Oct 15 '22 at 19:28
@AliceSilva, that is still not an equation. A name of a function does not automatically tell me what exactly the function does. — Richard Hardy, Oct 15 '22 at 19:32
@RichardHardy 1 - residual sum of squares / total sum of squares. — Alice Silva, Oct 15 '22 at 19:37
@Dave you can see the documentation of the metric here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html — Alice Silva, Oct 15 '22 at 19:58
I’m familiar with the function. What are you inputting into it? — Dave, Oct 15 '22 at 20:22
@Dave I just remembered that my input variables are not the same for the two models. Could this mean something? — Alice Silva, Oct 15 '22 at 20:24
@Dave I input the true values and the values predicted by the model. — Alice Silva, Oct 15 '22 at 20:25
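For reference, the formula described in the comments (1 minus the residual sum of squares over the total sum of squares) can be reproduced in a few lines of plain Python. This is a sketch, assuming 1-D lists of floats and a non-constant set of true values; the function name r2_from_sums is hypothetical. For such inputs it should agree with sklearn.metrics.r2_score(test_labels, predictions), which centres the total sum of squares on the mean of the true values.

```python
def r2_from_sums(y_true, y_pred):
    """R^2 = 1 - RSS / TSS, with TSS centred on the mean of y_true (as scikit-learn does)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared errors of the model's predictions.
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: squared errors of always predicting the mean of y_true.
    tss = sum((y - mean_true) ** 2 for y in y_true)
    return 1.0 - rss / tss


# Made-up example values.
test_labels = [3.0, -0.5, 2.0, 7.0]
predictions = [2.5, 0.0, 2.0, 8.0]
print(round(r2_from_sums(test_labels, predictions), 4))  # 0.9486
```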

Dave · Accepted Answer · 2022-10-18T01:09:41.273

What you’ve described can’t happen in math, so there’s either a missing detail, a bug in your code causing you to input something other than what you intend to input, or something wrong with the Python function (the last of which I find unlikely).

I disagree with this Python implementation of $R^2$, but for the same data set, $R^2_{1,sklearn}>R^2_{2,sklearn}\iff MSE_2>MSE_1\iff R^2_{1,Dave}>R^2_{2,Dave}$. This is because $R^2$, either in the implementation you use or the way I prefer, is a strictly decreasing function of MSE.

$$R^2=1-\dfrac{MSE}{\text{denominator}}$$

(This denominator is a kind of sum of squares related to a model that predicts the same value every time. Your function and I disagree on what that one value should be, but you could pick any positive constant, such as $5$, $17$, or $\pi$, as the denominator, and MSE and that definition of $R^2$ would move in opposite directions.)
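Written out for the scikit-learn version, dividing both sums by the number of test points $n$ shows where the MSE and the denominator come from. Here $\hat y_i$ is the prediction for test point $i$ and $\bar y$ is the mean of the true test values; putting some other constant prediction $c$ in place of $\bar y$ gives the general form, which is strictly decreasing in MSE whenever the $y_i$ are not all equal to $c$.

$$
R^2_{sklearn}
= 1-\dfrac{\sum_{i=1}^n (y_i-\hat y_i)^2}{\sum_{i=1}^n (y_i-\bar y)^2}
= 1-\dfrac{MSE}{\frac{1}{n}\sum_{i=1}^n (y_i-\bar y)^2},
\qquad
R^2_{c} = 1-\dfrac{MSE}{\frac{1}{n}\sum_{i=1}^n (y_i-c)^2}
$$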

If you evaluate the $R^2$ of two different models on the same data, the denominator stays the same. Thus, increasing/decreasing $R^2$ corresponds to decreasing/increasing MSE.
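A quick numerical check of this argument, as a sketch with made-up data: for two hypothetical models evaluated on the same true values, the model with the lower MSE has the higher $R^2$ for every choice of the constant prediction $c$ used in the denominator (the test mean, as in scikit-learn, or the arbitrary values $5$, $17$, and $\pi$ from the answer). The helper names mse and r2_with_baseline are illustrative, not from any library.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error with denominator n."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r2_with_baseline(y_true, y_pred, c):
    """1 - MSE / (MSE of a model that always predicts the constant c)."""
    baseline = mse(y_true, [c] * len(y_true))
    return 1.0 - mse(y_true, y_pred) / baseline


# Made-up test set and predictions from two hypothetical models.
y_true = [0.0, 10.0, 20.0, 30.0]
pred_1 = [1.0, 9.0, 21.0, 29.0]   # every error has size 1
pred_2 = [3.0, 8.0, 22.0, 26.0]   # larger errors

mean_true = sum(y_true) / len(y_true)
for c in [mean_true, 5.0, 17.0, math.pi]:
    r2_1 = r2_with_baseline(y_true, pred_1, c)
    r2_2 = r2_with_baseline(y_true, pred_2, c)
    # Same data and same c give the same denominator, so the R^2 order reverses the MSE order.
    print(f"c={c:.3f}: R2_1={r2_1:.4f}, R2_2={r2_2:.4f}, R2_1 > R2_2: {r2_1 > r2_2}")

print(mse(y_true, pred_1) < mse(y_true, pred_2))  # True
```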

(If your MSE calculation uses an $n-p$ denominator instead of $n$ or $n-1$, then the above does not apply, because your two models use different input variables and can therefore have different values of $p$. This could be the kind of missing detail I mentioned in my first paragraph.)
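Here is a sketch of how that missing detail could produce the pattern in the question. The residuals are made up, and the parameter counts p_1 = 6 and p_2 = 1 are hypothetical (a random forest does not have a parameter count in the linear-regression sense). With the usual $n$ denominator, the higher-$R^2$ model also has the lower RMSE; with an $n-p$ denominator and a different $p$ per model, the ordering flips.

```python
import math


def sum_sq(residuals):
    """Sum of squared residuals y_i - yhat_i."""
    return sum(r ** 2 for r in residuals)


y_true = [float(v) for v in range(10)]    # n = 10 made-up test points
res_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]    # model 1 residuals: RSS = 8
res_2 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]    # model 2 residuals: RSS = 9
p_1, p_2 = 6, 1                           # hypothetical parameter counts

n = len(y_true)
mean_true = sum(y_true) / n
tss = sum((y - mean_true) ** 2 for y in y_true)

# R^2 (scikit-learn style) does not involve p at all.
r2_1 = 1 - sum_sq(res_1) / tss
r2_2 = 1 - sum_sq(res_2) / tss

# RMSE with denominator n: agrees with the R^2 ordering.
rmse_n_1 = math.sqrt(sum_sq(res_1) / n)
rmse_n_2 = math.sqrt(sum_sq(res_2) / n)

# RMSE with denominator n - p: can disagree when p differs between models.
rmse_np_1 = math.sqrt(sum_sq(res_1) / (n - p_1))
rmse_np_2 = math.sqrt(sum_sq(res_2) / (n - p_2))

print(r2_1 > r2_2)            # True: model 1 has the higher R^2
print(rmse_n_1 < rmse_n_2)    # True: and the lower RMSE with denominator n
print(rmse_np_1 > rmse_np_2)  # True: but the higher RMSE with denominator n - p
```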

MAE is a totally different metric that need not increase/decrease with an increase/decrease in MSE or a decrease/increase in $R^2$. In this linked answer of mine, I give examples where an MSE increase/decrease is accompanied by an MAE decrease/increase.
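A small made-up illustration of this point, mirroring the pattern in the question: on the same test set, model A has the higher $R^2$ and the lower RMSE, yet the higher MAE, because model B makes one large error while A makes several small ones. This is a sketch with invented numbers, not the examples from the linked answer; the helper names are illustrative.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    # 1 - RSS / TSS, with TSS centred on the mean of y_true.
    mean_true = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_true) ** 2 for y in y_true)
    return 1 - rss / tss


y_true = [0.0, 10.0, 20.0, 30.0]
pred_a = [2.0, 12.0, 22.0, 32.0]   # four errors of size 2
pred_b = [0.0, 10.0, 20.0, 35.0]   # three perfect predictions, one error of size 5

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, f"R2={r2(y_true, pred):.3f}", f"RMSE={rmse(y_true, pred):.3f}", f"MAE={mae(y_true, pred):.3f}")
# A R2=0.968 RMSE=2.000 MAE=2.000
# B R2=0.950 RMSE=2.500 MAE=1.250
```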

Thank you so much for the answer! Can you explain a little more about the MSE calculation involving $n-p$ instead of $n$ or $n-1$? What does $p$ mean in this case, and can you show that formula? — Alice Silva, Oct 15 '22 at 21:55
@AliceSilva the $p$ refers to the number of parameters in a linear regression model, and it can be thought of as a way to penalize using a large number of parameters. For further details, you might consider posting a new question so others can benefit from the answer, rather than burying important material in the comments of a fairly unrelated question. — Dave, Oct 18 '22 at 01:02
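For reference, the two versions asked about can be written side by side, with $\hat y_i$ the model's prediction for observation $i$ and $p$ the number of estimated coefficients (including the intercept) in a linear regression. The $n-p$ version is the usual unbiased estimate of the error variance from an ordinary least squares fit, computed on the data the model was fitted to.

$$
MSE_{n} = \frac{1}{n}\sum_{i=1}^n (y_i-\hat y_i)^2
\qquad
MSE_{n-p} = \frac{1}{n-p}\sum_{i=1}^n (y_i-\hat y_i)^2
$$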
