Mean squared error (MSE) is scale-dependent. For example, if I have an MSE of 0.1 and multiply all of X and Y by 100, then redo my regression and recompute the MSE, I get 1000.0: $(y_\text{true}-y_\text{regr})^2 \to 100^2\,(y_\text{true}-y_\text{regr})^2$.

While MSE is very useful and has its own meaning, the mere fact that its value is large or small does not, by itself, convey 'goodness of fit'. It is integral, but it carries a slightly different interpretation.

R-squared has its own pros and cons, but it seems a better measure, 'normalised' to the data itself. Are there other 'universal' (i.e., non-scale-dependent) measures of goodness of fit?

Would correlation, or even mutual information, between y_true and the predicted regression y_regr be useful for assessing how well a regression of any type, including neural networks, fits the data it is trying to predict?
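
For concreteness, here is a minimal sketch (using NumPy and scikit-learn; the data are invented for illustration) of MSE's scale dependence versus $R^2$'s scale invariance:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.3, size=200)

def fit_and_score(X, y):
    model = LinearRegression().fit(X, y)
    y_pred = model.predict(X)
    return mean_squared_error(y, y_pred), r2_score(y, y_pred)

mse1, r2_1 = fit_and_score(X, y)
mse2, r2_2 = fit_and_score(100 * X, 100 * y)  # rescale both X and y

print(mse2 / mse1)  # ~10000 == 100**2: MSE scales with the data
print(r2_1, r2_2)   # identical: R^2 is invariant to the rescaling
```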

Socorro
  • MSE is valuable precisely because it is scale dependent: that is what gives it meaning. For instance, an $R^2$ of $0.999999$ can be useless in many circumstances because by itself it tells you nothing about how close the model's predictions are to the data. (I have seen truly terrible scientific models with $R^2$ this high.) But if your objective is to fit a model of, say, human heights and you achieve a root MSE of 1 cm, you immediately know how well you're doing. – whuber Mar 22 '22 at 21:29
  • Oh yes, definitely. I'm not diminishing its power or usefulness, or suggesting that any one metric should be used alone. More that MSE does one job excellently; I am a little less clear on which others are best suited regarding goodness of fit. (I have data right now and a regression with a very, very low MSE, which seems good, but plotting its predictions for the time series it has learned, it predicts timing correctly while the predicted magnitude is often 60% smaller than y_true.) Thus I have a low MSE but a low r-squared. – Socorro Mar 22 '22 at 21:39
  • Goodness of fit, broadly understood, is usually assessed by comparing one's fit to more flexible alternatives. ($R^2$ fits this description as the goodness of fit of a constant model, where your model is the flexible alternative!) A great number of GoF tests have been devised along these lines. For instance, in ordinary least squares regression some textbook authors encourage their readers to throw in some quadratic terms: if these don't "significantly" improve the fit, the original fit is considered to be "good." – whuber Mar 22 '22 at 21:42
  • Note that $R^2 = 1-\dfrac{n\times MSE}{\sum_{i=1}^n\big(y_i - \bar y\big)^2}$, so $R^2$ and $MSE$ kind of have the same information (in some sense). In particular, on the same data, any model with higher $R^2$ than a competitor (ranging from simple linear regression to support vector regression to deep learning) will have lower $MSE$ than that competitor (a numerical check appears below, after these comments). – Dave Mar 22 '22 at 21:52
  • @Dave Your point is about relative comparability of $R^2,$ and indeed that's worth remembering. But an important qualification is needed: "on the same data, expressed in the same way." We need to rule out changes in models arising from nonlinear transformations of the $y_i.$ – whuber Mar 22 '22 at 22:01
  • Thanks. The current ‘model’ is stationary data.

    I'll be moving to neural networks (NNs) to see if they can predict this behaviour. I have looked on Stack Exchange at whether r-squared values are an appropriate companion to MSE. Much of the statistical commentary I found points, the more you search, to increasingly niche assumptions being violated (reasons why metric/approach X is not perfect) rather than to whether it is constructive as a building block.

    $R^2$ seems my best metric/guess so far; I don't know whether extra complications arise in its meaning for NNs.

    – Socorro Mar 22 '22 at 22:17
  • Thank you for this Dave! – Socorro Mar 22 '22 at 22:17
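
As a numerical check of the identity quoted in Dave's comment (made-up data; `pred_a` and `pred_b` stand in for any two competing models):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=50)
pred_a = y + rng.normal(scale=0.5, size=50)  # hypothetical model A
pred_b = y + rng.normal(scale=1.0, size=50)  # hypothetical model B

def mse(y, p):
    return np.mean((y - p) ** 2)

def r2(y, p):
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - len(y) * mse(y, p) / sst  # R^2 = 1 - n*MSE / SST

# On the same data, the model with the lower MSE has the higher R^2.
print(mse(y, pred_a) < mse(y, pred_b), r2(y, pred_a) > r2(y, pred_b))
```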

2 Answers


I would argue that $R^2$ is not a universal "grade" for a model where $R^2=90\%=0.9$ is an $\text{A}$ that makes us happy and $R^2=40\%=0.4$ is an $\text{F}$ that makes us sad. It might be that $R^2=0.4$ is excellent performance for a task or that $R^2=0.9$ is rather pedestrian for a different task.

In that sense, I do not believe there is an easy loss function that grades your model quality. To judge that, you must know the costs of making wrong predictions. If you make a prediction that misses the true value by $3$, put that in context. If that is three meters when you are trying to measure how far away another town is, then I'd say that's pretty good. If you're trying to measure how tall someone is, such performance does not sound so good.

Likewise, if you get $R^2 = 0.9$, put that in context. A large reduction in variance could mean your errors are small, or it could mean the original data had a huge variance, so even reducing the variance to something merely "large" yields a high $R^2$, even though the errors are far from the "modest" size a useful model would need.
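
A toy illustration of that last point (numbers invented for the example): the same error size can produce a very high or a mediocre $R^2$, depending only on the spread of the data.

```python
import numpy as np

rng = np.random.default_rng(2)
noise = rng.normal(scale=10.0, size=100_000)  # same error size in both cases

for spread in (1000.0, 15.0):
    signal = rng.normal(scale=spread, size=100_000)
    y_obs = signal + noise   # observed values
    y_pred = signal          # a model that captures the signal exactly
    r2 = 1 - np.var(y_obs - y_pred) / np.var(y_obs)
    print(spread, round(r2, 4))  # ~0.9999 for huge spread, ~0.69 for modest spread
```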

Dave

RMSE tells us how far the model residuals are from zero on average, i.e. the average distance between the observed values and the predicted values. However, Willmott et al. suggested that RMSE might be misleading for assessing model performance, since RMSE is a function of both the average error and the distribution of squared errors. Chai recommended using both RMSE and mean absolute error (MAE); it is better to report both metrics. By the way, $R^2$ is misleading as well, since it increases as the number of predictors grows. I would recommend using adjusted $R^2$ instead. This metric is something of a gold standard goodness-of-fit test. The attached articles give more explanation.
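
A small sketch of the metrics mentioned above (the data and model are placeholders; $n$ is the sample size and $p$ the number of predictors in the adjusted-$R^2$ formula):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(3)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

y_pred = LinearRegression().fit(X, y).predict(X)

rmse = np.sqrt(mean_squared_error(y, y_pred))
mae = mean_absolute_error(y, y_pred)
r2 = r2_score(y, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes extra predictors

print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}  adj.R2={adj_r2:.3f}")
```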

References:

http://www.jstor.org/stable/24869236

https://gmd.copernicus.org/articles/7/1247/2014/

  • -1 $1)$ Given that $R^2$ is a function of (R)MSE, advocating for model evaluation using (R)MSE instead of $R^2$ is logically inconsistent. $R^2$ and (R)MSE are, in some sense, just different units of the same measurement (such as feet vs meters for a length measurement). While you can drive your $R^2$ up to $1$ by overfitting the data (e.g., connect the dots), doing so also involves driving (R)MSE down to zero. – Dave Mar 25 '22 at 16:23
  • $2)$ RMSE is not equal to "the average distance between the observed values and the predicted values" or "how far the model residuals are from zero on average", both of which describe the mean absolute error (MAE). – Dave Mar 25 '22 at 16:25
  • @Dave $R^2$ doesn't impose a penalty for a large number of predictors, whereas adjusted $R^2$ does. I would prefer adjusted $R^2$ to know how much variance is captured by my model. – ForestGump Mar 25 '22 at 16:27
  • This does not appear to be a situation in which the number of explanatory variables is in question; if it were, the use of adjusted $R^2$ would certainly be an improvement. This is another one of those difficult questions that really need clarification to be answered. In this case, much of the question formulation relies on strange characterizations of what $R^2$ and MSE do as well as vague references to "goodness of fit" without further elaboration of what that is intended to mean. – whuber Mar 25 '22 at 16:41
  • There are lots of methods for penalizing a large number of predictors. Adjusted $R^2$ is one, various information criteria do the trick, and out-of-sample testing is possible. In any event, arguing for RMSE in the beginning of your paragraph and then against $R^2$ later represents logical incompatibility. If you want to penalize the model for having a large number of parameters that might not matter, go ahead and apply a penalty, but your post alludes to a common misconception that (R)MSE should be used, instead of $R^2$, since the latter can be driven to perfection by overfitting the data. – Dave Mar 25 '22 at 16:43
  • $R^2$ is a useful quadratic measure. For a useful linear measure (though one that does not partition by sources of explained variation as $R^2$ does), consider the $g$-index based on Gini's mean difference of predicted values, discussed here. This index could be normalized by the Gini's mean difference of $Y$ to obtain a unitless index (a small sketch follows below). – Frank Harrell Dec 27 '22 at 10:36
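
A minimal sketch of the $g$-index idea from the comment above (an $O(n^2)$ pairwise implementation; the data and predictions are invented for illustration):

```python
import numpy as np

def gini_mean_difference(x):
    """Mean absolute difference over all pairs of distinct values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1))

rng = np.random.default_rng(4)
y = rng.normal(size=100)
y_pred = 0.8 * y + rng.normal(scale=0.3, size=100)  # hypothetical predictions

g_index = gini_mean_difference(y_pred)
normalized = g_index / gini_mean_difference(y)  # unitless, per the comment
print(g_index, normalized)
```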