Mean-squared error (MSE) is scale dependent. For example, if I have an MSE of 0.1 and multiply all of X and y by 100, refit my regression and recompute the MSE, I get 1000.0 (each squared residual (y_true - y_regr)^2 becomes 100^2 * (y_true - y_regr)^2).
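To make this concrete, here is a minimal numpy sketch (the data and the least-squares fit via `np.polyfit` are my own illustration, not from any particular library workflow): rescaling both X and y by 100 leaves the fitted line "the same" in relative terms, but multiplies the MSE by 100².

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 2.0 * X + rng.normal(scale=0.3, size=100)

# Ordinary least-squares fit (slope, intercept) on the original scale
slope, intercept = np.polyfit(X, y, 1)
mse = np.mean((y - (slope * X + intercept)) ** 2)

# Rescale both X and y by 100 and refit
Xs, ys = 100 * X, 100 * y
slope_s, intercept_s = np.polyfit(Xs, ys, 1)
mse_s = np.mean((ys - (slope_s * Xs + intercept_s)) ** 2)

# The residuals scale by exactly 100, so the MSE scales by 100**2
ratio = mse_s / mse
print(ratio)  # close to 10000
```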
Whilst MSE is very useful and has its own meaning, the bare fact that its value is large or small does not by itself tell you anything about goodness of fit. So it is integral, but its interpretation is slightly different.
R-squared has its own pros/cons, but it seems a better measure because it is 'normalised' to the data itself. Are there other 'universal' (non-scale-dependent) measures of goodness of fit?
Would correlations, or even mutual information, between y_true and the predicted y_regr be useful measures of how well a regression of any type, including neural networks, fits the data it is trying to predict?
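For what it's worth, both R² and the Pearson correlation between y_true and y_regr are invariant when targets and predictions are rescaled together. A small sketch with hand-rolled helper functions (`r_squared` and `pearson_r` are my own illustrative names; mutual information would need an estimator such as scikit-learn's `mutual_info_regression` and is not shown here):

```python
import numpy as np

def r_squared(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def pearson_r(y_true, y_pred):
    # Pearson correlation between targets and predictions
    return np.corrcoef(y_true, y_pred)[0, 1]

rng = np.random.default_rng(1)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.5, size=200)  # imperfect "predictions"

# Both metrics are unchanged when y_true and y_pred are scaled together,
# unlike MSE, which would grow by the square of the scale factor
r2_a = r_squared(y_true, y_pred)
r2_b = r_squared(100 * y_true, 100 * y_pred)
rho_a = pearson_r(y_true, y_pred)
rho_b = pearson_r(100 * y_true, 100 * y_pred)
```

Note that correlation only measures how linearly related predictions are to the truth, so a biased or mis-scaled predictor can still score a high correlation while R² penalises it.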
I'll be moving to neural networks (NNs) to see if they can predict this behaviour. I have searched Stack Exchange for whether R-squared is an appropriate companion to MSE. What I found in the statistical commentary is that much of it points to increasingly complex niches of violated assumptions: the deeper you search, the more reasons appear why metric/approach X is not perfect, rather than treating it as a constructive building block.
R² seems my best metric so far, but I don't know whether extra complications arise in its interpretation when applied to NNs.
– Socorro Mar 22 '22 at 22:17