
I'm reading from different sources (whuber's answer on $R^2$, and another source) that one needs to be careful when interpreting $R^2$, both in linear and non-linear models.

In linear models $R^2$ makes sense, because one has the relationship $S_\text{tot}=S_\text{reg}+S_\text{error}$, but in non-linear regression this relationship is no longer valid. I wonder - why is this relationship no longer valid in non-linear regression? Is there an intuitive explanation for this, or is it purely mathematical?
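To make this concrete, here is a small numerical check (the simulated data and the exponential model are arbitrary illustrative choices): the decomposition holds exactly for a linear least-squares fit with an intercept, but not for an nls fit.

```r
# Check S_tot vs. S_reg + S_error for a linear and a nonlinear fit (simulated data)
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 2 * exp(0.3 * x) + rnorm(50, sd = 2)
dat <- data.frame(x, y)

decomp <- function(fitted_vals, y) {
  c(S_tot          = sum((y - mean(y))^2),
    S_reg_plus_err = sum((fitted_vals - mean(y))^2) + sum((y - fitted_vals)^2))
}

# Linear model with intercept: the two numbers agree (up to rounding)
decomp(fitted(lm(y ~ x, data = dat)), dat$y)

# Nonlinear model: the two numbers generally differ
fit_nls <- nls(y ~ a * exp(b * x), data = dat, start = list(a = 1, b = 0.2))
decomp(fitted(fit_nls), dat$y)
```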

Reading from this discussion, I quote:

"There is a good reason that an nls model fit in R does not provide r-squared - r-squared doesn't make sense for a general nls model.

One way of thinking of r-squared is as a comparison of the residual sum of squares for the fitted model to the residual sum of squares for a trivial model that consists of a constant only. You cannot guarantee that this is a comparison of nested models when dealing with an nls model. If the models aren't nested this comparison is not terribly meaningful."

Can one not have a non-linear nested model, such that this comparison is meaningful? Is there any case in non-linear regression where $R^2$ will be meaningful?

Erosennin

1 Answer


I completely disagree that $R^2$ lacks meaning in nonlinear models.

When you minimize square loss, whether the model is linear or not, you are estimating conditional means. What better baseline to beat, if your model is to be worth anything, than predicting every conditional mean to be the overall mean? If all of your fancy modeling is still outperformed by predicting AVERAGE(A:A) (to use some Excel terminology) every time, I think it's safe to say that your modeling is quite poor.

If you take $R^2$ to be the following, that's exactly what you are doing.

$$ 1-\dfrac{ \sum_{i=1}^{n}\left( y_i - \hat y_i \right)^2 }{ \sum_{i=1}^{n}\left( y_i - \bar y \right)^2 } $$

If the numerator of the fraction is smaller than the denominator, equivalent to having a positive $R^2$ value, then you've at least made some improvement over the bare minimum baseline. If the numerator of the fraction is larger than the denominator, equivalent to having a negative $R^2$ value, then the baseline model outperforms your model.
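Here is a minimal sketch of that comparison for an nls fit (the simulated data and the exponential model are placeholders, not anything specific to the question):

```r
# R^2 as 1 - SS_res / SS_tot for an nls fit, against a mean-only baseline (simulated data)
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 2 * exp(0.3 * x) + rnorm(50, sd = 2)
dat <- data.frame(x, y)

fit <- nls(y ~ a * exp(b * x), data = dat, start = list(a = 1, b = 0.2))

ss_res <- sum((dat$y - fitted(fit))^2)   # squared error of the fitted model
ss_tot <- sum((dat$y - mean(dat$y))^2)   # squared error of predicting the overall mean
1 - ss_res / ss_tot                      # > 0 means the model beats the mean-only baseline
```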

Nested models have some nice properties when you want to do parameter inference, but for assessing your predictions with the above $R^2$ formula, I see no reason to require that.

I find this perspective on $R^2$ to be elegant, as it relates to analogous comparisons for other types of models ("classification" through McFadden's $R^2$, quantile regression) and even to "chunk testing", such as ANOVA and similar partial F-tests.
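For instance, McFadden's $R^2$ follows the same template, with log-likelihoods in place of sums of squares and an intercept-only model as the baseline (the data below are simulated purely for illustration):

```r
# McFadden's R^2 for a logistic regression, compared to an intercept-only baseline
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + 1.5 * x))
dat <- data.frame(x, y)

fit  <- glm(y ~ x, family = binomial, data = dat)   # model of interest
null <- glm(y ~ 1, family = binomial, data = dat)   # baseline: intercept only

1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))   # McFadden's R^2
```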

For an out-of-sample $R^2$-type performance measure, I have a strong opinion.

As for why $R^2$ in nonlinear regression lacks the usual "proportion of variance explained" interpretation, I posted a question and self-answer about a year ago that address this very issue.

Dave