Why do non-linear models like randomForest() and h2o.gbm() report $R^2$ as an evaluation metric?

Question

The models I used for predicting cost are randomForest(), h2o.gbm(), glmnet() and lm(). I'd like to compare these models using the same metric. At first I used $R^2$, but then I realized it is only appropriate for linear regression. For non-linear regression, such as randomForest() and h2o.gbm(), $R^2$ is not a good evaluation metric because SSE + SSR may not equal SST. However, randomForest() and h2o.gbm() still report $R^2$ in their results.
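The concern about SSE + SSR is real. Writing $y_i - \bar y = (\hat y_i - \bar y) + (y_i - \hat y_i)$ and summing squares gives $SST = SSR + SSE + 2\sum_i (\hat y_i - \bar y)(y_i - \hat y_i)$. For a least-squares line with an intercept the cross term is zero; for arbitrary predictions it need not be. A minimal pure-Python sketch, assuming made-up data and a hypothetical non-linear curve (not a fitted randomForest or GBM model):

```python
# Sketch: check whether SST = SSR + SSE for two sets of fitted values.
# The data and the "non-linear" predictor are made up for illustration only.

def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observations y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # regression ("explained")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual
    return sst, ssr, sse

def ols_fit(x, y):
    """Least-squares line with an intercept; returns the fitted values."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.9, 5.3, 5.8]

# Linear least squares with intercept: the cross term vanishes, so SST = SSR + SSE.
sst, ssr, sse = sums_of_squares(y, ols_fit(x, y))
print(f"OLS:        SST={sst:.4f}  SSR+SSE={ssr + sse:.4f}")

# Hypothetical non-linear predictor, not fitted by least squares: the identity breaks.
nonlinear = [0.5 * xi ** 1.5 for xi in x]
sst, ssr, sse = sums_of_squares(y, nonlinear)
print(f"Non-linear: SST={sst:.4f}  SSR+SSE={ssr + sse:.4f}")
```

Note that $1 - SSE/SST$ can still be computed for the non-linear predictions; what is lost is the reading of $SSR/SST$ as the same number.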

```
> RF$rsq
 [1] 0.4014412 0.4001227 0.4257454 0.4655343 0.4853580 0.5056654 0.5155696 0.5368283 0.5554481 0.5753047 0.5870088 0.5974284 0.6080594
[14] 0.6181469 0.6236067 0.6288154 0.6330016 0.6376875 0.6409845 0.6441638
```

```
> gbm

Cross-Validation Metrics Summary:
                             mean           sd  cv_1_valid cv_2_valid  cv_3_valid cv_4_valid cv_5_valid cv_6_valid  cv_7_valid
mae                     0.3160167 0.0050183255  0.30723402 0.31874081   0.3227441 0.32830733 0.30606982 0.30992532  0.32118118
mean_residual_deviance 0.19114475 0.0056184703  0.18364981 0.20013429  0.20293528  0.1948082 0.18348551 0.18217574  0.19370228
mse                    0.19114475 0.0056184703  0.18364981 0.20013429  0.20293528  0.1948082 0.18348551 0.18217574  0.19370228
r2                      0.7201382  0.009881634    0.742593 0.71306866   0.7103253 0.71332353 0.74646896  0.7218749   0.6966323
residual_deviance      0.19114475 0.0056184703  0.18364981 0.20013429  0.20293528  0.1948082 0.18348551 0.18217574  0.19370228
rmse                   0.43710622  0.006437773  0.42854384  0.4473637  0.45048338 0.44137082  0.4283521  0.4268205  0.44011623
rmsle                  0.09950914 0.0019040282 0.098211296 0.10077556 0.105873734 0.10044188 0.09536128   0.096751 0.099039644
                       cv_8_valid cv_9_valid cv_10_valid
mae                    0.32118884   0.310518  0.31425762
mean_residual_deviance 0.19704889 0.17891628  0.19459121
mse                    0.19704889 0.17891628  0.19459121
r2                      0.7196735 0.71751463   0.7199072
residual_deviance      0.19704889 0.17891628  0.19459121
rmse                    0.4439019 0.42298496  0.44112495
rmsle                  0.09868799  0.0990525  0.10089653
```

I'm confused about why these two models still report $R^2$ as one of their metrics, and I don't know whether I should use it. I've checked the documentation of randomForest(); it says "rsq (regression only) “pseudo R-squared”: 1 - mse / Var(y)." However, the documentation of h2o.gbm() says nothing about it.
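The randomForest formula needs nothing from a linear model: only the mean squared error of some predictions and the variance of the response. As I understand the randomForest docs, its mse is computed from out-of-bag predictions, one value per number of trees grown, which is why RF$rsq above is a vector. A minimal sketch of the same quantity for arbitrary predictions, assuming made-up data and hypothetical predictions (the population variance, dividing by n, is used here; R's var() divides by n − 1):

```python
from statistics import fmean, pvariance

def pseudo_r2(y, y_hat):
    """1 - MSE / Var(y), the formula quoted from the randomForest docs.
    Var(y) here is the population variance (divides by n)."""
    mse = fmean((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1.0 - mse / pvariance(y)

y = [3.0, 5.0, 4.0, 6.0, 7.0]
good = [3.2, 4.8, 4.1, 6.3, 6.6]   # hypothetical predictions close to y
mean_only = [fmean(y)] * len(y)    # always predicts the mean of y
bad = [7.0, 3.0, 6.0, 4.0, 3.0]    # hypothetical predictions worse than the mean

print(pseudo_r2(y, good))       # about 0.97
print(pseudo_r2(y, mean_only))  # 0.0: no better than predicting the mean
print(pseudo_r2(y, bad))        # negative: worse than predicting the mean
```

Unlike in-sample OLS $R^2$, this pseudo $R^2$ is not bounded below by 0, which is one reason it is called "pseudo".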

R-squared (R2) is exact for straight lines and flat surfaces, and is both approximate and useful for other models. I personally calculate R-squared as "R2 = 1.0 - (regression_error_variance / dependent_data_variance)" and use it to tell me what fraction of the dependent data variance is explained by the model. For example, if R2 = 0.95, I interpret this to mean that 95 percent of the dependent data variance is explained by the fitted model. — James Phillips, Jun 25 '19 at 18:38
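The "exact for straight lines, approximate otherwise" point can be checked numerically. For a least-squares line with an intercept, $1 - SSE/SST$ equals the squared correlation between $y$ and the fitted values; for other predictors the two numbers can differ. A minimal sketch, assuming made-up data and a hypothetical non-linear curve; the mean squared residual divided by the population variance of $y$ is the same ratio as $SSE/SST$:

```python
from statistics import fmean

def ols_fit(x, y):
    """Least-squares line with an intercept; returns the fitted values."""
    x_bar, y_bar = fmean(x), fmean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return [y_bar + slope * (xi - x_bar) for xi in x]

def r2(y, y_hat):
    """1 - SSE/SST, i.e. 1 - residual error variance / dependent data variance."""
    y_bar = fmean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

def squared_corr(a, b):
    """Squared Pearson correlation between two sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov * cov / (va * vb)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.9, 5.3, 5.8]
fits = {
    "OLS line": ols_fit(x, y),
    "non-linear curve": [0.5 * xi ** 1.5 for xi in x],  # hypothetical predictor
}
for name, y_hat in fits.items():
    print(f"{name}: R2={r2(y, y_hat):.3f}  corr^2={squared_corr(y, y_hat):.3f}")
```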

score 1 · Answer 1 · answered Jun 26 '19 at 10:01

It's simply plain wrong to use R-squared as an evaluation metric for non-linear regression. Its assumptions rely on the regression model being linear. I don't know how this practice started in the R community, but it should not be propagated further.

Source: An evaluation of R2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: a Monte Carlo approach (365 citations)

Divyansh

AdamO and I contest this stance here, where my answer shows the relationship between $R^2$ and square loss. Further discussion of mine can be found here. – Dave May 01 '23 at 19:14

Dave · Accepted Answer · answered May 1 '23 at 19:15

It's because practitioners are familiar with $R^2$ and use what they know. The paper linked by Divyansh has a pretty scathing comment: "This observation might be due to differences in the mathematical background of trained statisticians and biochemists/pharmacologists who often apply statistical methods but lack detailed statistical insight." (And that's the polite version that made it to the publication!)

Nonetheless, some practitioners use $R^2$, so a developer of a package must include it if she wants her package to be used by those people. There's a lot of research in biology and medicine, so that would be an awful lot of potential users to alienate by excluding their favorite metric. Further, it may just be a requirement for publication, so people who know better just have to report it in their papers, even when they use more appropriate metrics.

Finally, despite potential issues with $R^2$, there is a sense in which $R^2$ is totally legitimate and equivalent to square loss (sum of squared errors, mean squared error, and root mean squared error): Interpreting nonlinear regression $R^2$.
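The equivalence with square loss follows from $R^2 = 1 - \mathrm{MSE}/\mathrm{Var}(y)$: when every model is scored on the same $y$, $\mathrm{Var}(y)$ is a fixed constant, so ranking by $R^2$ (higher is better) gives the same order as ranking by MSE or RMSE (lower is better). A minimal sketch, assuming made-up data and predictions from three hypothetical models:

```python
from statistics import fmean, pvariance

def mse(y, y_hat):
    """Mean squared error of predictions y_hat against observations y."""
    return fmean((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

def r2(y, y_hat):
    """1 - MSE / Var(y); Var(y) is constant when y is fixed."""
    return 1.0 - mse(y, y_hat) / pvariance(y)

y = [2.0, 4.0, 3.0, 5.0, 6.0, 4.0]
# Predictions from three hypothetical models on the same evaluation data
models = {
    "model_a": [2.1, 3.8, 3.2, 5.1, 5.7, 4.2],
    "model_b": [2.5, 3.5, 3.5, 4.5, 5.5, 4.0],
    "model_c": [3.0, 3.0, 4.0, 4.0, 5.0, 5.0],
}

by_mse = sorted(models, key=lambda m: mse(y, models[m]))              # lower is better
by_r2 = sorted(models, key=lambda m: r2(y, models[m]), reverse=True)  # higher is better
print(by_mse)
print(by_r2)  # same order as by_mse
```

The equivalence only holds when the models share the same evaluation data. RF$rsq is computed out-of-bag, while the h2o table above is a cross-validation summary, so those two $R^2$ values are not scored on the same observations; for a like-for-like comparison of randomForest(), h2o.gbm(), glmnet() and lm(), one would score all four on the same held-out set.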
