The models I used for predicting cost are randomForest(), h2o.gbm(), glmnet() and lm(). I would like to compare these models using the same metric. At first I used $R^2$, but then I realized it is only a good measure for linear regression. For non-linear methods such as randomForest() and h2o.gbm(), $R^2$ is not a good evaluation metric because SSE + SSR may not equal SST.
However, the output of randomForest() and h2o.gbm() still includes $R^2$:
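For reference, the identity behind that concern can be written out. This is a sketch that assumes SSE denotes the residual sum of squares, SSR the regression (explained) sum of squares, and SST the total sum of squares; the post does not define them explicitly.

$$
\underbrace{\sum_{i}(y_i-\bar y)^2}_{\mathrm{SST}}
= \underbrace{\sum_{i}(y_i-\hat y_i)^2}_{\mathrm{SSE}}
+ \underbrace{\sum_{i}(\hat y_i-\bar y)^2}_{\mathrm{SSR}}
+ 2\sum_{i}(y_i-\hat y_i)(\hat y_i-\bar y)
$$

The cross term is zero for a least-squares linear fit with an intercept, which gives SST = SSE + SSR. For other fitting procedures it is generally nonzero, so $1 - \mathrm{SSE}/\mathrm{SST}$ and $\mathrm{SSR}/\mathrm{SST}$ need not agree.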

    > RF$rsq
     [1] 0.4014412 0.4001227 0.4257454 0.4655343 0.4853580 0.5056654 0.5155696 0.5368283 0.5554481 0.5753047 0.5870088 0.5974284 0.6080594
    [14] 0.6181469 0.6236067 0.6288154 0.6330016 0.6376875 0.6409845 0.6441638

    > gbm
    Cross-Validation Metrics Summary:
                            mean        sd            cv_1_valid   cv_2_valid  cv_3_valid   cv_4_valid  cv_5_valid  cv_6_valid  cv_7_valid
    mae                     0.3160167   0.0050183255  0.30723402   0.31874081  0.3227441    0.32830733  0.30606982  0.30992532  0.32118118
    mean_residual_deviance  0.19114475  0.0056184703  0.18364981   0.20013429  0.20293528   0.1948082   0.18348551  0.18217574  0.19370228
    mse                     0.19114475  0.0056184703  0.18364981   0.20013429  0.20293528   0.1948082   0.18348551  0.18217574  0.19370228
    r2                      0.7201382   0.009881634   0.742593     0.71306866  0.7103253    0.71332353  0.74646896  0.7218749   0.6966323
    residual_deviance       0.19114475  0.0056184703  0.18364981   0.20013429  0.20293528   0.1948082   0.18348551  0.18217574  0.19370228
    rmse                    0.43710622  0.006437773   0.42854384   0.4473637   0.45048338   0.44137082  0.4283521   0.4268205   0.44011623
    rmsle                   0.09950914  0.0019040282  0.098211296  0.10077556  0.105873734  0.10044188  0.09536128  0.096751    0.099039644
                            cv_8_valid  cv_9_valid  cv_10_valid
    mae                     0.32118884  0.310518    0.31425762
    mean_residual_deviance  0.19704889  0.17891628  0.19459121
    mse                     0.19704889  0.17891628  0.19459121
    r2                      0.7196735   0.71751463  0.7199072
    residual_deviance       0.19704889  0.17891628  0.19459121
    rmse                    0.4439019   0.42298496  0.44112495
    rmsle                   0.09868799  0.0990525   0.10089653

I'm confused about why these two models still report $R^2$ as one of their evaluation metrics, and I don't know whether I should use it. I checked the documentation of randomForest(), which says: "rsq (regression only) “pseudo R-squared”: 1 - mse / Var(y)." The documentation of h2o.gbm(), however, does not describe how its r2 is computed.
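To make the quoted formula concrete, here is a minimal base-R sketch of how 1 - mse / Var(y) could be computed from any model's predictions, alongside RMSE as a second metric on the same footing. The helper names `pseudo_r2` and `rmse_metric` and the toy vectors are hypothetical and are not part of randomForest or h2o. The sketch also assumes Var(y) uses the divisor n, so that the result equals 1 - SSE/SST; whether each package uses exactly that convention is not confirmed here.

```r
# Hypothetical helpers (not from randomForest or h2o).

# Pseudo R-squared as quoted from the randomForest docs: 1 - mse / Var(y).
# Assumption: Var(y) is taken with divisor n, so this equals 1 - SSE/SST.
pseudo_r2 <- function(actual, predicted) {
  mse   <- mean((actual - predicted)^2)
  var_y <- mean((actual - mean(actual))^2)
  1 - mse / var_y
}

# Root mean squared error, in the units of the response.
rmse_metric <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up toy data, purely for illustration.
actual    <- c(1, 2, 3, 4, 5)
predicted <- c(1.1, 1.9, 3.2, 3.8, 5.0)

print(pseudo_r2(actual, predicted))
print(rmse_metric(actual, predicted))
```

Applied to each model's predictions on the same held-out rows, helpers like these would give directly comparable numbers. Note that a quantity defined this way is 0 when the model predicts the mean of `actual` for every row and becomes negative when the model does worse than that.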