
I am using a random forest regression model with leave-one-out cross-validation for my prediction task. I am having a difficult time understanding why my R2 score is negative when the MSE, RMSE, and MAE are all very low. Here is a sample of my true and predicted values:

True Value: 0.0511350891441389, Predicted Value: 0.1570743965948912
True Value: 0.1019683613090206, Predicted Value: 0.06101801962025982
True Value: 0.0722484077136202, Predicted Value: 0.12989937556879136
True Value: 0.8151465997429149, Predicted Value: 0.11910986913415476
True Value: 0.0141580461529044, Predicted Value: 0.10300264949635973
True Value: 0.0759365903712855, Predicted Value: 0.2007470535994329
True Value: 0.0168830791575889, Predicted Value: 0.0867039544973983
True Value: 0.0280480358233258, Predicted Value: 0.3334096609357363
True Value: 0.0119374073771543, Predicted Value: 0.0456333839555339
True Value: 0.0879195861169952, Predicted Value: 0.12158770472179008
True Value: 0.1877777777777777, Predicted Value: 0.1636636091524143
True Value: 0.1319864052287581, Predicted Value: 0.05390845919789602

These are the scores:

Mean Squared Error (MSE): 0.035323866926619006
Mean Absolute Error (MAE): 0.1288933724806987
Root Mean Squared Error (RMSE): 0.1879464469646048
R-squared (R2) Score: -0.4162881141285679

I am also providing my visualization of the actual vs. predicted values:

[Plot: actual vs. predicted values]
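For reference, this is roughly the setup I am describing. This is only a sketch (my actual features, targets, and hyperparameters are not shown here; the data below is a random placeholder), assuming scikit-learn's `LeaveOneOut` with `cross_val_predict`:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Placeholder data standing in for the real dataset (~28 points).
rng = np.random.default_rng(0)
X = rng.normal(size=(28, 4))
y = rng.uniform(0.0, 1.0, size=28)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Leave-one-out CV: each point is predicted by a model trained on all the others.
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())

mse = mean_squared_error(y, y_pred)
rmse = mse ** 0.5
mae = mean_absolute_error(y, y_pred)
r2 = r2_score(y, y_pred)  # can be negative even when MSE/RMSE/MAE look small
```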

Rai

2 Answers


If you're getting $R^2<0$, then I assume you're using the equation below, which is the calculation used by sklearn.metrics.r2_score.

$$ R^2=1-\left(\dfrac{ \overset{N}{\underset{i=1}{\sum}}\left( y_i-\hat y_i \right)^2 }{ \overset{N}{\underset{i=1}{\sum}}\left( y_i-\bar y \right)^2 }\right) =1-\left(\dfrac{ N\times RMSE^2 }{ \overset{N}{\underset{i=1}{\sum}}\left( y_i-\bar y \right)^2 }\right) $$

This $R^2$ is a function of both the (R)MSE and the total sum of squares. Therefore, no matter how small that (R)MSE is, if the denominator is smaller than the numerator, you will get $R^2<0$.

The interpretation of $R^2<0$ is that your predictions have a higher MSE than that of a naïve model that always predicts the overall mean, $\bar y$. That is, the predictions are not very good. Based on the graph, that appears to be the case, likely driven by that point at the far right that your model misses badly.
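To see this concretely, the check below recomputes $R^2$ from its definition using just the twelve sample pairs given in the question (so the numbers differ from the full-dataset scores): the residual sum of squares exceeds the total sum of squares around the mean, which forces $R^2<0$.

```python
# Recompute R^2 by hand from the sample pairs in the question, to confirm
# that R^2 < 0 means SS_res > SS_tot, i.e. the model's predictions have a
# higher MSE than always predicting the mean of the true values.
y_true = [0.0511350891441389, 0.1019683613090206, 0.0722484077136202,
          0.8151465997429149, 0.0141580461529044, 0.0759365903712855,
          0.0168830791575889, 0.0280480358233258, 0.0119374073771543,
          0.0879195861169952, 0.1877777777777777, 0.1319864052287581]
y_pred = [0.1570743965948912, 0.06101801962025982, 0.12989937556879136,
          0.11910986913415476, 0.10300264949635973, 0.2007470535994329,
          0.0867039544973983, 0.3334096609357363, 0.0456333839555339,
          0.12158770472179008, 0.1636636091524143, 0.05390845919789602]

n = len(y_true)
y_bar = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model errors
ss_tot = sum((t - y_bar) ** 2 for t in y_true)              # mean-model errors
r2 = 1 - ss_res / ss_tot

print(f"MSE (model): {ss_res / n:.4f}")
print(f"MSE (always predict the mean): {ss_tot / n:.4f}")
print(f"R^2: {r2:.3f}")  # negative: the model loses to the mean predictor
```

Note how the single badly-missed point (true value $\approx 0.815$) contributes most of `ss_res` on its own.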

Dave
    Given this insight, it's apparent that my current model might not be suitable for the dataset, especially since it seems highly sensitive to outliers. Considering the characteristics of my dataset, which is relatively small, and the sensitivity to outliers, I'm inclined to believe that I may need a more robust regression model that can handle such scenarios effectively. Could you kindly recommend a regression model or approach that is known for its robustness and is suitable for smaller datasets with potential outliers? Thank you once again for your help. – Rai Sep 20 '23 at 18:13
    @Rai That's really a new question that warrants its own post, but it's almost asking how to do predictive modeling or machine learning. However, if you only have the $28$ points that I believe I counted, there is little hope of doing any serious predictive modeling. – Dave Sep 20 '23 at 18:14
  • Yes, I understand. Thank you for your help! – Rai Sep 20 '23 at 18:22
  • @Rai If this answered your question, please consider upvoting and accepting the answer to "close out" the question. – Dave Sep 20 '23 at 18:24
  • Well... even with few points one can get good predictive modeling when those points are highly reliable and things are fairly 'well behaved' between those points and the dimension is low. Perhaps with something like Kriging or Laplace Interpolation. It might not be the case here, but I wouldn't rule out getting a good predictive model only based on having few points. If your outliers are due to random factors rather than a genuine trend in the data, then you may be very out of luck though. If you don't trust outliers, look to regularized models. E.g. if using Xgboost, increase lambda. – ttbek Sep 21 '23 at 11:11
  • Random forests are notorious for overfitting, leading to calibration curves that are far from the line of identity. I would never trust RF when absolute accuracy is of interest. – Frank Harrell Sep 21 '23 at 11:48
    @FrankHarrell What do you mean by "absolute accuracy" as opposed to "accuracy"? – Dave Sep 21 '23 at 12:24
    Examples: absolute accuracy is a measure related to $Y$ as a function of $\hat{Y}$ (e.g., a smooth calibration curve) and relative accuracy is something like $R^2$ or AUROC (concordance probability). – Frank Harrell Sep 21 '23 at 18:06
  • @FrankHarrell I'm not following. If we take $R^2$ as a function of Brier score, then it is concerned with both calibration and discrimination. AUROC/c-index is all about discrimination. – Dave Sep 21 '23 at 20:50
    Yes if you use the $R^2$ formula that allows $R^2$ to be negative. If you square the Pearson correlation coefficient then it's a relative measure. – Frank Harrell Sep 22 '23 at 01:39

I'd like to point out that your RMSE, MSE and MAE are not small at all.

The calibration plot shows only 3 cases (or 10 % of the data) with actual abnormality > 0.2. That is, 90 % of your data points lie within a range of actual abnormality that is just one RMSE wide.
Turning to the predictions, two RMSEs cover almost the entire prediction range.


I'm an analytical chemist, and two common heuristics we use may be roughly translated to your application as:

  • Qualitative detection is considered barely possible where the relative RMSE* is < 1/3 (on first approximation, one might argue this is fulfilled for the 0.8 actual-abnormality point, but the prediction is even worse for the high-actual-abnormality cases, so I'd say it is not fulfilled anywhere in your data).

  • Relative RMSE below 10 % is frequently used as a minimum requirement for quantitation.

* relative RMSE = $\frac{RMSE}{y}$ or $\frac{RMSE (y)}{y}$; we often look at RMSE as function of the true value.
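As a rough illustration of these heuristics, here is a sketch using only the twelve sample pairs from the question (the full dataset will differ somewhat), and taking "relative RMSE" as RMSE over the mean true value, which is only one of the variants in the footnote above:

```python
import math

# The sample pairs given in the question.
y_true = [0.0511350891441389, 0.1019683613090206, 0.0722484077136202,
          0.8151465997429149, 0.0141580461529044, 0.0759365903712855,
          0.0168830791575889, 0.0280480358233258, 0.0119374073771543,
          0.0879195861169952, 0.1877777777777777, 0.1319864052287581]
y_pred = [0.1570743965948912, 0.06101801962025982, 0.12989937556879136,
          0.11910986913415476, 0.10300264949635973, 0.2007470535994329,
          0.0867039544973983, 0.3334096609357363, 0.0456333839555339,
          0.12158770472179008, 0.1636636091524143, 0.05390845919789602]

n = len(y_true)
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
rel_rmse = rmse / (sum(y_true) / n)  # RMSE relative to the mean true value

print(f"relative RMSE ~ {rel_rmse:.2f}")  # far above both 1/3 and 10 %
```

On this sample the relative RMSE is well over 1, so neither the detection (1/3) nor the quantitation (10 %) threshold comes close to being met.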

  • 6
    +1 This is consistent with what $R^2<0$ is showing, that what appear to be small numbers for (R)MSE and MAE aren't all that small for this particular data set. – Dave Sep 21 '23 at 13:11