I fitted a random forest model after applying a Box-Cox transformation to the predictor variable and obtained an R² of 0.65. However, the scatterplot of actual versus predicted values shows systematic overestimation of low values and underestimation of high values (Figure 1 below).

[Figure 1: scatterplot of actual vs. predicted values]

In addition, I used the plotmo package for residual analysis, which showed that the residuals are not random (Figure 2 below). Any advice on how to solve this issue, please?

[Figure 2: residual plots produced with plotmo]

  • Which issue? RF is a nonparametric procedure and does not care about normal/symmetrical responses or uncorrelated residuals. – utobi Jul 13 '23 at 11:54
  • Welcome to Cross Validated! This seems to be a near-perfect duplicate of your question (even the graphs look similar). If that does not address your concerns, please explain why. Perhaps there is something about your inquiry that is not addressed there and that warrants a new answer to be posted here, in which case, reopening this question to allow for such an answer to be posted would absolutely be in order. – Dave Jul 13 '23 at 11:58
  • Thanks for your comment. So the overestimation of low values and underestimation of high values shown in the first graph is not a problem? (I'm a beginner in machine learning.) – sahbeni sabrine Jul 13 '23 at 12:01
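
On the follow-up question about the first graph: a rough way to see why that pattern can appear even without a modelling mistake is to simulate it. The sketch below is not the asker's data or model. It assumes a hypothetical response made of a predictable signal plus irreducible noise, with variances picked so that the best achievable R² is about 0.65, and it uses an idealised "model" that returns the conditional mean exactly (so no random forest is actually fitted). All names (`signal`, `actual`, `predicted`, `slope`, `r_squared`) are illustrative.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SSE/SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Assumed data-generating process (hypothetical): signal variance 0.65,
# noise variance 0.35, so the response has variance of about 1.
signal = [rng.gauss(0, 0.65 ** 0.5) for _ in range(n)]
actual = [s + rng.gauss(0, 0.35 ** 0.5) for s in signal]

# Idealised model: it recovers the conditional mean of the response exactly.
predicted = signal
residuals = [a - p for a, p in zip(actual, predicted)]

r2 = r_squared(actual, predicted)
slope_pred_on_actual = slope(actual, predicted)   # predicted (y) vs actual (x)
slope_actual_on_pred = slope(predicted, actual)   # actual (y) vs predicted (x)
slope_resid_on_actual = slope(actual, residuals)  # residuals vs actual
slope_resid_on_pred = slope(predicted, residuals) # residuals vs predicted

print(f"R^2:                           {r2:.3f}")
print(f"slope, predicted on actual:    {slope_pred_on_actual:.3f}")
print(f"slope, actual on predicted:    {slope_actual_on_pred:.3f}")
print(f"slope, residuals on actual:    {slope_resid_on_actual:.3f}")
print(f"slope, residuals on predicted: {slope_resid_on_pred:.3f}")
```

Under these assumptions the slope of predicted on actual comes out close to the R² (well below 1), while the slope of actual on predicted stays close to 1. So plotting predictions against actual values will tend to show low values overpredicted and high values underpredicted whenever R² is below 1, and residuals plotted against actual values will trend upward for the same reason. Residuals plotted against predicted values are the more informative check. A real random forest can shrink predictions further because it averages training responses, so this sketch only covers the part of the pattern that is expected by construction.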

0 Answers