
My outcome is quarterly sales, which is right-skewed and has a large range because different groups purchase very differently. For example, many groups made zero or fewer than 10 purchases, while some made 10,000. The negative values mean that some customers returned more than they purchased.

I have tried random forests and linear regression, and both give a huge RMSE of more than 200. I have no idea which models would suit my data or how to deal with these negative values.

Jeremy Miles
  • Welcome to Cross Validated! Why do you see the negative values as unusual or odd, needing special handling? Sure, you would rather people buy more than they return, but if someone will return more than they buy, you'd like to catch that, so the negative values being inconvenient for the business does not strike me as a reason to find them in need of special handling. $//$ Why do you say that $RMSE\approx 200$ is huge? – Dave Nov 20 '23 at 18:47
  • Are you sure what you want to look for? Please consider cost-sensitive machine learning to focus on costs, if this is what you are looking for. – Ggjj11 Nov 20 '23 at 18:48
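Dave's question about whether an RMSE of about 200 is large can be checked directly: RMSE is in the units of the outcome, so it only means something relative to the spread of sales and to a simple benchmark. Below is a minimal sketch, using made-up numbers and hypothetical helpers `rmse` and `relative_rmse`, that compares a model's RMSE with the RMSE of always forecasting the mean. The mean is computed in-sample here for brevity; in practice you would use the training-set mean and evaluate on held-out data.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_rmse(actual, predicted, benchmark):
    """Model RMSE divided by benchmark RMSE; values below 1 mean the model beats the benchmark."""
    return rmse(actual, predicted) / rmse(actual, benchmark)


# Hypothetical right-skewed quarterly sales per group; -3 is a group that returned more than it bought.
actual = [-3, 0, 2, 5, 8, 40, 120, 10_000]
# Hypothetical model predictions for the same groups.
model = [0, 1, 3, 6, 10, 60, 300, 9_500]
# Naive benchmark: always predict the mean (in-sample here, for brevity).
naive = [statistics.mean(actual)] * len(actual)

print(f"model RMSE:     {rmse(actual, model):.1f}")
print(f"benchmark RMSE: {rmse(actual, naive):.1f}")
print(f"relative RMSE:  {relative_rmse(actual, model, naive):.3f}")

# Share of the model's squared error contributed by the largest group alone.
errors = [(a - p) ** 2 for a, p in zip(actual, model)]
print(f"share from largest group: {errors[-1] / sum(errors):.0%}")
```

With these made-up numbers the model's RMSE is about 188, close to the figure in the question, yet it is a small fraction of the benchmark's RMSE, and roughly 88% of the model's squared error comes from the single group with 10,000 purchases. Whether an RMSE of 200 is "huge" therefore depends on the scale and spread of the outcome, not on the number itself.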

1 Answer


On the one hand, leverage domain knowledge to find good predictors.

On the other hand, live with the fact that we never can predict as well as we would like to, and that even knowing when we have reached the limit of predictability can be hard.

On the third hand, get clarity about what you want to predict, and tailor your analysis to this. This can entail cost-based learning, as Ggjj11 writes. Or it can mean selecting an objective function that is commensurate with what you are trying to do.
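To make "an objective function that is commensurate with what you are trying to do" concrete: the loss you minimize determines which summary of the outcome distribution the optimal forecast estimates. Squared error (what RMSE reports) targets the mean, absolute error targets the median, and the pinball (quantile) loss targets a quantile, which is one way to encode asymmetric costs. The sketch below uses made-up data and hypothetical helper names (`best_constant`, `make_pinball_loss`) and finds, by brute force, the best constant forecast under each loss.

```python
import statistics


def squared_loss(y, q):
    return (y - q) ** 2


def absolute_loss(y, q):
    return abs(y - q)


def make_pinball_loss(tau):
    """Pinball (quantile) loss: under-forecasts cost tau per unit, over-forecasts cost 1 - tau."""
    def loss(y, q):
        return tau * (y - q) if y >= q else (1 - tau) * (q - y)
    return loss


def best_constant(data, loss):
    """Brute-force the integer constant forecast with the smallest total loss on `data`."""
    candidates = range(min(data), max(data) + 1)
    return min(candidates, key=lambda q: sum(loss(y, q) for y in data))


# Hypothetical right-skewed sales with one net return (a negative value, no special handling needed).
sales = [-4, 0, 0, 1, 3, 7, 30, 97, 10_000]

# Squared error picks the mean, pulled far up by the single large group.
print(best_constant(sales, squared_loss), statistics.mean(sales))
# Absolute error picks the median, close to the typical group.
print(best_constant(sales, absolute_loss), statistics.median(sales))
# tau = 0.8: under-forecasting costs four times as much as over-forecasting, so an upper quantile is chosen.
print(best_constant(sales, make_pinball_loss(0.8)))
```

Here the squared-error optimum is the mean (1126), the absolute-error optimum is the median (3), and the pinball loss with tau = 0.8 picks 97, an upper quantile. Which one you want depends on the decision the forecast feeds. In a regression model the same idea carries over by choosing the training loss; for example, scikit-learn's gradient boosting regressor offers absolute-error and quantile losses in addition to squared error.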

Stephan Kolassa