
Just as the title implies, I am searching for an error evaluation metric with the following characteristics:

  • able to handle cases where the actual value is 0
  • can be compared across different units/scales
  • isn't disproportionately affected by outliers

My intention is to find a complementary evaluation metric to sMAPE.

– OLGJ
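A minimal sketch, with made-up toy numbers, of how the sMAPE compares with two candidate complements often considered in this situation, the MAE and the MASE. The MASE divides the out-of-sample MAE by the in-sample MAE of the naive one-step forecast, so it stays defined when actual values are 0 and is comparable across series on different scales; note that genuinely extreme points still enter linearly rather than being down-weighted. The function names and data below are illustrative assumptions, not code from the thread.

```python
import numpy as np

def smape(actual, forecast):
    """One common sMAPE definition, in percent; inflated when actuals are near 0."""
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return 100 * np.mean(np.abs(forecast - actual) / denom)

def mae(actual, forecast):
    """Mean absolute error: well defined at actual = 0, but scale-dependent."""
    return np.mean(np.abs(forecast - actual))

def mase(actual, forecast, insample):
    """Mean absolute scaled error: out-of-sample MAE divided by the in-sample
    MAE of the naive one-step forecast, so it is unit-free and handles zeros."""
    scale = np.mean(np.abs(np.diff(insample)))
    return mae(actual, forecast) / scale

# hypothetical toy series containing a zero actual
insample = np.array([10.0, 12.0, 0.0, 11.0, 13.0])
actual = np.array([0.0, 14.0, 12.0])
forecast = np.array([1.0, 13.0, 15.0])

print(f"sMAPE: {smape(actual, forecast):.1f}%")            # inflated by the zero actual
print(f"MAE:   {mae(actual, forecast):.2f}")                # fine at zero, not scale-free
print(f"MASE:  {mase(actual, forecast, insample):.2f}")     # scale-free, fine at zero
```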
  • What do you mean by your third point? Are your "outliers" data errors or correct "extreme" points? Do you want conditional expectation predictions (then your "outliers" carry a lot of information!) or conditional median predictions? In the latter case, you could use the MAE. You can always scale it, or any other error metric. You may find Kolassa, 2020 interesting. – Stephan Kolassa Mar 19 '23 at 14:45
  • They are correct "extreme" points, as you say. I am testing which transformation of my response variable (y_t, y_t - y_{t-1}, or diff(log)) affects the errors. I have found sMAPE as one metric, and now I wish to complement it. – OLGJ Mar 19 '23 at 14:54
  • In the paper linked above, I argue that it makes no sense to evaluate a single prediction using multiple error metrics, so I would say that "complementing" one error metric with another one is usually misleading. That said, I would recommend that you think about what functional of the predictive distribution you want to elicit, and tailor your error metric to it. On the sMAPE, I recommend What are the shortcomings of the Mean Absolute Percentage Error (MAPE)? - it applies equally, mutatis mutandis, to the sMAPE. – Stephan Kolassa Mar 19 '23 at 14:57
  • I see. Do you have any recommendations on how to approach the functional of the predictive distribution? I'm not sure I understand the sentence. – OLGJ Mar 19 '23 at 15:04
  • Do take a look at that paper of mine, feel free to ping me on ResearchGate for it. The idea is that if you want to predict the conditional median, you should use a different error metric (MAE) than if you want to predict the conditional expectation (MSE) - even if the unknown future distribution is nowhere explicitly addressed. This becomes more obvious for quantile predictions (pinball loss), but apparently, people get confused between the MAE and MSE. And percentage errors are yet more misleading, see that MAPE thread. – Stephan Kolassa Mar 19 '23 at 15:08
  • I've read your paper, it clarified a lot. You mention, "Let us be consistent: if our aim is solely to minimize the MAPE, models should be fitted using the in-sample MAPE as an optimization criterion." My concern is that I don't know how I'll evaluate my final prediction. In the theory related to my research (the task at hand is to evaluate the prediction of an agnostic prediction model and the uncertainty associated with that prediction), they used intervals. Do you have recommendations/papers discussing which functional of the predictive distribution makes sense to focus on then? @StephanKolassa – OLGJ Mar 23 '23 at 08:50
  • Thanks for reading that paper! I would say that what functional to use can only be decided once you know what your prediction will be used for. I do forecasting for inventory control, so what we need are quantile forecasts for safety amounts, which can be evaluated using the pinball loss. Sometimes we are also interested in the conditional expectation (e.g., for retail promotion planning), which we can evaluate using the MSE. ... – Stephan Kolassa Mar 23 '23 at 08:55
  • ... If you have no idea whatsoever what someone else will be using your prediction for, I would say your best bet is to output a full predictive density, then whoever consumes your prediction can extract whatever they need, because they (presumably and hopefully) know best what they need and want to do. You can evaluate predictive densities using proper scoring rules, take a look at our tag wiki or at this text. – Stephan Kolassa Mar 23 '23 at 08:58
  • Of course, full densities are hard to deal with unless you have a parametric density. You could deal with that by outputting multiple quantile predictions (which is what one of the GEFComs requested for precisely this reason), which in turn you could evaluate using pinball losses. Or you could try multiple predictive intervals (the recent M5 forecasting competition essentially did that) and use an interval score. I gave some pointers here. – Stephan Kolassa Mar 23 '23 at 09:01
  • The prediction will be either a point prediction or an interval. This will be compared to the prediction of an agnostic prediction model (which is a point estimate). The idea is then to evaluate how certain the prediction from the agnostic model is, and whether/how that prediction can be improved. One idea is to use posterior intervals, and to compare how a point estimate with a 65% probability of being correct could be seen as worse than a prediction interval with 95% coverage. – OLGJ Mar 23 '23 at 09:26
  • But I will check out your recommendations and see if they make me any smarter :) – OLGJ Mar 23 '23 at 09:26
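For reference, a minimal sketch of the pinball loss and the interval (Winkler) score mentioned in the comments above, with made-up toy inputs; the function names are illustrative assumptions, not a particular library's API.

```python
import numpy as np

def pinball_loss(actual, quantile_forecast, tau):
    """Average pinball (quantile) loss for a tau-quantile forecast."""
    diff = actual - quantile_forecast
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def interval_score(actual, lower, upper, alpha=0.05):
    """Interval (Winkler) score for a central (1 - alpha) prediction interval:
    interval width plus 2/alpha-weighted penalties for points outside it."""
    width = upper - lower
    below = (2 / alpha) * np.maximum(lower - actual, 0)
    above = (2 / alpha) * np.maximum(actual - upper, 0)
    return np.mean(width + below + above)

# hypothetical 95% intervals and a 0.95-quantile forecast for three actuals
actual = np.array([10.0, 14.0, 0.0])
lower = np.array([8.0, 11.0, -1.0])
upper = np.array([12.0, 13.0, 2.0])

print(f"interval score:     {interval_score(actual, lower, upper, alpha=0.05):.2f}")
print(f"pinball (tau=0.95): {pinball_loss(actual, upper, tau=0.95):.2f}")
```

Lower is better for both scores; narrower intervals that still cover the actuals, and quantile forecasts that leave roughly the right fraction of actuals above them, score best.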

0 Answers