3

My question follows this one.

Metrics such as MASE and MSE have better properties than MAPE.

However, MAPE has great appeal for business audiences, because it is easy to interpret and it is expressed relative to the actual data.

Where I work, they use cMAPE to select the models, and it is calculated as: $$\text{cMAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{|y_{\text{true},i} - y_{\text{pred},i}|}{\max(y_{\text{true},i},\, y_{\text{pred},i})}$$

It was defined this way to avoid zeros of y_true in the denominator. However, the clear problem with this metric is that a higher y_pred tends to produce a smaller cMAPE.
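To illustrate with toy numbers (not my real data), the same absolute miss is penalised less when the forecast is high, because the larger of the two values ends up in the denominator:

cmape_term <- function(y_true, y_pred) abs(y_true - y_pred) / max(y_true, y_pred)
cmape_term(10, 15)  # over-forecast by 5: 5/15, about 0.33
cmape_term(10,  5)  # under-forecast by 5: 5/10 = 0.50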

So, I was thinking of using MASE or MSE to select the models, but reporting the final result as cMAPE, because anyone can understand it easily.

Is it valid? Or will it create another problem?

  • 3
    It might, in that someone else might come along and say "I can do better than (he) did!", and prove it by taking your model, optimizing it with respect to cMAPE, and presenting the results, which, naturally, will have a lower cMAPE than yours. – jbowman Aug 25 '23 at 15:18

1 Answer

6

You can certainly do this. (You can do a lot of things.)

Here is the thing to keep in mind: if you want to minimize the cMAPE (maybe your bonus depends on getting a cMAPE below a certain number?), you will almost certainly want to give a different forecast than if you wanted to minimize the MSE or the MAE. (The MSE-minimizing forecast is usually also different from the MAE-minimizing one.)

For instance, assume the true demand is Poisson distributed with parameter 0.5. What is the "best" forecast? If by "best" you mean that the expected MSE is minimized, you want to forecast the expected demand, 0.5. If you want to minimize the expected MAE, you want to forecast median demand, which is 0. If you want to minimize the expected cMAPE, you want to forecast 1.

[Plot: expected MSE, expected MAE, and expected cMAPE against candidate point forecasts; produced by the R code below]

Thus, you can certainly optimize your models so their point forecasts minimize the MAE or MSE. How do you go from a model to a point forecast? Per above, you need to decide which functional of the predictive density you want to extract. And you can then report cMAPE. However, if you are tasked with reducing the cMAPE, one lever is to extract a different functional than the one you used in minimizing the MSE or MAE.
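As a small sketch of what "extracting a functional" means (sticking with the Poisson example above), the same predictive density yields different point forecasts depending on which summary you pull out of it:

lambda <- 0.5                          # predictive density: Poisson with mean 0.5
c(expectation = lambda,                # elicited by minimizing the expected MSE
  median      = qpois(0.5, lambda),    # elicited by minimizing the expected MAE
  q95         = qpois(0.95, lambda))   # elicited by a pinball loss at the 95% level

The cMAPE-minimizing forecast (1 in this example, see the plot above) has no such simple closed form and is found numerically.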

Taking a step back, I would very much recommend that you tailor your error measure to the functional you want to elicit - not the other way around, as above. How useful in a business sense is it to elicit a point forecast of 1 for a Poisson density with expectation of 0.5? I have never seen a business process that would be optimally controlled through MAPE-optimal forecasts, and I strongly suspect that the cMAPE is no better. Conversely, there are processes that are optimally controlled by unbiased expectation forecasts (which are elicited using the MSE), and others that require quantile forecasts (which you elicit using a pinball loss). For instance, if you want to set safety amounts, you need a quantile forecast for that.
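As a sketch of the quantile case (staying with the Poisson(0.5) example; the 95% level is only an illustrative choice), the pinball loss at level tau is minimized, in expectation, by the tau-quantile of the demand distribution:

pinball <- function(y, f, tau) mean(ifelse(y >= f, tau * (y - f), (1 - tau) * (f - y)))
set.seed(1)
sims <- rpois(10000, 0.5)
sapply(0:4, function(f) pinball(sims, f, tau = 0.95))
# minimized at (or near) qpois(0.95, 0.5) = 2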

You may be interested in What are the shortcomings of the Mean Absolute Percentage Error (MAPE)? and this paper.

R code for the plot above:

sims <- rpois(10000, 0.5)                            # simulated Poisson(0.5) demands
candidates <- seq(min(sims), max(sims), by = 0.01)   # candidate point forecasts

mses   <- sapply(candidates, function(xx) mean((sims - xx)^2))
maes   <- sapply(candidates, function(xx) mean(abs(sims - xx)))
cmapes <- sapply(candidates, function(xx) mean(abs(sims - xx) / pmax(sims, xx)))

opar <- par(mfrow = c(1, 3), las = 1)
plot(candidates, mses,   type = "l", ylab = "", main = "Expected MSE")
plot(candidates, maes,   type = "l", ylab = "", main = "Expected MAE")
plot(candidates, cmapes, type = "l", ylab = "", main = "Expected cMAPE")
par(opar)

Stephan Kolassa
  • Thank you so much for the answer! You mention MAE instead of MASE because the inner part is the same: absolute differences? The same happens with MAPE and cMAPE. What do you mean by "Per above, you need to decide which functional of the predictive density you want to extract"? Do you mean the error metric? And "I would very much recommend that you tailor your error measure to the functional you want to elicit - not the other way around, as above"? I didn't understand the term 'functional'. I will read the paper tonight! – Guilherme Parreira Aug 25 '23 at 19:17
  • Yes, MASE is just a scaled MAE. For cMAPE, it depends on how you sum: $\frac{\sum|y_i-\hat{y}_i|}{\sum \max(y_i,\hat{y}_i)}$ is a different thing (closer to wMAPE) than $\sum\frac{|y_i-\hat{y}_i|}{\max(y_i,\hat{y}_i)}$ (closer to MAPE); which one do you use? (A toy comparison of the two is sketched after these comments.) ... – Stephan Kolassa Aug 25 '23 at 19:27
  • 1
    ... Re "functional": the paper will make it clearer. I took that term from Gneiting. It's essentially a "one-number summary of a predictive density". The key observation is that there is always a predictive density involved, although it may only be implicit. Using MSE as a training loss will elicit the expectation; using the MAE will elicit the median; both are functionals of the predictive density. Once you have a predictive density - like the Poisson example above - you can summarize it into a single number in various ways, and which one is best depends on your error measure. – Stephan Kolassa Aug 25 '23 at 19:29
  • I use the first one. Actually, the function in Python is defined as mape = np.mean(np.abs(y_true - y_pred)/np.maximum(y_true, y_pred)) – Guilherme Parreira Aug 25 '23 at 19:31
  • Finally, my recommendation is to first ask whether you want a conditional expectation, or a conditional quantile (at what level?), or a conditional median, and afterwards choosing an error measure that will reward you for finding that functional. Want the conditional expectation? Use MSE. Want a quantile? Use a pinball loss. – Stephan Kolassa Aug 25 '23 at 19:31
  • Hm. But with your Python code, you use the second possibility, not the first one, since the np.mean is outside the fraction. And then this is not just a scaling of the MAE, just as the MAPE isn't. – Stephan Kolassa Aug 25 '23 at 19:33
  • Thank you for the great answer! From my business point of view, I always prefer the conditional expectation. However, it is hard to present an MSE, because they always want MAPE (they don't know that they are calculating it, but they are). But I see your point. If I select a model based on MSE and report it as MAPE, the math will not agree. What do you do in those cases? Do you convince the audience to use a less interpretable metric? – Guilherme Parreira Aug 25 '23 at 19:44
  • That is indeed a hard problem. On the one hand, I try to educate people on the pitfalls in forecast accuracy measurement. On the other hand, we aim for expectation forecasts (plus a variance forecast and a distributional assumption for safety amount setting). We again try to educate our users, but if we don't seem to get through to them, we do sometimes post-process our forecasts to yield better MAEs by reducing them by a few percent. – Stephan Kolassa Aug 26 '23 at 07:13
  • In general, my strategy is to demonstrate to clients that I know what I am doing, and that I indeed try to help them through better forecasting. Once we have built up enough trust, they will listen with more openness when I explain to them that things are not simple, and that being careful about MAPE is not about fooling them, but about helping them to see the issues. – Stephan Kolassa Aug 26 '23 at 07:15
  • Nice!! What do you mean by "we do sometimes post-process our forecasts"? – Guilherme Parreira Aug 26 '23 at 19:21
  • ... "by reducing them by a few percent". We sometimes do this when the client insists on evaluating using MAE/MASE/wMAPE. All of these are scalar multiples of the MAE, so they are minimized in expectation by the conditional median. Since we usually deal with slow moving products, often intermittent ones, the conditional median is usually lower than the conditional expectation, so we reduce our expectation forecasts a bit to get lower MAEs, because the post-processed forecasts are closer to the median. (I emphasize that the business value is driven by the high quantile forecasts, anyway.) – Stephan Kolassa Aug 26 '23 at 19:29
  • OK! But how do you do that? Do you select the model that produces the smallest MAE? Or do you first select a model based on MSE and then do something else to get a smaller MAE? – Guilherme Parreira Aug 26 '23 at 19:39
  • Actually, we don't do model selection at all. We have one model, fit that (regularizing to address the overparameterization), then output its expectation forecast. And then we possibly tweak that. I have been lobbying to also output the conditional median and explaining to customers what the difference is, but haven't quite gotten the resources to do so yet. – Stephan Kolassa Aug 26 '23 at 19:48
  • I didn't understand your last comment. Also, I found the "forecast and assess full predictive densities" approach that you showed here very interesting. Is there anywhere on the web with a detailed discussion of this topic? It was not clear to me how to implement it. – Guilherme Parreira Sep 02 '23 at 13:20
  • If you could explain which part of my last comment was confusing, I'll gladly try to explain. About assessing full predictive densities, take a look at this and the papers by Gneiting and colleagues. The tool of choice usually is a proper scoring rule, here is our tag wiki. – Stephan Kolassa Sep 06 '23 at 14:11
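To make the difference between the two cMAPE aggregations discussed in the comments above concrete, here is a toy sketch in R (made-up numbers, not from this thread):

y_true <- c(10, 2, 0, 7)
y_pred <- c( 8, 4, 1, 7)
sum(abs(y_true - y_pred)) / sum(pmax(y_true, y_pred))  # ratio of sums (wMAPE-like): 5/22
mean(abs(y_true - y_pred) / pmax(y_true, y_pred))      # mean of ratios (MAPE-like); matches the Python one-liner above

The two versions generally give different numbers, so it is worth being explicit about which one is reported.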