
The RMSSE formula from the M5 competition is the following:

$$\mathrm{RMSSE} = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(y_t - \hat{y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(y_t - y_{t-1}\right)^2}}$$

https://mofc.unic.ac.cy/m5-competition/

This indicates that the denominator, i.e. the naive error, is based on the 'training' data. Below is an example RMSSE calculation:

[Example calculation: a sample series split into a training and a forecast period, with the one-step-ahead naive errors on the training data circled in purple and the corresponding terms over the forecast horizon circled in green.]
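For reference, the calculation can be sketched in a few lines of Python (the function name and toy numbers below are illustrative, not taken from the M5 guidelines): the denominator averages the squared one-step differences of the training data only, while the numerator averages the squared forecast errors over the whole h-step test horizon.

```python
import numpy as np

def rmsse(y_train, y_test, y_hat):
    """RMSSE as defined for the M5 competition.

    Numerator: MSE of the forecasts over the h-step test horizon.
    Denominator: in-sample MSE of the one-step-ahead naive forecast,
    i.e. the mean squared first difference of the *training* data.
    """
    y_train, y_test, y_hat = map(np.asarray, (y_train, y_test, y_hat))
    forecast_mse = np.mean((y_test - y_hat) ** 2)          # 1, 2, ..., h steps ahead
    naive_in_sample_mse = np.mean(np.diff(y_train) ** 2)   # one-step naive, training only
    return np.sqrt(forecast_mse / naive_in_sample_mse)

# Toy example: 8 training observations, a 4-step forecast horizon.
print(rmsse(y_train=[10, 12, 11, 13, 12, 14, 13, 15],
            y_test=[14, 16, 15, 17],
            y_hat=[14.5, 15.0, 15.5, 16.0]))
```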

I am confused as to why the RMSSE uses the purple circled regions, rather than the green circled regions, to calculate the naive error. The naive prediction is only ever evaluated one step ahead, while the actual forecast ŷ is evaluated 1, 2, 3, etc. steps ahead, which seems to give the naive forecast an advantage. Because of that advantage, an RMSSE greater than 1 does not necessarily lead to the conclusion that I should switch to a naive prediction.

Why is the naive error calculated using the purple circled regions and not the green circled regions?


1 Answer

  1. While the M5 did indeed look at multi-step forecasts, it is not a priori obvious that a specific forecasting horizon is "more important" than another one. Different planning processes in retail (where the M5 data come from) require different forecast horizons (Fildes et al., 2022). Thus, there is no "obviously intuitively best" horizon to use in the denominator, either.

  2. The RMSSE is modeled on the Mean Absolute Scaled Error (MASE), which has gained popularity in the forecasting community in recent decades. The MASE is calculated by dividing a focal method's MAE by the MAE the random walk (one-step-ahead naive forecast) achieves in-sample, so you immediately see the parallel with the RMSSE, which uses the MSE in both the numerator and the denominator. See Hyndman & Koehler (2006) for the MASE.

    At this point, one may ask why the M5 did not use the MASE. The reason is that the MASE elicits median forecasts rather than mean (expectation) forecasts (Kolassa, 2020), and the bottom-level series of the M5 dataset are typically highly intermittent, so a MASE-optimal forecast would frequently have been a flat zero forecast (Kolassa, 2016), which is of little use; the simulation sketch after this list illustrates the point. The RMSSE, in contrast, elicits expectation forecasts.

  3. It's very much debatable whether a relative error > 1 means that the scaling benchmark is better than the focal forecast and that you should therefore switch to it, especially once you look at different horizons; see Interpretation of mean absolute scaled error (MASE). That interpretation is rarely put forward by professional forecasters, and it was not put forward in the M5 competition either. Such interpretability was never the aim, simply because it is very hard to make it work.
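To make point 2 concrete, here is a small simulation (my own illustration with artificial data, not M5 data): on a highly intermittent series, a flat zero forecast minimizes the absolute error that the MASE is built on, while the mean demand minimizes the squared error that the RMSSE is built on.

```python
import numpy as np

# A highly intermittent demand series: mostly zeros, occasional sales.
rng = np.random.default_rng(0)
demand = rng.binomial(1, 0.2, size=10_000) * rng.poisson(3, size=10_000)

# Compare two flat point forecasts: the all-zero forecast and the mean demand.
candidates = {"flat zero": 0.0, "mean demand": demand.mean()}

for name, f in candidates.items():
    mae = np.mean(np.abs(demand - f))   # the error the MASE numerator is built on
    mse = np.mean((demand - f) ** 2)    # the error the RMSSE numerator is built on
    print(f"{name:12s}  MAE = {mae:.3f}  MSE = {mse:.3f}")

# With more than half of the periods at zero demand, the median demand is 0,
# so the flat zero forecast minimizes the MAE (and hence the MASE), while the
# mean demand minimizes the MSE (and hence the RMSSE).
```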

Bottom line: measuring the accuracy of point forecasts is harder and less intuitive than it looks.
