
I am creating a number of forecasts for sales at various levels of data aggregation, based on the properties of the products (e.g., is it in a bottle or a can). I plan to create multiple models and then review the outputs before selecting a forecast - I would like to use MASE as a comparison metric.

I have monthly data that is seasonal (M = 12). Initially, I will be calculating Holt-Winters (HW) additive, multiplicative and damped models, plus a naive forecast. I have 3-5 years of data available depending on the series, and would like to forecast 12-24 months into the future.

  1. If I decide that my test data will be the final 12 months of data, when calculating the MASE, should my denominator consider all naive forecast errors up until the final period $N$, or only until $N-M$?

  2. Does the numerator use all the error values up until $N$ (i.e., all training and test data)?

  3. There is a desire to have the model reflect the most recent trends - what will the impact on MASE be if I 'overfit' my model by using all of the data, up to and including the final $M$ test months, as my training set?

  4. What is the best method to check the fit of the model against the last 12 months?

For reference, I am not using R or Python for this, so I really need to understand from first principles how to write the model. Apologies if these are silly questions - I am relatively new to forecasting, but am loving it! :)


1 Answer

  1. Per the original paper by Hyndman & Koehler (2006), you would only use the in-sample MAE of one-step-ahead naive forecasts as the MASE denominator. So if you use the first $N-M$ data points to fit your models, those are the points the denominator is computed over.

    However, different people use other methods to standardize the MAE, most commonly not using the in-sample errors, but the ones in the forecast period. Either way is reasonable, though the "original" way gives you a more stable denominator, since you typically have more training than testing data.

  2. In-sample fit is a notoriously poor indicator of true performance in production. This is why we use holdout data. We would therefore use only the holdout MAE over the last $M$ periods for the MASE numerator (the full formula is written out below this list).

  3. As with point 2 above, if you use data both for model fitting and evaluation, you will overfit. The result is that you will be too sure of yourself. Your forecasts will be worse than expected in production, where you forecast truly new data.

    What you can do is to fit your models, evaluate them using a holdout period, select the most accurate one, and refit that one using all data for actual production forecasting. (Even better, fit all models and average their forecasts; averages are often very hard to beat.) A rough sketch of this workflow follows below the list.

  4. There is a plethora of error measures. MASE is not bad at your level of aggregation. If you look at finer granularities, e.g., daily sales per location and SKU, which are typically intermittent, the MASE can lead you astray badly (see the toy example below).
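
Putting points 1 and 2 together: with a series $y_1, \dots, y_N$, the last $M = 12$ observations held out, and forecasts $\hat{y}_t$ for the holdout, the MASE described above works out to

$$\text{MASE} = \frac{\frac{1}{M}\sum_{t=N-M+1}^{N} \left| y_t - \hat{y}_t \right|}{\frac{1}{N-M-1}\sum_{t=2}^{N-M} \left| y_t - y_{t-1} \right|},$$

i.e., the holdout MAE of your candidate forecast, scaled by the in-sample MAE of the one-step-ahead naive forecast.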
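
As a toy illustration of the intermittent-demand problem (the numbers are made up): suppose a mostly-zero training series $0, 0, 3, 0, 0, 2, 0, 0, 0, 4, 0, 0$ and a holdout of $0, 0, 1, 0$. The in-sample one-step naive MAE is $18/11 \approx 1.6$. A flat forecast of zero has a holdout MAE of $0.25$, so MASE $\approx 0.15$; forecasting the historical mean of $0.75$ every period has a holdout MAE of $0.625$, so MASE $\approx 0.38$. MASE therefore strongly prefers the all-zero forecast, which tells you nothing useful about how much to stock.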
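
And since you want to understand the mechanics from first principles, here is a minimal, runnable sketch of the evaluate-on-holdout-then-refit workflow from point 3. I know you are not using Python, so treat this purely as the arithmetic spelled out. Only naive and seasonal naive forecasters are implemented to keep it short; your Holt-Winters models would slot in where those two functions are, and `series` is an invented stand-in for your monthly sales history.

```python
# Minimal sketch: score candidate models on a 12-month holdout via MASE,
# then refit the winner on all data (point 3). Only naive and seasonal-naive
# forecasters are implemented here; Holt-Winters would slot in the same way.

M = 12  # seasonal period, used here as the holdout length as well

def naive_forecast(history, horizon):
    # repeat the last observed value
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, horizon, m=M):
    # repeat the last full season
    return [history[-m + (h % m)] for h in range(horizon)]

def mase(train, actual_holdout, forecast_holdout):
    # denominator: in-sample one-step-ahead naive MAE (point 1)
    naive_mae = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train))) / (len(train) - 1)
    # numerator: holdout MAE of the candidate forecast (point 2)
    mae = sum(abs(a - f) for a, f in zip(actual_holdout, forecast_holdout)) / len(actual_holdout)
    return mae / naive_mae

# invented stand-in for your monthly sales history, oldest month first
series = [110, 95, 120, 130, 150, 170, 160, 155, 140, 135, 125, 180] * 4

train, holdout = series[:-M], series[-M:]

candidates = {
    "naive": naive_forecast(train, M),
    "seasonal naive": seasonal_naive_forecast(train, M),
}
scores = {name: mase(train, holdout, fc) for name, fc in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> selected:", best)

# For production, refit the selected model (or average all models' forecasts)
# on the FULL series before forecasting the next 12-24 months.
```

The refitting step at the end is the important part: the holdout only serves to choose between models, and once a model is chosen, you want it to see the most recent 12 months as well before producing the real forecast.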

You may find our resources on forecasting helpful: Resources/books for project on forecasting models

Stephan Kolassa