First of all, I agree 100% with Stephen's answer; I'll just add a little bit from my 2 years of experience!
The ML vs traditional methods IMO boils down to a simple question:
Do you have good drivers to use as variables?
Time series methods work best for time series; of course you can use other factors to aid, but with one time series going to one model you also need to be careful with those features. ML (boosted trees / RFs like you suggest) works best for tabular data, where you tend to lose your time series structure, so you have to make up for that with good tabular features and simply 'represent' time with other features.
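To make that concrete, here is a minimal sketch of 'representing' time as tabular features for a tree model. The column names, the weekly frequency, and the random demand data are all just illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly demand for one SKU
rng = pd.date_range("2022-01-03", periods=104, freq="W-MON")
df = pd.DataFrame({
    "series_id": "sku_1",
    "date": rng,
    "y": np.random.default_rng(0).poisson(20, size=len(rng)).astype(float),
})

df = df.sort_values(["series_id", "date"])
grp = df.groupby("series_id")["y"]

# Lags and rolling stats carry the time-series structure into flat columns
df["lag_1"] = grp.shift(1)                      # last week's demand
df["lag_52"] = grp.shift(52)                    # same week last year
df["rolling_mean_4"] = grp.transform(lambda s: s.shift(1).rolling(4).mean())

# Calendar features let the tree learn seasonality without a date index
df["week_of_year"] = df["date"].dt.isocalendar().week.astype(int)
df["month"] = df["date"].dt.month
```

Every feature is shifted so it only uses information available at forecast time; leaking the current period's value is the classic mistake here.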
Things like price of products, marketing expense, etc. If you don't have these types of variables for your domain, then I would bet a decent stat engine outperforms a state-of-the-art ML model in a production setting. That production-setting piece is important: with an ML model you have very little control over the actual forecast - you get what you get. A stat engine should allow you to switch on the fly to another method if the current one's forecast is wonky, which leads to my next thought. Just remember, though: if you use something like GDP, you then probably have to forecast GDP itself to use it in the future, which is probably very problematic! Or use lagged GDP, which may not be as useful.
- What makes a decent stat engine?
Your model portfolio (what you are looking into now) is important, but model selection and a business logic layer are everything. For model selection, look to time series cross-validation. For the business logic layer, I would lean on the stakeholders of the forecast. For example, you probably want to assign a 'demand type' to each given time series. If 30% or more of the series is 0, assign it a 'type' that only allows certain models to be selected, such as simple exponential smoothing, Croston, or mean - an ARIMA may produce wonky results in those settings. You could also check that the forecast doesn't go from 5 units to 50 million, something that is possible in an overparameterized ARIMA. You could check whether certain product lifecycles are at play - like a build-up and fall-off over the years - and then fit a more local model, or weight the more recent years more if your method takes sample weights. A lot of possibilities here for adding logic that aids the engine.
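A tiny sketch of what that logic layer can look like. The 30% zero cutoff matches the rule of thumb above; the function names, the 10x blow-up ratio, and the returned labels are my own illustrative choices:

```python
import numpy as np

def classify_demand(y, zero_share_cutoff=0.3):
    """Assign a rough 'demand type' that gates which models may be selected."""
    y = np.asarray(y, dtype=float)
    if (y == 0).mean() >= zero_share_cutoff:
        return "intermittent"   # restrict to SES / Croston / mean
    return "smooth"             # full portfolio allowed

def forecast_is_sane(history, forecast, max_ratio=10.0):
    """Reject forecasts that explode relative to history (e.g. 5 units -> 50M)."""
    history = np.asarray(history, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    hist_max = max(np.max(np.abs(history)), 1e-9)
    return np.max(np.abs(forecast)) <= max_ratio * hist_max

print(classify_demand([0, 0, 5, 0, 3, 0]))        # intermittent
print(forecast_is_sane([5, 6, 4], [50_000_000]))  # False
```

The point isn't these exact thresholds; it's that the engine refuses to emit a forecast the business would immediately laugh at.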
In summary,
Add some naive methods. You could add other methods too, but I personally would stay away from Prophet - AutoARIMA + AutoETS + naive methods (mean, last period, last seasonal period) will be a good start. Take a look at your model selection criteria to ensure it is robust, and add some 'logic' to help ensure the chosen model is appropriate and isn't merely the one that minimizes some loss function.
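Those naive baselines are a few lines each; a minimal sketch (function name and signature are just for illustration):

```python
import numpy as np

def naive_forecasts(y, horizon, season_length=12):
    """Three cheap baselines any engine should keep in its portfolio."""
    y = np.asarray(y, dtype=float)
    return {
        "mean": np.full(horizon, y.mean()),
        "last_period": np.full(horizon, y[-1]),
        # tile the last full season across the horizon
        "last_seasonal_period": np.resize(y[-season_length:], horizon),
    }

fc = naive_forecasts(np.arange(1, 25, dtype=float), horizon=6, season_length=12)
```

If a fancy model can't beat these in cross-validation, the logic layer should fall back to them.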
But most importantly -
Look at your forecasts.
Set up some quick flags to surface forecasts where the model suggests new maxes/mins, or where the average of the forecast period is significantly different from the average of the history. Figure out whether there are commonalities between the flagged series, like a ton of zeros. Many times it is just an odd bug in your code - your outlier detection isn't working right, or some other issue is causing bad results.
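These flags are cheap to implement. A sketch with made-up flag names and a hypothetical 50% level-shift tolerance:

```python
import numpy as np

def flag_forecast(history, forecast, level_shift_tol=0.5):
    """Return a list of review flags for a (history, forecast) pair."""
    history = np.asarray(history, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    flags = []
    if forecast.max() > history.max():
        flags.append("new_max")
    if forecast.min() < history.min():
        flags.append("new_min")
    hist_mean = history.mean()
    if hist_mean != 0 and abs(forecast.mean() - hist_mean) / abs(hist_mean) > level_shift_tol:
        flags.append("level_shift")
    return flags

print(flag_forecast([10, 12, 11, 13], [30, 31, 32]))  # ['new_max', 'level_shift']
```

Run this over every series after each forecast cycle and eyeball whatever gets flagged.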
If you have done all of that and want additional models to try my main recommendations would be:
- Theta - there are tons of implementations across Python and R. Theta plus AutoARIMA do well in general.
- Croston - pretty standard for intermittent data.
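For reference, classic Croston is simple enough to sketch from scratch - it smooths the non-zero demand sizes and the gaps between them separately, then forecasts their ratio as a flat line. This is an illustrative implementation, not any particular library's; in practice you'd reach for an existing one:

```python
import numpy as np

def croston(y, horizon, alpha=0.1):
    """Croston's method for intermittent demand (illustrative version)."""
    y = np.asarray(y, dtype=float)
    demand = None      # smoothed non-zero demand size
    interval = None    # smoothed gap between non-zero demands
    periods_since = 0
    for value in y:
        periods_since += 1
        if value > 0:
            if demand is None:  # initialize on the first non-zero demand
                demand, interval = value, periods_since
            else:               # simple exponential smoothing on both series
                demand += alpha * (value - demand)
                interval += alpha * (periods_since - interval)
            periods_since = 0
    if demand is None:          # all-zero history
        return np.zeros(horizon)
    return np.full(horizon, demand / interval)

print(croston([0, 0, 6, 0, 0, 6, 0, 0, 6], horizon=3))  # ~2.0 per period
```

Note the flat forecast is a demand *rate*, not a prediction of which period the demand lands in - that's the standard caveat with Croston.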
A lot of 'AutoML' time series methods literally try everything under the sun, take a lot of time, and don't add much value beyond the methods listed above.
Additionally, you could try out some of my personal projects in the field:
- ThymeBoost, which is gradient-boosted time series decomposition with traditional methods like ETS and ARIMA
- TimeMurmur, my newest, which does large-scale LightGBM time series forecasting - I probably wouldn't use it in prod, but you could give it a shot as a baseline.