
I am working on an evaluation of time series forecasting models in Python, more specifically with statsmodels, scikit-learn and tensorflow. I think it makes sense to first compare the model performance to a set of "trivial" models.

What are examples of such baseline models typically used? Are there existing implementations? (E.g., is there something analogous to scikit-learn DummyClassifier for time series forecasts?)

– clstaudt

2 Answers


I think it makes sense to first compare the model performance to a set of "trivial" models.

This is unspeakably true. This is the point where I upvoted your question.

The excellent free online book Forecasting: Principles and Practice (2nd ed.) by Hyndman & Athanasopoulos gives a number of very simple methods which are often surprisingly hard to beat:

  • The overall historical average
  • The random walk or naive forecast, i.e., the last observation
  • The seasonal random walk or seasonal naive or naive2 forecast, i.e., the observation from one seasonal cycle back
  • The random walk with a drift term, i.e., extrapolating from the last observation out with the overall average trend between the first and the last observation

These and similar methods are also used as benchmarks in academic forecasting research. If your newfangled method can't consistently beat the historical average, it's probably not all that hot.

I am not aware of any Python implementation, but implementing these should not be overly hard.
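For illustration, the four baselines above can indeed be sketched in a few lines of NumPy (the function names here are my own, not from any library):

```python
import numpy as np

def historical_average(y, h):
    """Forecast h steps ahead with the overall historical mean."""
    return np.full(h, np.mean(y))

def naive(y, h):
    """Random walk / naive forecast: repeat the last observation."""
    return np.full(h, y[-1])

def seasonal_naive(y, h, m):
    """Repeat the observation from one seasonal cycle (length m) back."""
    return np.array([y[-m + (i % m)] for i in range(h)])

def drift(y, h):
    """Extrapolate the average trend between the first and last observation."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)

y = np.array([10., 12., 14., 16.])
print(naive(y, 3))   # [16. 16. 16.]
print(drift(y, 3))   # [18. 20. 22.]
```

Evaluating these on a held-out test window gives you the floor that any fancier statsmodels/scikit-learn/tensorflow model has to clear.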

– Stephan Kolassa

Adding to the previous answer by Stephan Kolassa: we're developing a Python toolbox for forecasting and have implemented a "naïve forecaster" class for that purpose. So with sktime, you could for example run:

import numpy as np
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss
from sktime.forecasting.naive import NaiveForecaster

y = load_airline()  # time series data
y_train, y_test = temporal_train_test_split(y)  
fh = np.arange(1, len(y_test) + 1)  # forecasting horizon
forecaster = NaiveForecaster(strategy="last")  # random walk 
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
print(smape_loss(y_test, y_pred))
– mloning
  • This looks great! Are there any publicly available performance scores for sktime's datasets? It would be very useful to benchmark it against that. There aren't any in the documentation unfortunately: https://www.sktime.org/en/stable/api_reference/datasets.html – semyd Nov 24 '22 at 08:14
  • To expand on my previous comment: one useful source of benchmarks is Papers with Code, but they only have 12 datasets, and most are "complex" time series with high-dimensional cross-sectional data (e.g., robotics vision, speech recognition): https://paperswithcode.com/task/time-series#datasets – semyd Nov 24 '22 at 08:22
  • We ran sktime on the M4 study, but it's a bit outdated now: https://github.com/mloning/sktime-m4 – mloning Dec 08 '22 at 18:30