As a newbie, I am trying to produce a forecast with the auto-ARIMA model. After searching, I found this site, which illustrates the usage and the hyperparameters of the model. However, when I tried to forecast, the model gave me an array of constants.

Please advise if I asked in the wrong place. Thanks.

The data is a simple series of 29 daily values:

daily_infect = [ 15,  11,  21,  25,  32, 186, 204, 334, 242, 274, 294, 315, 722,
       453, 594, 536, 640, 672, 557, 489, 358, 351, 330, 548, 582, 474,
       506, 325, 214]

Here is the code:

# reference: https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.auto_arima.html
import matplotlib.pyplot as plt   # used for the plots further down
import numpy as np                # used to build the forecast index
import pandas as pd               # used to wrap the forecast in a Series
import pmdarima as pm
model = pm.auto_arima(daily_infect,
                      start_p=2, start_q=2, # default=2
                      test='kpss',       # default=kpss
                      max_p=5, # default=5
                      max_q=2, # library default is 5 (max_Q, the seasonal one, defaults to 2)
                      m=1,              # m == 1 means the data is treated as non-seasonal
                      d=None,           # If None (the default), d is selected automatically based on the results of the test
                      seasonal=False,   # No seasonality
                      start_P=1,        # default=1; seasonal term, not used when seasonal=False
                      D=None,           # Order of seasonal differencing; if None (the default), selected from the seasonal_test. Not used when seasonal=False
                      trace=True,
                      error_action='warn',  # default=warn
                      stepwise=True)    # The stepwise algorithm can be much faster than fitting all (or a random subset of) hyperparameter combinations and is less likely to over-fit

print(model.summary())

I ran the diagnostics. They seem okay:

model.plot_diagnostics(figsize=(15,8))
plt.show()

[Figure: model diagnostics plots]

Here is the forecast code:

# Forecast
n_days = 10
fc = model.predict(n_periods=n_days)
index_of_fc = np.arange(len(daily_infect), len(daily_infect)+n_days)

# make series for plotting purpose
fc_series = pd.Series(fc, index=index_of_fc)

# Plot
fig, ax = plt.subplots(figsize=(15,9))
ax.plot(daily_infect)
ax.plot(fc_series, color='red')

ax.set_title("Final Forecast")
ax.figure.autofmt_xdate()
plt.show()

[Figure: the constant prediction]

I've tried changing some parameters back to their defaults, but no luck. Is there anything I can improve? Thanks.

Woden

1 Answer

Here is what your call to pm.auto_arima() writes to the console:

Best model:  ARIMA(0,1,0)(0,0,0)[0]

That is, it fits a non-seasonal ARIMA(0,1,0) model. The trailing (0,0,0)[0] is the seasonal part, and it is not surprising that it is empty, since you specified seasonal=False. This is an ARMA(0,0) model on first differences, or

$$ (1-B)y_t = y_t-y_{t-1} = \epsilon_t, $$

where $B$ is the backshift operator, and $\epsilon_t\sim N(0,\sigma^2)$. Alternatively,

$$ y_t=y_{t-1}+\epsilon_t. $$

That is, a random walk.
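
To make the structure explicit, the recursion can be iterated forward from the last observed time point, written here as $T$ with horizon $h$ (both symbols are introduced only for this illustration):

$$ y_{T+h} = y_T + \sum_{i=1}^{h}\epsilon_{T+i}. $$

Everything beyond $y_T$ is a sum of future innovations. Assuming these are independent, the variance of that sum is $h\sigma^2$, so the uncertainty grows with the horizon even though the expected value does not move.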

In forecasting, you substitute the expected value for the innovations $\epsilon_t$, which is zero. Thus, your forecasts are simply the last observation. In particular, the forecasts do not vary over time, so you get a flat line.
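
As a minimal sketch of what such a forecast looks like for the data in the question, the following uses only the standard library. The function name `random_walk_forecast` is made up for this illustration and is not part of pmdarima. The interval half-widths assume independent, zero-mean Gaussian innovations, with $\sigma$ estimated crudely from the first differences, so they will not match pmdarima's intervals exactly.

```python
import math

daily_infect = [15, 11, 21, 25, 32, 186, 204, 334, 242, 274, 294, 315, 722,
                453, 594, 536, 640, 672, 557, 489, 358, 351, 330, 548, 582, 474,
                506, 325, 214]

def random_walk_forecast(y, n_periods):
    """Flat point forecasts and rough 95% half-widths for y_t = y_{t-1} + eps_t."""
    diffs = [b - a for a, b in zip(y, y[1:])]
    # Innovation standard deviation, assuming zero-mean differences (no drift).
    sigma = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Every future innovation is replaced by its expected value, zero,
    # so each point forecast is just the last observation.
    point = [y[-1]] * n_periods
    # The h-step forecast variance is h * sigma^2, so the band widens with sqrt(h).
    half_width = [1.96 * sigma * math.sqrt(h) for h in range(1, n_periods + 1)]
    return point, half_width

point, half_width = random_walk_forecast(daily_infect, 10)
print(point)  # [214, 214, 214, 214, 214, 214, 214, 214, 214, 214]
```

The point forecast is the constant array from the question, and only the uncertainty band changes with the horizon.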

Now you will probably wonder why auto_arima() fits a random walk. As Tim writes, there are no obvious cycles or trends in your data, and the stepwise AIC optimization does not find meaningful autoregressive or moving average dynamics in your time series. So auto_arima() indeed believes a random walk is the best description of your data.
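
One rough way to look for leftover dynamics yourself is to inspect the sample autocorrelation of the first differences. The sketch below is not what auto_arima() does internally (it compares candidate models by an information criterion after choosing d with a unit-root test). The helper `sample_acf` is written just for this illustration, and the $\pm 1.96/\sqrt{n}$ band is only the usual large-sample approximation, which is shaky with 28 differences.

```python
import math

daily_infect = [15, 11, 21, 25, 32, 186, 204, 334, 242, 274, 294, 315, 722,
                453, 594, 536, 640, 672, 557, 489, 358, 351, 330, 548, 582, 474,
                506, 325, 214]

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, r_0 == 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

# First differences: what remains after the d=1 differencing step.
diffs = [b - a for a, b in zip(daily_infect, daily_infect[1:])]
acf = sample_acf(diffs, max_lag=7)

# Approximate 95% band for white noise.
bound = 1.96 / math.sqrt(len(diffs))
for lag, r in enumerate(acf[1:], start=1):
    status = "outside" if abs(r) > bound else "inside"
    print(f"lag {lag}: r = {r:+.2f} ({status} the +/-{bound:.2f} band)")
```

Lags that fall inside the band are consistent with white-noise differences, that is, with a random walk for the original series.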

You may want to look through previous questions on flat ARIMA forecasts, or at "Is it unusual for the MEAN to outperform ARIMA?". A flat forecast - whether from the overall average, as discussed in the last link, or from a random walk model - is surprisingly often the best forecast you can make. If there is no structure to be found, then there is no structure to be found.
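
To see how the two flat benchmarks compare on your own series, you can hold out the last few days and score both. This is only a sketch: the 7-day holdout, the use of mean absolute error, and the helper name `mae` are arbitrary choices for illustration, and a single short holdout is weak evidence either way.

```python
daily_infect = [15, 11, 21, 25, 32, 186, 204, 334, 242, 274, 294, 315, 722,
                453, 594, 536, 640, 672, 557, 489, 358, 351, 330, 548, 582, 474,
                506, 325, 214]

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

horizon = 7  # arbitrary holdout length, chosen only for this illustration
train, test = daily_infect[:-horizon], daily_infect[-horizon:]

# Random-walk (naive) benchmark: repeat the last training observation.
naive_fc = [train[-1]] * horizon
# Mean benchmark: repeat the average of the training data.
mean_fc = [sum(train) / len(train)] * horizon

mae_naive = mae(test, naive_fc)
mae_mean = mae(test, mean_fc)
print(f"naive MAE: {mae_naive:.1f}, mean MAE: {mae_mean:.1f}")
```

Any more elaborate model would have to beat both of these numbers on held-out data to justify its extra complexity.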

I recommend the excellent free online book Forecasting: Principles and Practice (2nd ed.) by Athanasopoulos & Hyndman. It uses R, not Python, but it's very good, accessible, and free.

Stephan Kolassa