
I'm writing a tutorial on traditional time series forecasting models. One key issue with ARIMA models is that they cannot model seasonal data. So, I wanted to get some seasonal data and show that the model cannot handle it.

However, it seems to model the seasonality quite easily: the forecasts peak every 4 quarters, just like the original data. What is going on?

[Plot: actual series (blue) vs. ARIMA forecasts (red)]

Code to reproduce the plot

```python
from statsmodels.datasets import get_rdataset
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Get data: the quarterly UKNonDurables series from the R package AER
uk = get_rdataset('UKNonDurables', 'AER')
uk = uk.data
uk_pre1980 = uk[uk.time <= 1980]  # training sample

# Make ARIMA model
order = (4, 1, 4)

# Without a seasonal component
seasonal_order = (0, 0, 0, 0)
arima = ARIMA(uk_pre1980.value, order=order,
              seasonal_order=seasonal_order, trend='n')
res = arima.fit()

# Plot: forecast every observation that was not used for fitting
all_elements = len(uk) - len(uk_pre1980)
plt.plot(uk.value, 'b', label='Actual')
plt.plot(res.forecast(steps=all_elements), 'r', label='Preds')
plt.title(f'order={order}, seasonal_order={seasonal_order}')
plt.legend()
plt.show()
```

Richard Hardy
codeananda
  • See this and my comment under the answer in this thread. – Richard Hardy Nov 14 '21 at 14:21
  • That is absolutely not the case at all. Seasonal ARIMA models are a subset of ARIMA models, not an extension. – Chris Haug Nov 14 '21 at 20:43
  • @ChrisHaug but you need to add more terms to an ARIMA model to make a SARIMA model, doesn't that make it an extension? And you could say ARIMA models are a special case of SARIMA with all seasonal params set to 0 (so then wouldn't ARIMA be a subset of SARIMA?) – codeananda Nov 16 '21 at 06:34
  • @AdamMurphy, regarding Chris Haug's comment, see the post I linked to. It shows that SARIMA is a restricted form of ARIMA. It only shows it for a special case (SAR vs. AR), but that can be generalized. So mathematically / algebraically SARIMA is indeed a subset of ARIMA. Conceptually, however, I can understand treating SARIMA as an extension of ARIMA, as we "add" seasonal terms to a "baseline" ARIMA. – Richard Hardy Nov 16 '21 at 08:32

1 Answer


TL;DR: Non-seasonal ARIMA models with sufficiently high order can indeed pick up seasonal signals quite easily, especially for short seasonal periods. The main risk lies in overfitting.


Let's compare two very simple models. Your data is trended, so you difference once; for this comparison we will assume we are working with the already differenced series and use no integration. We will also dispense with the MA component in the interest of simplicity.

  • The simplest seasonal ARIMA model for quarterly data is an $\text{AR}(0)(1)_4$, which we can write using the backshift operator $B$ as $$ (1-\Phi_1B^4)y_t = \epsilon_t $$ or $$ y_t = \Phi_1 y_{t-4}+\epsilon_t. $$
  • Let's compare this to an $\text{AR}(4)$ model, where of course I am picking the order 4 so it has a chance of picking up the seasonal dynamics: $$ (1-\phi_1B-\dots-\phi_4B^4)y_t = \epsilon_t $$ or $$ y_t = \phi_1y_{t-1}+\dots+\phi_4y_{t-4}+\epsilon_t. $$
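
Both models are linear recursions in past values, so their point forecasts can be produced by the same loop, with future shocks set to their expected value of zero. The following is a minimal NumPy sketch, not how statsmodels computes forecasts: `ar_forecast` is a hypothetical helper, and the history and coefficient values are arbitrary assumptions for illustration. The $\text{AR}(0)(1)_4$ model is entered as an AR(4) whose first three coefficients are zero.

```python
import numpy as np

def ar_forecast(history, coefs, steps):
    """Point forecasts of y_t = coefs[0]*y_{t-1} + ... + coefs[p-1]*y_{t-p} (no constant)."""
    p = len(coefs)
    values = list(history[-p:])   # the last p observations seed the recursion
    out = []
    for _ in range(steps):
        # values[-1] is y_{t-1}, values[-2] is y_{t-2}, and so on
        nxt = sum(c * values[-(k + 1)] for k, c in enumerate(coefs))
        values.append(nxt)
        out.append(nxt)
    return np.array(out)

history = np.array([1.0, -0.5, 0.8, -1.3])  # one hypothetical "year" of differenced data
Phi = 0.9                                    # assumed seasonal AR coefficient

# AR(0)(1)_4: only the lag-4 coefficient is nonzero
seasonal = ar_forecast(history, [0.0, 0.0, 0.0, Phi], steps=12)
# A generic AR(4) with assumed nonzero short-lag coefficients
generic = ar_forecast(history, [0.1, -0.05, 0.05, 0.85], steps=12)

print(seasonal.reshape(3, 4))  # each row is the previous "year" scaled by Phi
```

Because $|\Phi_1| < 1$ here, the seasonal swing in the forecasts shrinks by a factor $\Phi_1$ every year, the same long-run behavior Josef points out in the comments for a fitted lag-4 coefficient below one; with $\Phi_1 = 1$ (a seasonal random walk) the pattern would repeat unchanged.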

Now, comparing our two models, we see that the $\text{AR}(4)$ model encompasses the $\text{AR}(0)(1)_4$ one: they both have a $y_{t-4}$ term, but the $\text{AR}(4)$ model also contains $y_{t-1}$, $y_{t-2}$ and $y_{t-3}$ terms. Setting $\phi_1=\phi_2=\phi_3=0$ and $\phi_4=\Phi_1$ recovers the seasonal model exactly.
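
One way to see the encompassing at work is to simulate data from the $\text{AR}(0)(1)_4$ model and fit both models by ordinary least squares. This NumPy sketch uses conditional least squares as a simplification of the maximum-likelihood fit that statsmodels' `ARIMA` performs; the helpers `simulate_seasonal_ar` and `fit_ar_ols`, the value $\Phi_1 = 0.8$, the sample size and the seed are all assumptions for illustration. The unrestricted AR(4) should put weights near zero on lags 1 to 3 and near $\Phi_1$ on lag 4, and because its regressors include the seasonal model's single regressor (on the same sample), its in-sample residual sum of squares cannot be larger.

```python
import numpy as np

def simulate_seasonal_ar(Phi, n, s=4, sigma=1.0, seed=0, burn=200):
    """Simulate y_t = Phi * y_{t-s} + eps_t, discarding a burn-in period."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n + burn)
    y = np.zeros(n + burn)
    for t in range(s, n + burn):
        y[t] = Phi * y[t - s] + eps[t]
    return y[burn:]

def fit_ar_ols(y, lags):
    """Regress y_t on y_{t-l} for each l in lags (no constant); return coefs and RSS."""
    p = max(lags)
    X = np.column_stack([y[p - l:len(y) - l] for l in lags])
    target = y[p:]
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coefs) ** 2))
    return coefs, rss

Phi = 0.8
y = simulate_seasonal_ar(Phi, n=2000)

ar4_coefs, ar4_rss = fit_ar_ols(y, lags=[1, 2, 3, 4])  # unrestricted AR(4)
sar_coefs, sar_rss = fit_ar_ols(y, lags=[4])           # AR(0)(1)_4: lag 4 only

print("AR(4) coefficients:", np.round(ar4_coefs, 3))   # lags 1-3 near 0, lag 4 near Phi
print("seasonal coefficient:", np.round(sar_coefs, 3))
print("RSS AR(4) <= RSS seasonal:", ar4_rss <= sar_rss)
```

Both fits use the same target rows (`y[4:]`), which is what makes the residual sums of squares directly comparable.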

Thus, we would expect the $\text{AR}(4)$ model to do at least as good a job in fitting as the $\text{AR}(0)(1)_4$ model. The difference may show up in the forecasts: since the $\text{AR}(4)$ model estimates three more parameters, it will be more prone to overfitting, especially, of course, if there are indeed only seasonal dynamics at work and no non-seasonal ones (assuming our data is truly generated by any ARIMA process, IMO a heroic assumption).

Also, any overfitting will show up more prominently if the seasonal (or other) signal is weaker. In your case, the seasonality is rather blatant, so even quite an overparameterized model does not overfit too badly. Consider adding some noise to your data and running the analysis a couple of times, each with a different random noise realization of the same strength; the nonseasonal ARIMA model should give you much more variable forecasts than a seasonal one.
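The experiment suggested above can be approximated without downloading the UK data by simulating short seasonal series that share the same noise strength but use different random draws, refitting both models each time, and measuring how far each model's forecasts stray from the forecasts made with the true coefficient. This is a minimal NumPy sketch under several assumptions: the data come from the $\text{AR}(0)(1)_4$ model, both models are fitted by least squares rather than maximum likelihood, and the helper names, sample size (40 quarters), horizon and replication count are illustrative choices.

```python
import numpy as np

def simulate_seasonal_ar(Phi, n, rng, s=4, sigma=1.0, burn=200):
    """Simulate y_t = Phi * y_{t-s} + eps_t, discarding a burn-in period."""
    eps = rng.normal(0.0, sigma, n + burn)
    y = np.zeros(n + burn)
    for t in range(s, n + burn):
        y[t] = Phi * y[t - s] + eps[t]
    return y[burn:]

def fit_ar_ols(y, lags):
    """Least-squares coefficients of y_t on y_{t-l} for l in lags (no constant)."""
    p = max(lags)
    X = np.column_stack([y[p - l:len(y) - l] for l in lags])
    coefs, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return dict(zip(lags, coefs))

def forecast(y, coef_by_lag, steps):
    """Recursive point forecasts; lags absent from coef_by_lag have coefficient 0."""
    values = list(y)
    for _ in range(steps):
        values.append(sum(c * values[-l] for l, c in coef_by_lag.items()))
    return np.array(values[len(y):])

def forecast_spread(Phi=0.8, n=40, steps=8, reps=500, sigma=1.0, seed=1):
    """Mean squared gap between fitted-model and true-coefficient forecasts."""
    rng = np.random.default_rng(seed)
    models = {"AR(4)": [1, 2, 3, 4], "AR(0)(1)_4": [4]}
    gaps = {name: [] for name in models}
    for _ in range(reps):
        y = simulate_seasonal_ar(Phi, n, rng, sigma=sigma)  # new noise, same strength
        oracle = forecast(y, {4: Phi}, steps)               # forecasts with the true Phi
        for name, lags in models.items():
            fc = forecast(y, fit_ar_ols(y, lags), steps)
            gaps[name].append(np.mean((fc - oracle) ** 2))
    return {name: float(np.mean(g)) for name, g in gaps.items()}

print(forecast_spread())
```

Comparing against the true-coefficient forecasts, rather than looking at the raw spread of forecasts across runs, separates the variability caused by estimating extra parameters from the variation any model would show on different data.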

Note that this very much depends on your seasonal cycle length. For quarterly data, an $\text{AR}(4)$ model estimates only three more parameters than an $\text{AR}(0)(1)_4$ one. For monthly data, in contrast, we would need to go to an $\text{AR}(12)$ model to be able to capture the seasonality, and this model would need to estimate eleven more parameters than an $\text{AR}(0)(1)_{12}$ model, so the risk of overfitting will be much higher.
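
The cost of a longer cycle can be sketched with a simulation in the spirit of the noise experiment above, here restricted to one-step-ahead forecasts and comparing an $\text{AR}(s)$ with an $\text{AR}(0)(1)_s$ for $s = 4$ and $s = 12$. Again this is an illustrative NumPy sketch: the data-generating process, the least-squares fit, the helper name `estimation_error_ratio` and all settings (120 observations, $\Phi_1 = 0.8$, 400 replications) are assumptions. We would expect the ratio of estimation-induced forecast errors to grow with $s$, since the nonseasonal model has $s - 1$ extra coefficients to estimate.

```python
import numpy as np

def estimation_error_ratio(s, Phi=0.8, n=120, reps=400, seed=2):
    """One-step forecast error from estimation, AR(s) relative to AR(0)(1)_s.

    Data come from y_t = Phi * y_{t-s} + eps_t; the error is measured against
    the forecast made with the true coefficient.
    """
    rng = np.random.default_rng(seed)
    burn = 50 * s
    gaps_full, gaps_seasonal = [], []
    for _ in range(reps):
        eps = rng.normal(size=n + burn)
        y = np.zeros(n + burn)
        for t in range(s, n + burn):
            y[t] = Phi * y[t - s] + eps[t]
        y = y[burn:]
        # Design matrix with columns y_{t-1}, ..., y_{t-s} for t = s, ..., n-1
        X = np.column_stack([y[s - l:n - l] for l in range(1, s + 1)])
        target = y[s:]
        x_next = y[::-1][:s]        # y[n-1], ..., y[n-s]: regressors for the next value
        true_fc = Phi * x_next[-1]  # the true model only uses lag s
        beta_full, *_ = np.linalg.lstsq(X, target, rcond=None)
        beta_seas, *_ = np.linalg.lstsq(X[:, [-1]], target, rcond=None)
        gaps_full.append((x_next @ beta_full - true_fc) ** 2)
        gaps_seasonal.append((x_next[-1] * beta_seas[0] - true_fc) ** 2)
    return float(np.mean(gaps_full) / np.mean(gaps_seasonal))

for s in (4, 12):
    print(f"s={s}: AR({s}) / AR(0)(1)_{s} error ratio = {estimation_error_ratio(s):.1f}")
```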

Incidentally, pmdarima.auto_arima() believes your data is $\text{ARIMA}(5,1,0)$ if we do not supply seasonality information.

Stephan Kolassa
  • This seems a lot like beating a straw man. A reasonable person would not fit a high-order AR model in an unregularized way. A more relevant example would be to show that an AR(5) model with appropriate zero restrictions (lags 2 and 3 set to zero) has only one more parameter to estimate than a SARIMA(1,0,0)(1,0,0)$_4$ model. I discuss a similar case here. – Richard Hardy Nov 14 '21 at 14:16
  • The estimated coefficient for lag 4 is close to 1, ar.L4=0.9613, but still smaller than one. So the seasonal cycle will shrink over the forecast horizon. This still gives different long run behavior than a model with seasonal differencing. – Josef Nov 14 '21 at 19:54
  • @RichardHardy: yes, you do have a point there. Then again, the OP is writing a tutorial. The intended audience may indeed profit from a little strawman beating. And yes, I agree that regularization like constraining parameters to be zero would be a worthwhile follow-on topic. Then again, if we can trust our students to understand this, we should really first be looking at seasonal models. – Stephan Kolassa Nov 16 '21 at 06:45
  • @StephanKolassa, I should probably have refrained from saying a lot [like], but you got the point. Your response makes sense. Thank you! – Richard Hardy Nov 16 '21 at 08:30
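
The zero-restriction argument in Richard Hardy's comment can be checked by multiplying out the lag polynomial of a SARIMA(1,0,0)(1,0,0)$_4$ model, $(1-\phi B)(1-\Phi B^4) = 1 - \phi B - \Phi B^4 + \phi\Phi B^5$. Multiplying polynomials is a convolution of their coefficient vectors, so `numpy.convolve` performs the expansion; the coefficient values below are arbitrary assumptions.

```python
import numpy as np

phi, Phi = 0.5, 0.8                              # assumed non-seasonal and seasonal AR coefficients
nonseasonal = np.array([1.0, -phi])              # 1 - phi*B, in increasing powers of B
seasonal = np.array([1.0, 0.0, 0.0, 0.0, -Phi])  # 1 - Phi*B^4
product = np.convolve(nonseasonal, seasonal)     # coefficients of B^0 ... B^5

# AR form: y_t = sum_k ar_form[k-1] * y_{t-k} + eps_t
ar_form = -product[1:]
nonzero_lags = [k for k, c in enumerate(ar_form, start=1) if c != 0]
print(product)       # the B^2 and B^3 terms vanish; the B^5 term equals phi*Phi
print(nonzero_lags)  # [1, 4, 5]
```

So the SARIMA model is an AR(5) with lags 2 and 3 set to zero and the lag-5 coefficient tied to $\phi\Phi$; a restricted AR(5) that only zeroes lags 2 and 3 estimates three free coefficients, one more than the SARIMA model's two.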