0

I am trying to hand calculate the prediction output for statsmodel's SARIMAX but am not getting the right values. I fit the model as follows:

from statsmodels.tsa.statespace.sarimax import SARIMAX
import pandas as pd
import numpy as np

x = np.array([1.43839683, 1.58737972, 2.56918062, 2.20768073, 2.06686168, 1.79483696, 2.10348052, 2.24404145, 1.38084798, 1.8165772 , 2.23706359, 2.06938327, 1.45480011, 1.59935103, 2.56467497, 2.19698115, 2.05029322, 1.77362942, 2.08021718, 2.21928505, 1.3667532 , 1.80440352, 2.23990379, 2.08945659, 1.49048437, 1.65989372, 2.63922424, 2.29742929, 2.17817642, 1.9274815 , 2.25526355, 2.40683635, 1.57503051, 2.01350068, 2.45757963, 2.31875038, 1.72791459, 1.91297156, 2.88799343, 2.5434238 , 2.41232631, 2.14680717, 2.45679305, 2.58724137, 1.7465048 , 2.15493785, 2.56915222, 2.39105932, 1.75722185, 1.92676286, 2.89753149, 2.56273977, 2.43401002, 2.17312866, 2.48684727, 2.62114427, 1.79892782, 2.21660381, 2.65270203, 2.50424027])

model = SARIMAX( x, order=(1, 0, 3), seasonal_order=(1, 0, 1, 12), ).fit()

df = pd.DataFrame({ "x": x, "pred": model.predict(), "resid": model.resid }) df.head()

x pred resid
0 1.4384 0 1.4384
1 1.58738 1.43838 0.149002
2 2.56918 1.61701 0.952175
3 2.20768 2.76155 -0.553871
4 2.06686 2.20161 -0.134748

To calculate the forecasts by hand, I first calculate the terms separately as follows:

df["hand_calc"] = 0

Add AR terms

for i, v in enumerate(model.arparams): df[f"ar_{i+1}_term"] = v * df.x.shift(i+1) df["hand_calc"] += df[f"ar_{i+1}_term"]

Add Seasonal AR terms

for i, v in enumerate(model.seasonalarparams): df[f"ar_S{i+1}_term"] = v * df.x.shift(12 * (i+1)) df["hand_calc"] += df[f"ar_S{i+1}_term"]

Add MA terms

for i, v in enumerate(model.maparams): df[f"ma_{i+1}_term"] = v * df.resid.shift(i+1) df["hand_calc"] += df[f"ma_{i+1}_term"]

Add Seasonal MA terms

for i, v in enumerate(model.seasonalmaparams): df[f"ma_S{i+1}_term"] = v * df.resid.shift(12*(i+1)) df["hand_calc"] += df[f"ma_S{i+1}_term"]

df["pred_diff"] = df.pred - df.hand_calc

df.tail()

x pred resid hand_calc ar_1_term ar_S1_term ma_1_term ma_2_term ma_3_term ma_S1_term pred_diff
55 2.62114 2.59697 0.0241698 5.05791 2.47886 2.58496 0.00450707 -0.000666605 -0.00223555 -0.00751738 -2.46093
56 1.79893 1.77915 0.0197747 4.36447 2.61272 1.74497 0.0091306 0.00507481 -0.0015117 -0.00591724 -2.58531
57 2.2166 2.21186 0.00473958 3.95877 1.79315 2.15304 0.00747028 0.0102807 0.0115084 -0.0166786 -1.74691
58 2.6527 2.63771 0.01499 4.79474 2.20948 2.56689 0.00179047 0.00841128 0.0233142 -0.0151472 -2.15703
59 2.50424 2.47054 0.0336958 5.03754 2.64418 2.38895 0.00566277 0.00201601 0.0190747 -0.0223449 -2.567

I have tried the same calculations without the seasonal terms but still not able to replicate the predictions.

In addition, I am not sure how to make out of sample predictions given that we do not have MA terms available.

I looked through the following solutions but was still not successful:

ARIMA SARIMA model mathematical formula

and

Unable to recreate Statsmodels ARIMAX (1, 1, 0) forecasts by hand

hamiq
  • 101
  • Some points I found:
    1. only AR models (SARIMAX(1,0,0)(0,0,0,12)) give me perfect replication.
    2. If i take SARIMAX(1,0,1)(0,0,0,12), I get errors but they reduce over time
    – hamiq Nov 16 '20 at 23:30
  • 1
    This not how a SARIMA works when you have both components. You need to multiply the two polynomials for the observation AR and then seasonal AR to get a pure observation time AR. Same for the MA when you have both components. The reason why your previous calculations worked is that you only have 1 component at a time. For example, your AR part in pure observation time is (1-.9.967L-.997940L^{12} + 0.99609189L^(13))Y(t) – Kevin S Nov 17 '20 at 00:10
  • Hi Kevin, thanks for pointing out the mistake. I tried with a simple SARIMAX(1,0,1)(0,0,0) and still am not getting the calculations right. I added the seasonal terms hoping that someone would show the full set of calculations – hamiq Nov 17 '20 at 09:09
  • Basically the issue is that the moving-average term in SARIMAX models must be estimated, and so it is not identical to the resid attribute near the beginning of the sample. See my answer to https://stats.stackexchange.com/questions/430186/how-statsmodels-predict-ma-process/430274#430274. I don't know of a simpler method for computing the exact results than actually applying the Kalman filter recursions. – cfulton Nov 17 '20 at 12:20
  • @cfulton what would be the steps required to compute the exact solution? Also, are MA terms estimated for k-step ahead forecasts (into the future) or do we assume them to be zero? – hamiq Nov 17 '20 at 13:41
  • The required steps for computing the exact solution is applying the Kalman filter recursions. The expected value of the future MA terms is always equal to zero. – cfulton Nov 17 '20 at 14:17

0 Answers0