I am trying to hand calculate the prediction output for statsmodel's SARIMAX but am not getting the right values. I fit the model as follows:
from statsmodels.tsa.statespace.sarimax import SARIMAX
import pandas as pd
import numpy as np
x = np.array([1.43839683, 1.58737972, 2.56918062, 2.20768073, 2.06686168,
1.79483696, 2.10348052, 2.24404145, 1.38084798, 1.8165772 ,
2.23706359, 2.06938327, 1.45480011, 1.59935103, 2.56467497,
2.19698115, 2.05029322, 1.77362942, 2.08021718, 2.21928505,
1.3667532 , 1.80440352, 2.23990379, 2.08945659, 1.49048437,
1.65989372, 2.63922424, 2.29742929, 2.17817642, 1.9274815 ,
2.25526355, 2.40683635, 1.57503051, 2.01350068, 2.45757963,
2.31875038, 1.72791459, 1.91297156, 2.88799343, 2.5434238 ,
2.41232631, 2.14680717, 2.45679305, 2.58724137, 1.7465048 ,
2.15493785, 2.56915222, 2.39105932, 1.75722185, 1.92676286,
2.89753149, 2.56273977, 2.43401002, 2.17312866, 2.48684727,
2.62114427, 1.79892782, 2.21660381, 2.65270203, 2.50424027])
model = SARIMAX(
x,
order=(1, 0, 3),
seasonal_order=(1, 0, 1, 12),
).fit()
df = pd.DataFrame({
"x": x,
"pred": model.predict(),
"resid": model.resid
})
df.head()
x
pred
resid
0
1.4384
0
1.4384
1
1.58738
1.43838
0.149002
2
2.56918
1.61701
0.952175
3
2.20768
2.76155
-0.553871
4
2.06686
2.20161
-0.134748
To calculate the forecasts by hand, I first calculate the terms separately as follows:
df["hand_calc"] = 0
Add AR terms
for i, v in enumerate(model.arparams):
df[f"ar_{i+1}_term"] = v * df.x.shift(i+1)
df["hand_calc"] += df[f"ar_{i+1}_term"]
Add Seasonal AR terms
for i, v in enumerate(model.seasonalarparams):
df[f"ar_S{i+1}_term"] = v * df.x.shift(12 * (i+1))
df["hand_calc"] += df[f"ar_S{i+1}_term"]
Add MA terms
for i, v in enumerate(model.maparams):
df[f"ma_{i+1}_term"] = v * df.resid.shift(i+1)
df["hand_calc"] += df[f"ma_{i+1}_term"]
Add Seasonal MA terms
for i, v in enumerate(model.seasonalmaparams):
df[f"ma_S{i+1}_term"] = v * df.resid.shift(12*(i+1))
df["hand_calc"] += df[f"ma_S{i+1}_term"]
df["pred_diff"] = df.pred - df.hand_calc
df.tail()
x
pred
resid
hand_calc
ar_1_term
ar_S1_term
ma_1_term
ma_2_term
ma_3_term
ma_S1_term
pred_diff
55
2.62114
2.59697
0.0241698
5.05791
2.47886
2.58496
0.00450707
-0.000666605
-0.00223555
-0.00751738
-2.46093
56
1.79893
1.77915
0.0197747
4.36447
2.61272
1.74497
0.0091306
0.00507481
-0.0015117
-0.00591724
-2.58531
57
2.2166
2.21186
0.00473958
3.95877
1.79315
2.15304
0.00747028
0.0102807
0.0115084
-0.0166786
-1.74691
58
2.6527
2.63771
0.01499
4.79474
2.20948
2.56689
0.00179047
0.00841128
0.0233142
-0.0151472
-2.15703
59
2.50424
2.47054
0.0336958
5.03754
2.64418
2.38895
0.00566277
0.00201601
0.0190747
-0.0223449
-2.567
I have tried the same calculations without the seasonal terms but still not able to replicate the predictions.
In addition, I am not sure how to make out of sample predictions given that we do not have MA terms available.
I looked through the following solutions but was still not successful:
ARIMA SARIMA model mathematical formula
and
Unable to recreate Statsmodels ARIMAX (1, 1, 0) forecasts by hand
- only AR models (SARIMAX(1,0,0)(0,0,0,12)) give me perfect replication.
- If i take SARIMAX(1,0,1)(0,0,0,12), I get errors but they reduce over time
– hamiq Nov 16 '20 at 23:30residattribute near the beginning of the sample. See my answer to https://stats.stackexchange.com/questions/430186/how-statsmodels-predict-ma-process/430274#430274. I don't know of a simpler method for computing the exact results than actually applying the Kalman filter recursions. – cfulton Nov 17 '20 at 12:20