Unexpected fittedvalues using statsmodels SARIMAX

Question

I have noticed that the SARIMAX model in statsmodels does not produce the fittedvalues I expect when the model is specified as an ARMA. The example below shows the discrepancy between the value I expected and the value fitted by the SARIMAX model.

Code:

import pandas as pd
import statsmodels.api as sm


def sarimax_model():
    # Four annual observations, fitted as an ARMA(1, 1)
    index = pd.period_range(start='2000', periods=4, freq='A')
    original_observations = pd.Series([1.2, 1.5, 1.0, 0.8], index=index)
    mod = sm.tsa.SARIMAX(original_observations, order=(1, 0, 1))
    res = mod.fit()
    print("Input data:\n", original_observations)
    print("Model parameters:\n", res.params, "\n")
    print("Model residuals:\n", res.resid, "\n")
    print("Fitted values:\n", res.fittedvalues, "\n")
    # Expected value for 2001
    # val_2001 = 0.948959 * 1.200000 + (-0.044637) * 1.200000
    val_2001 = res.arparams * res.data.endog[0] + res.maparams * res.resid.iloc[0]
    print("Expected fitted values for 2001:", "\n", val_2001, "\n")


if __name__ == '__main__':
    sarimax_model()

Output:

Model parameters:
 ar.L1     0.948959
ma.L1    -0.044637
sigma2    0.121073
dtype: float64
Model residuals:
 2000    1.200000
2001    0.367058
2002   -0.407083
2003   -0.167130
Freq: A-DEC, dtype: float64
Fitted values:
 2000    0.000000
2001    1.132942
2002    1.407083
2003    0.967130
Freq: A-DEC, dtype: float64
Expected fitted values for 2001: 
 [1.08518626]

I wonder whether I am missing something here or whether the SARIMAX model is simply incorrect. The SARIMAX model produces the answer I expect when it is constructed as a pure AR model.

Glad to join this community.

Solo :)

Thanks for the link @cfulton.
At the beginning of the sample the error term is unknown (or the estimate of the error is inaccurate) and the optimal parameters of the model are also unknown. Hence, the predictions at the beginning of the sample are affected by inaccurate estimates of the errors and non-optimal parameters. Why aren't the fitted values recalculated with optimal parameter values? I understand this cannot be done (or meaningless) for forecasting, but for a training sample (historical data) this seems reasonable if the aim is to find the best fitted values. Solo :) — Solo, Jul 28 '21 at 07:25
For a results object constructed using fit, all output is computed using the optimal parameters. For time series models in Statsmodels, fittedvalues is defined to be the one-step-ahead predictions, and resid is defined to be the one-step-ahead prediction error. The issue here is that resid simply does not correspond to the best estimate of the MA error term at the beginning of the sample. This is just by definition and is the nature of MA processes, and there is nothing that can be done about it. It is not because things aren't computed using optimal parameters. — cfulton, Jul 28 '21 at 23:39
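To make the point in the comment above concrete, here is a minimal sketch in plain Python (no statsmodels) of the one-step-ahead recursion for an ARMA(1,1). It assumes the model starts from its stationary distribution, which is my understanding of the SARIMAX default when stationarity is enforced; treat that as an assumption rather than a statement about the library's internals. The parameter values are the rounded estimates printed in the question, and the names `r`, `eps_hat` and `pred` are illustrative only.

```python
# Rounded estimates and data copied from the question's output.
phi, theta = 0.948959, -0.044637
y = [1.2, 1.5, 1.0, 0.8]

# r is the one-step prediction error variance divided by sigma2.
# Assumption: stationary start, so r_0 = gamma_0 / sigma2 for an ARMA(1,1).
r = (1 + 2 * phi * theta + theta ** 2) / (1 - phi ** 2)

fitted = []
pred = 0.0  # before any data is seen, the prediction is the zero mean
for obs in y:
    fitted.append(pred)
    innovation = obs - pred   # one-step-ahead prediction error (the "resid")
    eps_hat = innovation / r  # estimate of the MA error given data so far
    pred = phi * obs + theta * eps_hat
    r = 1 + theta ** 2 * (1 - 1 / r)  # variance ratio for the next step

print("Fitted values:", fitted)
```

Under these assumptions the printed values should closely match the Fitted values in the question. For 2000 the prediction error is 1.2, but the estimate of the MA error is only about 0.13, because most of the first observation is attributed to the unknown pre-sample state. Using 0.13 rather than 1.2 in the MA term gives roughly 1.1329 for 2001 instead of 1.0852. With a pure AR model the MA term vanishes, which is consistent with the hand calculation agreeing in that case.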
