
I have noticed that SARIMAX model in statsmodels does not produce the expected (correct) fittedvalues when the model is specified as an ARMA. Below is an example showing the discrepancy between what I expected and the value fitted by the SARIMAX model.

Code:

import pandas as pd
import statsmodels.api as sm

def sarimax_model():
    index = pd.period_range(start='2000', periods=4, freq='A')
    original_observations = pd.Series([1.2, 1.5, 1.0, 0.8], index=index)
    mod = sm.tsa.SARIMAX(original_observations, order=(1, 0, 1))
    res = mod.fit()

    print("Input data:\n", original_observations)
    print("Model parameters:\n", res.params, "\n")
    print("Model residuals:\n", res.resid, "\n")
    print("Fitted values:\n", res.fittedvalues, "\n")

    # Expected value for 2001:
    # val_2001 = 0.948959 * 1.200000 + (-0.044637) * 1.200000
    # AR term uses the 2000 observation; MA term uses the 2000 residual as the error.
    val_2001 = res.arparams * res.data.endog[0] + res.maparams * res.resid.iloc[0]

    print("Expected fitted values for 2001:", "\n", val_2001, "\n")


if __name__ == '__main__':
    sarimax_model()

Output:

Model parameters:
 ar.L1     0.948959
ma.L1    -0.044637
sigma2    0.121073
dtype: float64

Model residuals:
 2000    1.200000
2001    0.367058
2002   -0.407083
2003   -0.167130
Freq: A-DEC, dtype: float64

Fitted values:
 2000    0.000000
2001    1.132942
2002    1.407083
2003    0.967130
Freq: A-DEC, dtype: float64

Expected fitted values for 2001: [1.08518626]

I wonder whether I am missing something here, or whether the SARIMAX model is simply incorrect. SARIMAX does produce the expected answer when the model is specified as a pure AR.
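For comparison, here is a hand calculation (a sketch, not statsmodels code) of the 2001 one-step-ahead prediction under a stationary, zero-mean ARMA(1,1), y_t = phi*y_{t-1} + eps_t + theta*eps_{t-1}, using the rounded estimates printed above. The difference from the calculation in the question is that, given only y_2000, the best linear estimate of the 2000 error is not resid[0] = 1.2 but sigma2/gamma(0) times y_2000. The helper names `arma11_autocorr_lag1` and `eps_weight` are illustrative, not part of any library.

```python
# Hand calculation of the first one-step-ahead prediction of a stationary,
# zero-mean ARMA(1,1): y_t = phi*y_{t-1} + eps_t + theta*eps_{t-1}.
# Parameter values are the rounded estimates printed in the question.


def arma11_autocorr_lag1(phi: float, theta: float) -> float:
    """Lag-1 autocorrelation rho(1) of a stationary ARMA(1,1)."""
    return (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)


def eps_weight(phi: float, theta: float) -> float:
    """Regression coefficient of eps_0 on y_0, i.e. sigma2 / gamma(0)."""
    return (1 - phi ** 2) / (1 + 2 * phi * theta + theta ** 2)


phi, theta = 0.948959, -0.044637
y0 = 1.2

# The question's calculation: resid[0] (which equals y0) is used as if it were eps_0.
naive = phi * y0 + theta * y0

# Best linear predictor given y_0 only: E[eps_0 | y_0] is a small fraction of y_0.
eps0_hat = eps_weight(phi, theta) * y0
fitted_2001 = phi * y0 + theta * eps0_hat

# Equivalent closed form: E[y_1 | y_0] = rho(1) * y_0.
assert abs(fitted_2001 - arma11_autocorr_lag1(phi, theta) * y0) < 1e-12

# With theta = 0 (pure AR(1)), rho(1) reduces to phi, so the prediction is
# just phi * y0, consistent with the AR-only model matching the hand calculation.
assert abs(arma11_autocorr_lag1(phi, 0.0) - phi) < 1e-12

print(f"naive:  {naive:.6f}")
print(f"fitted: {fitted_2001:.6f}")
```

With these inputs the first line should print roughly 1.085186 (the question's expected value) and the second roughly 1.132942, the 2001 value in `fittedvalues`.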

Glad to join this community.

Solo :)

  • Thanks for the link @cfulton.

    At the beginning of the sample the error term is unknown (or the estimate of the error is inaccurate) and the optimal parameters of the model are also unknown. Hence, the predictions at the beginning of the sample are affected by inaccurate estimates of the errors and non-optimal parameters. Why aren't the fitted values recalculated with the optimal parameter values? I understand this cannot be done (or would be meaningless) for forecasting, but for a training sample (historical data) it seems reasonable if the aim is to find the best fitted values. Solo :)

    – Solo Jul 28 '21 at 07:25
  • For a results object constructed using fit, all output is computed using the optimal parameters. For time series models in Statsmodels, fittedvalues is defined to be the one-step-ahead predictions, and resid is defined to be the one-step-ahead prediction error. The issue here is that resid simply does not correspond to the best estimate of the MA error term at the beginning of the sample. This is just by definition and is the nature of MA processes, and there is nothing that can be done about it. It is not because things aren't computed using optimal parameters. – cfulton Jul 28 '21 at 23:39
  • Thank you @cfulton! Solo :) – Solo Jul 29 '21 at 00:11
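To illustrate the definitions above (fittedvalues as one-step-ahead predictions, resid as one-step-ahead prediction errors), here is a minimal pure-Python Kalman filter for a zero-mean ARMA(1,1) started from its stationary distribution. It is a sketch, not statsmodels' implementation: the state vector [y_t, theta*eps_t] is one common choice and `arma11_one_step` is a hypothetical helper. Because the predictions are conditional expectations under the model, any valid state-space form should give the same numbers; with the rounded parameters from the question they should match `fittedvalues` and `resid` up to rounding.

```python
# Minimal Kalman filter for a zero-mean ARMA(1,1) in state-space form,
# initialized from the stationary distribution (an illustrative sketch).


def arma11_one_step(y, phi, theta, sigma2=1.0):
    """Return (predictions, errors) for a stationary ARMA(1,1).

    State a_t = [y_t, theta*eps_t]; transition a_{t+1} = T a_t + R eps_{t+1}
    with T = [[phi, 1], [0, 0]] and R = [1, theta]; observation y_t = a_t[0].
    """
    # Stationary mean and covariance of the state.
    gamma0 = sigma2 * (1 + 2 * phi * theta + theta ** 2) / (1 - phi ** 2)
    a = [0.0, 0.0]
    P = [[gamma0, theta * sigma2],
         [theta * sigma2, theta ** 2 * sigma2]]
    preds, errors = [], []
    for obs in y:
        pred = a[0]          # one-step-ahead prediction (fittedvalues)
        v = obs - pred       # one-step-ahead prediction error (resid)
        F = P[0][0]          # variance of the prediction error
        preds.append(pred)
        errors.append(v)
        # Update: condition the state on y_t (observed without noise).
        a_f = [a[0] + (P[0][0] / F) * v, a[1] + (P[1][0] / F) * v]
        # After conditioning, only the theta*eps_t entry keeps any variance.
        P_f11 = P[1][1] - P[1][0] * P[0][1] / F
        # Predict: a_{t+1} = T a_f and P_{t+1} = T P_f T' + sigma2 * R R'.
        a = [phi * a_f[0] + a_f[1], 0.0]
        P = [[P_f11 + sigma2, theta * sigma2],
             [theta * sigma2, theta ** 2 * sigma2]]
    return preds, errors


y = [1.2, 1.5, 1.0, 0.8]
preds, errors = arma11_one_step(y, phi=0.948959, theta=-0.044637)
for year, p, e in zip(range(2000, 2004), preds, errors):
    print(year, f"{p:.6f}", f"{e:.6f}")
```

The predictions do not depend on sigma2, which only scales P. At t = 0 the prediction-error variance F equals gamma(0), roughly 9.2 times sigma2 for these parameters, and from t = 1 on it is already close to sigma2. That is the sense in which the first resid value (1.2) is a poor stand-in for the 2000 MA error, while later resid values are close to the filtered error estimates.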
