How does statsmodels calculate in-sample predictions in AR models?

Question

I am very new to time series modeling and statsmodels, and I am trying to understand the AR model in statsmodels. Suppose I have a data record y of 1000 samples and I fit an AR(1) model to y. I then generate the in-sample predictions from this model as y_pred, like this:

from statsmodels.tsa.ar_model import AutoReg
model = AutoReg(y,1).fit()
y_pred = model.predict()

I get the parameters of the model using model.params.

I would like to know how statsmodels calculates the in-sample predictions once the model parameters have been estimated. For example, how is y_pred[10] calculated?

I am sorry if the question is too basic. Thanks for the help.

Same question was asked in stackoverflow and has been answered here: https://stackoverflow.com/questions/67236144/how-does-statsmodels-calculate-in-sample-predictions-in-ar-models/67258872#67258872 — Chandramouli Santhanam, Apr 26 '21 at 08:59

score 1 · Accepted Answer · answered Jun 03 '21 at 20:40

For completeness, I'm posting the answer given on Stack Overflow by @AlexK, which the OP says has answered their question:

Per Wikipedia:

The autoregressive model specifies that the output variable depends linearly on its own previous values and on a stochastic term (an imperfectly predictable term).

In your example, the model has one predictor: the lagged value of y. In this simple case, the .predict() method multiplies each lagged value by the estimated slope parameter for that predictor and adds the estimated intercept. So the prediction for y[10] equals the fitted slope times y[9], plus the intercept estimate. Be careful with the indexing, though: in the output shown below, predict() returns one value fewer than y because the first observation has no lagged value, so the prediction for y[10] sits at y_pred[9], and y_pred[10] is the prediction for y[11], computed from y[10].

Here is an example:

from statsmodels.tsa.ar_model import AutoReg
y = [1, 2, 3, 6, 2, 9, 1]
model = AutoReg(y,1).fit()
model.params
# array([ 5.72953737, -0.49466192])

The first value in the params array is the estimated intercept parameter and the second value is the estimated linear (slope) parameter.
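As a cross-check on where these two numbers come from: AutoReg estimates its parameters by conditional maximum likelihood, which for this model amounts to an ordinary least squares regression of y[t] on a constant and y[t-1]. A minimal NumPy sketch, assuming only the toy series above (the names X, target, intercept, and slope are illustrative, not statsmodels internals):

```python
import numpy as np

y = np.array([1, 2, 3, 6, 2, 9, 1], dtype=float)

# Regress y[t] on a constant and y[t-1]; the first observation has no lag,
# so it appears only as a predictor, never as a target.
target = y[1:]
X = np.column_stack([np.ones(len(y) - 1), y[:-1]])

# Ordinary least squares fit of target on X.
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
intercept, slope = coef
print(intercept, slope)
```

The two printed values should agree with model.params up to floating-point precision.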

y_pred = model.predict()
y_pred
# array([5.23487544, 4.74021352, 4.2455516 , 2.76156584, 4.74021352, 1.27758007])

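The first value in the y_pred array is the prediction for the second value in the y array (there is no prediction for the first value because it has no lagged value). It is calculated as:

-0.49466192 * 1 + 5.72953737 ≈ 5.23487544

The second value in the y_pred array is computed as:

-0.49466192 * 2 + 5.72953737 ≈ 4.74021352

and so on. (If you redo the arithmetic by hand, the last digit can differ by one, because the printed parameters are rounded to eight decimal places.)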
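The same arithmetic can be applied to every lag at once. Here is a minimal sketch with NumPy that uses the rounded parameter values printed above rather than a fitted model, so agreement is only up to rounding (manual_pred and printed_pred are illustrative names):

```python
import numpy as np

y = np.array([1, 2, 3, 6, 2, 9, 1], dtype=float)

# Rounded values copied from model.params in the example above.
intercept, slope = 5.72953737, -0.49466192

# One-step-ahead prediction: each y[t] is predicted from the observed y[t-1].
manual_pred = intercept + slope * y[:-1]

# Values printed by model.predict() in the example above.
printed_pred = np.array([5.23487544, 4.74021352, 4.2455516,
                         2.76156584, 4.74021352, 1.27758007])

# Compare with a tolerance that absorbs the rounding of the parameters.
print(np.allclose(manual_pred, printed_pred, atol=1e-6))
```

Note that each prediction is built from the observed previous value of y, not from the previous prediction. That is the behaviour of predict() with its default dynamic=False.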
