I'm working on a project on time series multi-step ahead forecasting in Python.
I have a time series, and I fit an ARMA model to it (statsmodels' SARIMAX class). I know that ARMA models, like many other models, when forecasting tomorrow's value output an estimate of the conditional expected value of the process for tomorrow, i.e. an estimate of the mean of the underlying process for tomorrow given its past values.
I also know that tomorrow's value is determined by past values plus tomorrow's shock (error), which comes from a Gaussian distribution with mean 0, like all the other errors (the errors are i.i.d.):
$\epsilon_t \sim \mathcal{N} (0, \sigma^2)$
When fitting the ARMA model on the training set, I estimate the parameters of the true model by maximizing the likelihood, and along with the estimated parameters I obtain their confidence intervals.
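For context, a minimal sketch of what I'm doing (the simulated series and the ARMA(1,1) order are just placeholders for my real data and model selection):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.statespace.sarimax import SARIMAX

np.random.seed(0)
# simulate a stationary ARMA(1,1) series as a stand-in for my time series
y = ArmaProcess([1, -0.6], [1, 0.3]).generate_sample(nsample=500)

# fit an ARMA(1,1): SARIMAX with order=(p, d, q) and d=0
res = SARIMAX(y, order=(1, 0, 1)).fit(disp=False)

print(res.params)      # MLE point estimates (AR, MA coefficients and sigma^2)
print(res.bse)         # their standard errors
print(res.conf_int())  # the parameter confidence intervals I'm referring to
```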
Since my parameters have confidence intervals, I expect the forecasted mean for tomorrow to have its own confidence interval as well: I'm estimating the expected value of the process with uncertain parameters, so I can't be sure the estimated mean is the true mean of the process for tomorrow, hence a confidence interval.
I don't know the formula for calculating this confidence interval; the closest I can get is the brute-force approximation sketched below. But let's move on.
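(A sketch only, continuing from the fit above: draw parameters from their estimated asymptotic distribution, re-run the forecast for each draw, and look at the spread of the resulting mean forecasts. I'm not claiming this is the proper procedure, just what I would try.)

```python
import numpy as np

h, n_draws = 10, 1000
mean_hat = np.asarray(res.params)
cov_hat = np.asarray(res.cov_params())   # asymptotic covariance of the MLE
sig_idx = res.model.param_names.index("sigma2")
rng = np.random.default_rng(1)

mean_forecasts = []
for p in rng.multivariate_normal(mean_hat, cov_hat, size=n_draws):
    if p[sig_idx] <= 0:                  # discard draws with an invalid error variance
        continue
    try:
        # re-apply the drawn parameters to the same model and forecast h steps ahead
        mean_forecasts.append(res.model.filter(p).forecast(steps=h))
    except Exception:
        continue                         # skip draws the filter cannot handle
mean_forecasts = np.array(mean_forecasts)

# spread of the estimated conditional mean due to parameter uncertainty only
ci_lower, ci_upper = np.percentile(mean_forecasts, [2.5, 97.5], axis=0)
```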
Now I want to calculate the prediction interval, which is not the same thing as the confidence interval for the mean: the prediction interval should combine the uncertainty about the mean with the variance of the error term (although, again, I don't know the exact formula). I expected statsmodels to give me the prediction interval for the forecast, but the interval it reports in the forecast summary seems to be the confidence interval for the mean.
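For reference, this is how I'm reading the interval out of statsmodels (continuing from the fit above):

```python
fc = res.get_forecast(steps=10)
print(fc.predicted_mean)              # point forecasts (the estimated conditional mean)
print(fc.conf_int(alpha=0.05))        # the interval statsmodels reports
print(fc.summary_frame(alpha=0.05))   # mean, mean_se and interval bounds together
```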
However, as this GitHub issue reports:
In SARIMAX, we have not implemented a procedure to incorporate the uncertainty associated with estimating the parameters of the model. [...] Ultimately, the intervals produced by either SARIMAX (python) or Arima (R) don't fit either of the definitions above. In some sense they are more like the "Prediction interval" term, because they do take into account the uncertainty arising from the error term (unlike the "Confidence interval" as described above). But it is not an exact match because they don't take into account parameter estimation uncertainty.
So not only is the statsmodels interval incomplete, it's also misleading (since it looks like the CI for the mean).
At this point, I would like to calculate the true prediction intervals myself.
Looking online and in some books (e.g. https://otexts.com/fpp3/prediction-intervals.html) I see that the prediction interval is calculated from the estimated standard deviation (standard error) of the forecast distribution.
Every step of the forecast (in a multi-step-ahead setting) has its own estimated standard deviation. Fine. The book cited above says that the estimated standard deviation for tomorrow (one step ahead) is the RMSE of the past residuals, adjusted by a coefficient. But as said above, shouldn't this formula also take the confidence interval for the mean into account? Moreover, since the book only takes the errors into account, why calculate the RMSE of the past residuals if the errors are i.i.d. and their variance is known (by the Gaussian assumption)?
$Var(e_{t+1})=Var(\mathsf{X}_{t+1}-\mathsf{\hat{X}}_{t+1})=Var(\epsilon_{t+1})=\sigma^2$
Why doesn't the book use the variance of the error distribution?
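To make the comparison concrete (continuing from the fit above), here is what I mean: the RMSE of the in-sample residuals versus the square root of the sigma2 that SARIMAX estimates, which I expect to be essentially the same number (I'm assuming the sigma2 parameter is the relevant error-variance estimate):

```python
import numpy as np

resid = np.asarray(res.resid)
rmse = np.sqrt(np.mean(resid ** 2))   # RMSE of the past residuals (what the book uses)

# the error variance estimated by the model itself
sigma2_hat = np.asarray(res.params)[res.model.param_names.index("sigma2")]

# apart from initialization effects on the first few residuals,
# these should essentially coincide -- so why prefer one over the other?
print(rmse, np.sqrt(sigma2_hat))
```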
The book also says:
For multi-step forecasts, a more complicated method of calculation is required. These calculations assume that the residuals are uncorrelated.
A little after that, it explains how to create prediction intervals from bootstrapped past residuals. So is there no closed formula for multi-step-ahead prediction intervals? And why is the CI of the mean still not taken into account in the bootstrapping method?
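Just to show what I mean by the bootstrap approach, here is my own rough reconstruction of the book's idea for the ARMA fitted above (continuing from the earlier snippets): resample past residuals as the future shocks and simulate many future paths, while still treating the estimated parameters as exact, which is exactly the part that bothers me. This is a sketch, not the book's actual procedure:

```python
import numpy as np

h, n_paths = 10, 2000
# pull the estimated AR/MA coefficients out of the fitted SARIMAX by parameter name
# (non-seasonal ARMA assumed, as in the fit above)
names = res.model.param_names
params = np.asarray(res.params)
ar = np.array([p for n, p in zip(names, params) if n.startswith("ar.")])
ma = np.array([p for n, p in zip(names, params) if n.startswith("ma.")])
resid = np.asarray(res.resid)[max(len(ar), len(ma)):]   # drop initialization burn-in
rng = np.random.default_rng(2)

paths = np.empty((n_paths, h))
for i in range(n_paths):
    y_hist = list(y[-len(ar):]) if len(ar) else []       # last observed values
    e_hist = list(resid[-len(ma):]) if len(ma) else []   # last in-sample errors
    for t in range(h):
        e_new = rng.choice(resid)   # bootstrap a future shock from the past residuals
        y_new = (e_new
                 + sum(phi * v for phi, v in zip(ar, reversed(y_hist)))
                 + sum(theta * v for theta, v in zip(ma, reversed(e_hist))))
        y_hist.append(y_new)
        e_hist.append(e_new)
        paths[i, t] = y_new

# percentile-based prediction interval for each forecast step
lower, upper = np.percentile(paths, [2.5, 97.5], axis=0)
```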