I have seen multiple tutorials [example link] for ARIMA that select the p, q, d parameters based on the whole time series. Only after deciding on the model parameters do they split the data into training and test sets and make predictions for the test set to see how the model performs.

Shouldn't the p, q, d parameters be selected on the training data only, so that the test set performance evaluation is not biased?

Richard Hardy
  • 67,272
MattSt
  • 350

1 Answer

Yes, of course. ARIMA models are no different from any other model. The workflow is always to first split your data into a training and a testing sample (for time series data, you of course always use the last observations for the test set), then fit the model to the training data, then evaluate predictions on the test set. In-sample measures of fit are almost meaningless as a guide to forecasting performance.
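To make the split concrete, here is a minimal sketch in plain Python. The helper name `chronological_split` and the numbers in `series` are made up for illustration; the only point is that the test set is the tail of the series, never a random sample.

```python
def chronological_split(series, test_size):
    """Hold out the last `test_size` observations as the test set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    cut = len(series) - test_size
    # Everything before the cut is training data, everything after is test data.
    return series[:cut], series[cut:]


# Illustrative values only, not real data.
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21, 22, 24]
train, test = chronological_split(series, test_size=3)

print(train)
print(test)
```

Everything that follows (order selection, estimation) would then see `train` only, and `test` is touched once, at evaluation time.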

And of course the model fitting step also includes determining the ARIMA orders, which would therefore be done based on the training data only. Just as in fitting an OLS model, we would determine any transformations or interactions needed based on the training data, not the entire dataset. This is standard practice by (sorry) real forecasters, see any issue of the International Journal of Forecasting.

Incidentally, the procedure outlined in that tutorial for determining the AR and MA orders is iffy. ACF/PACF plots can only be used in this way for "pure" AR(p) or MA(q) models. In any case, one nowadays uses a search over possible models based on information criteria, rather than the earlier Box-Jenkins approach. This is implemented in the forecast and fable packages for R. I recommend Forecasting: Principles and Practice by Athanasopoulos & Hyndman, either the 2nd edition (which uses forecast) or the 3rd edition (which uses fable).
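As a rough illustration of an information-criterion search that only ever sees the training data, here is a deliberately simplified sketch in plain Python. It is not what forecast or fable do internally: it considers pure AR(p) models only (no differencing, no MA terms), fits them by ordinary least squares, and scores them with a Gaussian AIC. All function names (`solve_linear`, `fit_ar`, `forecast_ar`) and the simulated series are assumptions made for the example.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ar(train, p, max_p):
    """Least-squares AR(p) with intercept; returns (coefficients, aic).

    The first max_p observations serve only as lags, so every candidate
    order is scored on the same observations and the AICs are comparable.
    """
    rows = [[1.0] + [train[t - k] for k in range(1, p + 1)]
            for t in range(max_p, len(train))]
    y = train[max_p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    n = len(y)
    aic = n * math.log(rss / n) + 2 * (k + 1)  # +1 for the error variance
    return beta, aic


def forecast_ar(history, beta, steps):
    """Iterated multi-step forecasts starting from the end of `history`."""
    values = list(history)
    p = len(beta) - 1
    out = []
    for _ in range(steps):
        pred = beta[0] + sum(beta[k] * values[-k] for k in range(1, p + 1))
        out.append(pred)
        values.append(pred)
    return out


# Simulated AR(2) series, for illustration only.
random.seed(0)
series = [0.0, 0.0]
for _ in range(300):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + random.gauss(0, 1))

# Hold out the last 20 observations; order selection sees `train` only.
train, test = series[:-20], series[-20:]
max_p = 5
fits = {p: fit_ar(train, p, max_p) for p in range(max_p + 1)}
best_p = min(fits, key=lambda p: fits[p][1])

# The test set is used once, to evaluate the already-selected model.
preds = forecast_ar(train, fits[best_p][0], len(test))
mae = sum(abs(a - f) for a, f in zip(test, preds)) / len(test)
print("selected order:", best_p, "test MAE:", round(mae, 3))
```

The structural point is that `test` appears nowhere before the final evaluation. Running the same search on `series` instead of `train` would let the held-out observations influence the chosen order, which is exactly the leak the question is about.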

Stephan Kolassa
  • 123,354