
Suppose I want to forecast a time series $y$ using its own lags and a large set of candidate predictors $X$. The model could be specified as follows:

$$y_t = a + \rho \, y_{t-1} + \beta' X_{t-1} + e_t$$

The objective is to find methods with good out-of-sample forecast performance.

So far I have used Bayesian and non-Bayesian shrinkage methods (e.g. Bayesian Model Averaging, Ridge, LASSO, Least Angle Regression) as well as principal component analysis.

Which alternative methods could be promising in this context?

Preferably I would like to implement them in R if packages are available.

kanimbla

2 Answers


You could use any Machine Learning (ML) method suitable for regression (i.e., dealing with numerical outputs) rather than classification. For this, simply create predictors containing the lagged values $y_{t-1}$ and treat them just as you would any other predictor.

You would need to calculate the forecasts step-by-step so you can always feed the forecast $\hat{y}_t$ as a predictor into the model to predict $\hat{y}_{t+1}$.
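A minimal sketch of this iterated scheme, assuming a fitted model `fit` (any method with a standard `predict()` method) trained on a single lag named `y_lag1`; lagged `X` columns would be carried along the same way:

```r
# Iterated one-step-ahead forecasting: each forecast becomes the next lag.
# `fit`, `y`, and the column name `y_lag1` are placeholders for your own objects.
h      <- 12                  # forecast horizon
yhat   <- numeric(h)
y_prev <- tail(y, 1)          # last observed value starts the recursion
for (i in seq_len(h)) {
  yhat[i] <- predict(fit, newdata = data.frame(y_lag1 = y_prev))
  y_prev  <- yhat[i]          # feed the forecast back in
}
```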

You would need to decide on suitable lag orders by, say, using a holdout sample, since ML algorithms won't give you ACF/PACF. (Or you could try to get an idea of good lags by running ACF/PACF plots on the time series itself, disregarding the predictors you have.)
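For instance, a rough holdout comparison of candidate lag orders could look like the following sketch, where `lm()` is only a stand-in for whichever ML method you choose and `y` is your series:

```r
# Build a lag matrix with embed(): column 1 is y_t, columns 2..(p+1) are lags 1..p.
make_lags <- function(y, p) {
  emb <- embed(y, p + 1)
  data.frame(y = emb[, 1], emb[, -1, drop = FALSE])
}

# One-step-ahead RMSE on the last 24 observations for lag orders 1..6.
rmse <- sapply(1:6, function(p) {
  d     <- make_lags(y, p)
  test  <- tail(d, 24)
  train <- head(d, nrow(d) - 24)
  fit   <- lm(y ~ ., data = train)   # stand-in for your ML method
  sqrt(mean((predict(fit, test) - test$y)^2))
})
which.min(rmse)                      # lag order with the best holdout RMSE
```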

As for specific algorithms, I personally am a big fan of Random Forests, which are implemented in the randomForest package for R. They can deal with nonlinearities, interactions and large numbers of predictors, and they are fast. However, note that most implementations of RFs inherently cannot predict outside the historical range of the training data. See here for a discussion; the comments point to a couple of implementations using leaf regression, which can extrapolate and will dampen your trend, which is usually a good thing.
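A minimal sketch with the randomForest package (lag construction as above; `y` is a placeholder for your series, and additional lagged predictors would simply be extra columns):

```r
library(randomForest)  # install.packages("randomForest") if needed

p <- 3                                        # lag order chosen beforehand
d <- data.frame(embed(y, p + 1))              # column 1 = y_t, rest = lags 1..p
names(d) <- c("y", paste0("y_lag", 1:p))

fit <- randomForest(y ~ ., data = d, importance = TRUE)
importance(fit)                               # which lags the forest finds relevant
```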

Other ML algorithms like k-nearest neighbors are similarly inherently unable to extrapolate. If you have a strongly trended series, keep this in mind. You may want to look at differencing your data first.
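A sketch of the differencing route, where `dyhat` stands for forecasts of the differenced series obtained with whichever method you use:

```r
dy <- diff(y)                       # model the differences instead of the levels
# ... fit and forecast dy as above, producing dyhat, then integrate back:
yhat <- tail(y, 1) + cumsum(dyhat)  # undo the differencing
```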

Stephan Kolassa
  • Many thanks for your useful answer. I believe an alternative way of determining the lag order of the autoregressive component is to run an AR(p) model and select the optimal lag length using AIC or BIC. Random Forests and k-nearest neighbors sound interesting, but I will need to check how suitable they are for out-of-sample prediction given the limitations you mention. – kanimbla Feb 04 '16 at 10:39
  • Yes, information criteria are also possible for lag selection. (Or, you could simply feed many lags to the Random Forest and let it figure out by itself which ones are relevant; see the importance entry that randomForest() returns. This may or may not work well.) – Stephan Kolassa Feb 04 '16 at 10:53
  • My experience with variable selection methods is that forecast performance often worsens considerably when no restrictions are imposed a priori on the autoregressive component. However, I guess it may ultimately depend on the dataset as you are indicating. – kanimbla Feb 04 '16 at 11:06
  • One problem may be that lags are correlated - after all, this is why we include them in the model (and why, say, ARIMA models with large lags tend to do badly). But Random Forests are specifically good at working with correlated predictors, because of the randomized predictor preselection they employ. So it may well be worth a try. – Stephan Kolassa Feb 04 '16 at 11:13
  • I found increased performance beyond a given optimal lag number by regularizing or aggregating lags distant in the past more strongly than lags near the present. So what exactly happened 32 days ago is not so useful, but a summary of what generally happened in the previous month could be useful knowledge. RF would figure this out by itself if the system is not noisy. If explained variance is lower than ~10-40%, it starts to get difficult for RF to estimate how to specifically utilize each lag. – Soren Havelund Welling Feb 04 '16 at 14:07
  • So take, e.g., the average of some lags and add it to the regression model as a feature (see the sketch after these comments). – Soren Havelund Welling Feb 04 '16 at 14:12
  • @SorenHavelundWelling: Interesting approach to aggregate more distant lags, which I have never seen before. Note: I am working with monthly seasonally adjusted data, so I wonder if your approach would still be useful with lower-frequency data. – kanimbla Feb 04 '16 at 14:28
  • I have used this for my hobby stock bot, simulating (only) daily trading. Often the yearly back-tested explained variance across a population of 100 stocks is on average 5%. Too much aggregating is, on the other hand, non-testable/trainable. If you have data from 5 years and you compute a 6-month average, you have only 5 * 2 = 10 independent features. Of course every day would have a slightly different 6-month summary, but these are not independent. This violates the i.i.d. assumptions of ML, and your internal CV will be over-optimistic. – Soren Havelund Welling Feb 04 '16 at 14:39
  • So don't aggregate so much that you end up with fewer than, say, 50 or 100 independent periods. – Soren Havelund Welling Feb 04 '16 at 14:40

Chapter 10 of the Box-Jenkins textbook (Time Series Analysis: Forecasting and Control, 4th Edition) proposes Transfer Function Modeling and discusses using the Cross-Correlation Function to identify which variables are important. The key is to also consider outliers, as they will destroy the relationships; if you ignore the outliers, you might fail.
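Base R can at least get you the identification step. A minimal sketch of a prewhitened cross-correlation check, assuming a single candidate predictor `x` aligned with `y` (this simplifies the full transfer-function workflow to AR-based prewhitening):

```r
# Prewhiten the input with an AR fit, apply the same filter to y, then read the CCF.
ar_x <- ar(x)                                                 # AR order chosen by AIC
x_pw <- na.omit(ar_x$resid)                                   # prewhitened input
y_pw <- na.omit(stats::filter(y, c(1, -ar_x$ar), sides = 1))  # same filter on y
ccf(as.numeric(x_pw), as.numeric(y_pw), main = "Prewhitened CCF")
```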

There is a new package (not free) that uses this methodology: http://www.autobox.com/cms/index.php/news/219-autobox-for-r

Tom Reilly
  • Thanks for bringing up another useful approach; I would, however, prefer an implementation with open-source packages. – kanimbla Feb 04 '16 at 20:05
  • Can someone run this problem with one simple causal variable and see if ML (or whatever else) can handle it? Post the model. No need to forecast. This will be a good case to prove whether ML works or not. I will post the true model once I have seen a model posted. https://www.dropbox.com/s/coap5k5fcugthfc/challenge.xlsx?dl=0 – Tom Reilly Feb 06 '16 at 02:59