
Regarding the question of whether K-fold cross-validation is applicable to time series data, I've read some answers on Stack Exchange as well as blog posts from other sources, and they state that we cannot apply K-fold cross-validation to time series modeling (e.g. Don't Use K-fold Validation for Time Series Forecasting, Using k-fold cross-validation for time-series model selection).

But in the Tabular Playground Series - Jul 2021 competition on Kaggle, I found that some senior participants do apply this approach (e.g. stacked model, TPS-Jul-XGBoost Regressor optimized with Hyperopt, etc.), and some even set KFold(n_splits=self.n_folds, shuffle=True).

So I'm a little confused: is their approach justified? Thanks.

References:

Time series cross validation

ah bon
  • 143

1 Answer


These posts refer specifically to time series forecasting - building predictive models that exploit the trends and cycles in the historical data. Applying K-fold cross-validation there risks (a) breaking the temporal relationships by introducing gaps in the time series and (b) building models where future information leaks into the predictions.
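As a rough illustration (my own toy sketch, not taken from the posts you cite), sklearn's KFold and TimeSeriesSplit show the difference on a series of ten time steps: shuffled K-fold mixes later steps into the training fold for earlier test steps, while TimeSeriesSplit keeps every training fold strictly before its test fold.

```python
# Minimal sketch: contrasting shuffled K-fold with an ordered,
# forecasting-style split on a toy time series of 10 steps.
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # hypothetical consecutive time steps

# Shuffled K-fold: training folds freely mix later time steps with earlier
# ones, so a forecasting model would effectively "see the future".
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    print("KFold           train:", train_idx, "test:", test_idx)

# TimeSeriesSplit: each training fold ends strictly before its test fold,
# preserving the temporal order a forecasting model relies on.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    print("TimeSeriesSplit train:", train_idx, "test:", test_idx)
```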

Neither of the two Kaggle notebooks you reference is doing time series forecasting, even though the data could be considered a (multivariate) time series. They treat each time step as an independent instance and ignore any time dependency between the instances - in other words, as regular tabular data - so there's no problem using K-fold cross-validation in this case.
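For that tabular framing, the usual shuffled K-fold recipe is fine. A minimal sketch (with a made-up regression dataset standing in for the competition data, and a plain Ridge model rather than the stacked/XGBoost pipelines in those notebooks):

```python
# Minimal sketch: when each row is treated as an independent tabular
# instance, an ordinary shuffled K-fold CV estimate is valid.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical stand-in for the competition's features/target.
X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(), X, y, cv=cv, scoring="neg_root_mean_squared_error")
print("RMSE per fold:", -scores)
```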

Lynn
  • 1,707
  • Thank you for answering my question. From your point of view: If I want to build a model to predict the Nasdaq index or stock prices, could I use k-fold cross-validation? – ah bon Nov 01 '22 at 01:12
  • @ahbon - I know very little about forecasting models, so I can't really help you with that question. – Lynn Nov 01 '22 at 11:47