So variations of this question have been asked a few times, but I think my case is somewhat different from the previous questions, as those involve larger sample sizes and/or cross-sectional data.

I have a time series model of the form $$x(t+1) = f_i(x(t)|\theta_i) + \xi_t$$

for several different models $f_i(\cdot|\theta_i)$, and I estimate $\theta_i$ by maximum likelihood using a conditional decomposition of the likelihood. Besides using AIC for model selection, I also want to do some cross-validation. I have read elsewhere that, due to the temporal structure of time series, standard cross-validation techniques such as $K$-fold cross-validation are inappropriate.
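To make the estimation step concrete, the conditional decomposition factorises the likelihood into one-step transition densities; assuming, purely for illustration, i.i.d. Gaussian innovations $\xi_t \sim N(0,\sigma^2)$ (any other noise distribution changes the transition density accordingly), this gives $$L(\theta_i) = \prod_{t=1}^{T-1} p\bigl(x(t+1)\mid x(t);\theta_i\bigr), \qquad \log L(\theta_i) = -\frac{T-1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1}\bigl(x(t+1)-f_i(x(t)\mid\theta_i)\bigr)^2.$$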

For this reason I want to use the cross-validation technique described here, which starts with an initial training set, makes a prediction for the next observation, then expands the training set by one observation, and so on (see the picture below).

[Figure: expanding-window cross-validation scheme, with the training set growing by one observation at each step]
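For reference, here is a minimal Python sketch of this expanding-window scheme; the model function `f`, the `fit_mle` helper, the Gaussian likelihood, and the squared-error loss are illustrative assumptions, not prescribed by anything above:

```python
import numpy as np
from scipy.optimize import minimize

def fit_mle(x_train, f, theta0):
    """Fit theta by conditional one-step-ahead ML, assuming Gaussian innovations."""
    def neg_loglik(theta):
        resid = x_train[1:] - f(x_train[:-1], theta)
        sigma2 = np.mean(resid ** 2)  # profile out the innovation variance
        n = len(resid)
        return 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return minimize(neg_loglik, theta0, method="Nelder-Mead").x

def expanding_window_cv(x, f, theta0, n_init):
    """Rolling-origin evaluation: fit on x[:t], forecast x[t], grow the window by one."""
    sq_errors = []
    for t in range(n_init, len(x)):
        theta_hat = fit_mle(x[:t], f, theta0)
        pred = f(x[t - 1:t], theta_hat)[0]  # one-step-ahead point forecast
        sq_errors.append((x[t] - pred) ** 2)
    return np.mean(sq_errors)  # mean squared forecast error over the test points

# Usage with a toy one-parameter model f(x | theta) = theta * x:
# x = ...  # the observed series, length 30-40
# mse = expanding_window_cv(x, lambda x, th: th[0] * x,
#                           theta0=np.array([0.5]), n_init=20)
```

Here `n_init` is the initial training-set size, which is exactly what the question below is about.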

The problem is that my dataset is really small, around 30-40 observations, and I have to estimate 5 parameters of interest. My MLE estimates are already not very robust at this sample size (judging by the bootstrap standard errors I obtained), so splitting an already-small sample will produce very unstable estimates.

So my question is: what training/testing split should I initially start off with, before expanding it at each step? I was thinking of a 2/3 training and 1/3 testing split, but that leaves only around 10 observations to be predicted in the cross-validation. Is this too small?

Any suggestions and/or references to check out are welcome.

  • Thanks @RichardHardy, I actually read your earlier question before and found it insightful. But I am a bit confused about why you would opt to choose between AIC and cross-validation rather than reporting both for completeness. I take your point that smaller training/testing samples tend to select more parsimonious models, and appreciate you highlighting this, as I can use it to justify a larger training-set split. –  Feb 09 '21 at 06:45
  • Thanks for your insight. To answer briefly: I thought (and still think) that if cross-validation delivers inferior results, reporting them could be detrimental. Also, I was interested in making a decision and in what to base it on. Had it been a case of descriptive analysis instead, I might have considered reporting both. – Richard Hardy Feb 09 '21 at 07:13

0 Answers