I'm using 5-fold cv for parameter optimisation in a regression problem. I have very few samples: around 50.
Should I use leave-p-out cv instead? (with, say, p=5)
What are the (theoretical, ignoring performance) reasons to choose one over the other?
Note: in terms of test-set size, for $n = 50$:
leave-5-out CV = 10-fold CV
leave-10-out CV = 5-fold CV
(Exhaustive leave-$p$-out, however, evaluates all $\binom{n}{p}$ possible test sets, whereas $k$-fold CV evaluates only a single partition into $k$ folds.)
For the choice of $k$/$p$ please see Choice of K in K-fold cross-validation
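To see why exhaustive leave-$p$-out is rarely practical even at $n = 50$, a quick stdlib sketch comparing the number of model fits it requires to those of the corresponding $k$-fold scheme:

```python
from math import comb

n = 50  # sample size from the question

# Exhaustive leave-5-out evaluates every possible size-5 test set:
lpo_splits = comb(n, 5)   # C(50, 5) = 2,118,760 model fits

# 10-fold CV (same test-set size of 5) uses one partition:
kfold_splits = 10

print(lpo_splits, kfold_splits)
```

So matching the test-set size costs roughly 200,000 times more model fits under exhaustive leave-5-out than under 10-fold CV.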
I suggest that you read up on iterated/repeated cross validation, which fills the gap between testing each sample once and testing all possible splits exhaustively.
Testing each sample more than once allows you to measure the stability of the predictions with respect to slight changes in the training data. But it obviously does not increase the number of independent test cases.
Thus, the random error due to model instability is reduced, but the random error due to the finite (small) number of test cases is not affected.
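A minimal stdlib sketch of what iterated/repeated k-fold means operationally (scikit-learn's `RepeatedKFold` provides the same behaviour): each repetition reshuffles and re-partitions the data, so every sample is tested once per repetition but against a different training set each time.

```python
import random

def repeated_kfold(n_samples, k, n_repeats, seed=0):
    """Yield (train, test) index lists for repeated k-fold CV.

    Each repetition shuffles the indices and partitions them into
    k folds, so every sample is tested exactly once per repetition
    and n_repeats times overall.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        # fold sizes differ by at most 1 when k does not divide n
        fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                      for i in range(k)]
        start = 0
        for size in fold_sizes:
            test = indices[start:start + size]
            train = indices[:start] + indices[start + size:]
            yield train, test
            start += size

# n = 50, 5-fold CV repeated 20 times: each sample is tested
# 20 times, each time with a different training set.
counts = [0] * 50
for train, test in repeated_kfold(50, k=5, n_repeats=20):
    for i in test:
        counts[i] += 1
```

The spread of the repeated predictions for each sample is what quantifies the model-instability component of the error mentioned above.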
Section: "Definition 2.1 LPO risk estimator"
It's not often quoted (you usually only see leave-one-out compared to k-fold), but leave-p-out is technically "better", in the sense that k-fold can be viewed as an approximation of exhaustive leave-p-out. At least this is my understanding.
– Fabio Oct 14 '15 at 13:38