Why are estimates of prediction error typically biased upward with cross-validation? Is it like with decision trees, where using a stopping criterion increases the bias a little but considerably decreases the variance (i.e., overfitting on the test set)? Is this why LOOCV minimizes bias but has high variance (since using LOOCV is like not using a stopping criterion with decision trees)?
1 Answer
The training samples in cross-validation are smaller than the original sample, so the model's parameters are estimated less precisely. As a result, the validation errors are generally larger than they would be if they were obtained from a model trained on the full original sample. This effect is negligible for LOOCV, because the training samples are almost as large as the original sample, but it can be noticeable for K-fold CV, especially for smaller values of K.
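One rough way to see this upward bias is a small simulation: estimate the test error by K-fold CV for a few values of K (and by LOOCV) and compare the estimates with the error of the model fit on the full sample, evaluated on a large independent test set. The sketch below is only an illustration; the data-generating process, sample size, and linear model are arbitrary choices, not part of the answer above.

```python
# Illustrative sketch: K-fold CV estimates vs. the "true" error of the model
# fit on the full sample (all settings below are arbitrary assumptions).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut

rng = np.random.default_rng(0)
n, p, noise = 60, 10, 1.0
n_reps = 200

results = {2: [], 5: [], 10: [], "LOO": []}
true_errors = []

for _ in range(n_reps):
    # Draw a fresh sample from a simple linear data-generating process.
    beta = rng.normal(size=p)
    X = rng.normal(size=(n, p))
    y = X @ beta + noise * rng.normal(size=n)
    model = LinearRegression()

    # "True" error of the model fit on the full sample, approximated on a
    # large independent test set from the same process.
    model.fit(X, y)
    X_test = rng.normal(size=(20000, p))
    y_test = X_test @ beta + noise * rng.normal(size=20000)
    true_errors.append(np.mean((model.predict(X_test) - y_test) ** 2))

    # CV estimates for a few values of K, plus LOOCV.
    for k in (2, 5, 10):
        cv = KFold(n_splits=k, shuffle=True, random_state=0)
        scores = cross_val_score(model, X, y, cv=cv,
                                 scoring="neg_mean_squared_error")
        results[k].append(-scores.mean())
    loo = cross_val_score(model, X, y, cv=LeaveOneOut(),
                          scoring="neg_mean_squared_error")
    results["LOO"].append(-loo.mean())

print(f"true error of model fit on full sample: {np.mean(true_errors):.3f}")
for k, vals in results.items():
    label = "LOOCV" if k == "LOO" else f"{k}-fold CV"
    print(f"{label:>9} estimate: {np.mean(vals):.3f}")
```

Under these assumptions, the 2-fold estimate (training on half the data) overshoots the true error the most, while the 10-fold and LOOCV estimates sit much closer to it, which is the pattern described above.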
Regarding the variance of LOOCV vs. K-fold CV, it is not clear that the former is generally larger than the latter; see "Bias and variance in leave-one-out vs K-fold cross validation".
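If one wants to look at the variance question empirically rather than in the abstract, a rough sketch (again with an arbitrary linear data-generating process chosen only for illustration) is to compute both estimates on many repeated samples and compare their spread:

```python
# Illustrative sketch: spread of LOOCV vs. 10-fold CV estimates across
# repeated samples (all settings are arbitrary assumptions).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut

rng = np.random.default_rng(1)
n, p = 60, 10
beta = rng.normal(size=p)

loo_est, kfold_est = [], []
for _ in range(300):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    model = LinearRegression()
    loo_est.append(-cross_val_score(model, X, y, cv=LeaveOneOut(),
                                    scoring="neg_mean_squared_error").mean())
    kfold_est.append(-cross_val_score(
        model, X, y,
        cv=KFold(n_splits=10, shuffle=True, random_state=0),
        scoring="neg_mean_squared_error").mean())

# Standard deviation of the CV estimates over repeated draws of the data.
print(f"sd of LOOCV estimates:   {np.std(loo_est):.3f}")
print(f"sd of 10-fold estimates: {np.std(kfold_est):.3f}")
```

Whether LOOCV comes out more or less variable depends on the setting, which is exactly the point of the linked thread.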
Richard Hardy
