Why should we do cross-validation instead of using a separate validation set?
Aurélien Géron talks about this in his book:

To avoid "wasting" too much training data in validation sets, a common
technique is to use cross-validation.
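Géron's point can be sketched in a few lines of pure Python. The helper `kfold_indices` below is my own illustration (not from his book): it splits the indices of a dataset into k folds so that every example is held out for validation exactly once, and trained on in the other k − 1 iterations, so no data is permanently "wasted" on a fixed validation set.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k folds; each fold serves once as the
    validation set while the remaining folds form the training set."""
    # Distribute the n indices as evenly as possible across k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, val))
    return splits

# Every index appears in exactly one validation fold:
splits = kfold_indices(10, 5)
all_val = sorted(i for _, val in splits for i in val)
assert all_val == list(range(10))
```

Each of the k (train, val) pairs plays the role that a single fixed train/validation split would otherwise play, but every example gets a turn in validation.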
Instead of other k values, why might we prefer k = 10 in cross-validation?
Before answering, I would like to thank Jason Brownlee, PhD for his great tutorial on k-fold cross-validation; I am citing one of the books he cites.
Kuhn & Johnson discuss the choice of k in their book:
The choice of k is usually 5 or 10, but there is no formal rule. As k
gets larger, the difference in size between the training set and the
resampling subsets gets smaller. As this difference decreases, the
bias of the technique becomes smaller (i.e., the bias is smaller for
k = 10 than k = 5). In this context, the bias is the difference between
the estimated and true values of performance.
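The size argument in that quote is simple arithmetic, which the illustrative numbers below make concrete (the dataset size n is an assumption, not from the book): with n examples, each CV model trains on n·(k − 1)/k of them, so larger k means training sets closer in size to the full set, and hence a less pessimistic (less biased) performance estimate.

```python
n = 1000  # illustrative dataset size
for k in (2, 5, 10, n):  # k = n is leave-one-out CV
    train_size = n * (k - 1) // k
    print(f"k = {k:>4}: each fold trains on {train_size} of {n} examples")
# k=2 trains on 500, k=5 on 800, k=10 on 900, LOOCV on 999 —
# larger k shrinks the gap between CV training sets and the full set.
```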
One may then ask: why not use leave-one-out cross-validation (LOOCV), where k is maximal and the bias therefore smallest? In the same book, the authors also explain why we may prefer 10-fold CV over LOOCV.
From a practical viewpoint, larger values of k are more
computationally burdensome. In the extreme, LOOCV is most
computationally taxing because it requires as many model fits as data
points and each model fit uses a subset that is nearly the same size
as the training set. Molinaro (2005) found that leave-one-out and
k = 10-fold cross-validation yielded similar results, indicating that
k = 10 is more attractive from the perspective of computational
efficiency. Also, small values of k, say 2 or 3, have high bias
but are very computationally efficient.
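The computational point is just a count of model fits. The small helper `cv_cost` and the dataset size below are illustrative assumptions of mine, not from the book: k-fold CV fits k models, while LOOCV fits one model per data point.

```python
def cv_cost(n, k):
    """Return (number of model fits, training-set size per fit)
    for k-fold cross-validation on n examples."""
    return k, n - n // k

n = 50_000  # illustrative dataset size
for k in (3, 10, n):  # k = n is LOOCV
    fits, train_size = cv_cost(n, k)
    print(f"k = {k:>5}: {fits:>5} fits, each on {train_size} examples")
# LOOCV here needs 50,000 fits versus 10 for 10-fold CV,
# and each LOOCV fit still uses nearly the whole training set.
```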
I have read a lot of research papers about sentiment classification and related topics. Most of them use 10-fold cross-validation to train and test classifiers, which means no separate testing/validation is done. Why is that?
If we do not use cross-validation (CV) to select among multiple models (or to tune hyper-parameters), we do not need a separate test. The reason is that the purpose of a separate test is already served within CV: in each iteration, one of the k folds acts as the held-out test set. Several Stack Exchange threads discuss this at length; you may check them.
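That workflow can be sketched as follows. The toy labels and the trivial majority-class "model" are assumptions for illustration only: each fold's held-out accuracy plays the role of a test score, and the mean over the 10 folds is the performance figure such papers report, with no separate test set.

```python
from collections import Counter
import random

random.seed(0)
# Toy labeled data: 100 examples with binary labels (illustrative).
labels = [random.randint(0, 1) for _ in range(100)]

k = 10
scores = []
for i in range(k):
    val_idx = list(range(i * 10, (i + 1) * 10))  # held-out fold
    train_idx = [j for j in range(100) if j not in val_idx]
    # "Train": a trivial majority-class classifier on the 9 training folds.
    majority = Counter(labels[j] for j in train_idx).most_common(1)[0][0]
    # "Test": accuracy on the held-out fold — this is the test set's role.
    acc = sum(labels[j] == majority for j in val_idx) / len(val_idx)
    scores.append(acc)

print(f"10-fold CV accuracy: {sum(scores) / len(scores):.2f}")
```

Because every fold is scored on data the model never saw during that iteration's fit, the averaged score is an out-of-sample estimate, which is exactly what a separate test set would provide.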
Finally, feel free to ask me if anything I have written is unclear to you.