When do we need cross-validation? Is it for a lack of training data, or for choosing between different models? What is the background of cross-validation, and what is its goal?
2 Answers
Cross-validation is a compromise for when you do not have a lot of data, or cannot afford to split your data further. The idea is to simulate having many independent data sets that you can use for the standard steps in statistics / machine learning: training, validation, and testing. It would be ideal to have separate real data for each of these steps, but often you do not. In short, if we had enough data for all these steps, we would have no need for CV. If we are tight on data, however, CV usually works well enough and provides good approximations (with the right implementation).
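A minimal sketch of what this looks like in practice, assuming scikit-learn; the dataset, model, and number of folds here are placeholders for illustration, not part of the original answer:

```python
# 5-fold cross-validation: each fold takes a turn as the held-out
# validation set, so every observation is used for both training
# and validation despite the limited data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```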
This also helps guard against overfitting: comparing performance on a single training set versus the 5 folds shows whether your model is robust, i.e. whether you get similar evaluation metrics and accuracy across all folds in the 5-fold case.
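A short sketch of that comparison, assuming scikit-learn; the dataset and model are placeholder choices, and the point is only the side-by-side comparison of the single-split score with the per-fold scores:

```python
# Compare one hold-out split against 5-fold CV scores.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# Single train/test split: one estimate of generalization accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
single_score = model.fit(X_tr, y_tr).score(X_te, y_te)

# 5-fold CV: five estimates; a small spread around the single-split
# score suggests the model is not just fitting one lucky split.
cv_scores = cross_val_score(model, X, y, cv=5)
print("Single split accuracy:", single_score)
print("5-fold accuracies:", cv_scores, "mean:", cv_scores.mean())
```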