
I implemented AdaBoost as a binary classifier, but I train one model per label in a one-vs-all arrangement; at prediction time I choose the class corresponding to the model with the highest decision value. Using 5-fold cross-validation with two distinct random states, this is the accuracy graph:

[Figure: training and validation accuracies]
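The setup described above can be sketched with scikit-learn's `OneVsRestClassifier` wrapping a binary `AdaBoostClassifier` (the dataset here is synthetic and the hyperparameters are illustrative, not the original poster's):

```python
# One-vs-rest AdaBoost sketch: one binary AdaBoost model per class; at
# prediction time the wrapper picks the class whose model returns the
# highest decision score, as described in the question.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class problem standing in for the real data.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=0)

clf = OneVsRestClassifier(AdaBoostClassifier(n_estimators=50, random_state=0))

# 5-fold cross-validation, as in the question.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Plotting the training and validation accuracy at each boosting iteration (e.g. via `staged_predict` on the underlying estimators) yields a curve like the one in the figure.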

My question is: from which point can we speak of overfitting? Should the criterion be the divergence of training and validation accuracy (i.e., around iteration 2000) or the decrease of validation accuracy (i.e., around iteration 8000)?

1 Answer


Putting aside the compulsory reading on why accuracy is not the best measure for assessing classification models (https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models; a multi-class AUC-ROC, for example, is probably a sounder option), this learner does indeed appear to start over-fitting at approximately 8000 iterations. It would also be preferable to use repeated $K$-fold cross-validation instead of a single $K$-fold run, to smooth out some of the sampling variance, but aside from that this work seems to be in the clear.
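The repeated $K$-fold suggestion above is a one-line change in scikit-learn via `RepeatedStratifiedKFold`; this is a minimal sketch on synthetic data, not the original experiment:

```python
# Repeated K-fold cross-validation: 5 folds repeated 3 times gives 15
# validation estimates instead of 5, smoothing the sampling variance
# of the accuracy curve.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=400, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(AdaBoostClassifier(n_estimators=50, random_state=0),
                         X, y, cv=cv)
print(len(scores), scores.mean())
```

Averaging over the repeats (and, as in the question, over several random states) makes the validation curve less jumpy, so the point where it starts to decline is easier to read off.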

usεr11852