
I implemented AdaBoost as a binary classifier, but I train one model per label in a one-vs-all arrangement; at prediction time I choose the class corresponding to the model with the highest decision value. Using 5-fold cross-validation with two distinct random states, this is the accuracy graph:

[Figure: training and validation accuracies]
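The setup described above can be sketched with scikit-learn's `OneVsRestClassifier` wrapping a binary `AdaBoostClassifier` (the dataset here is synthetic and the hyperparameters are illustrative, not the original poster's):

```python
# One-vs-rest AdaBoost sketch: one binary AdaBoost model per class; at
# prediction time the wrapper picks the class whose model returns the
# highest decision score, as described in the question.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class problem standing in for the real data.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=0)

clf = OneVsRestClassifier(AdaBoostClassifier(n_estimators=50, random_state=0))

# 5-fold cross-validation, as in the question.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Plotting the training and validation accuracy at each boosting iteration (e.g. via `staged_predict` on the underlying estimators) yields a curve like the one in the figure.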

My question is: from which point can we speak of overfitting? Should the criterion be the divergence of training and validation accuracy (i.e., around iteration 2000) or the decrease of validation accuracy (i.e., around iteration 8000)?

1 Answer


Putting aside the compulsory reading on why accuracy is not the best measure for assessing classification models (https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models; a multi-class AUC-ROC, for example, is probably a sounder option), this learner does indeed appear to start over-fitting at approximately 8000 iterations. It would also be preferable to use repeated $K$-fold cross-validation instead of a single $K$-fold run, to smooth out some of the sampling variance, but aside from that this work seems to be in the clear.
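The repeated $K$-fold suggestion above is a one-line change in scikit-learn via `RepeatedStratifiedKFold`; this is a minimal sketch on synthetic data, not the original experiment:

```python
# Repeated K-fold cross-validation: 5 folds repeated 3 times gives 15
# validation estimates instead of 5, smoothing the sampling variance
# of the accuracy curve.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=400, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(AdaBoostClassifier(n_estimators=50, random_state=0),
                         X, y, cv=cv)
print(len(scores), scores.mean())
```

Averaging over the repeats (and, as in the question, over several random states) makes the validation curve less jumpy, so the point where it starts to decline is easier to read off.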

usεr11852