Cross-validation with metrics such as F1 can be implemented in two ways:
Method 1:
- For each cross-validation split, calculate F1_split on the validation dataset.
- F1_result = average_over_splits(F1_split)

Method 2:
- For each cross-validation split, calculate confusion_matrix_split on the validation dataset.
- confusion_matrix_result = sum_over_splits(confusion_matrix_split)
- Calculate F1 from confusion_matrix_result.
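For concreteness, here is a minimal sketch of both methods in Python. The classifier, the synthetic dataset, and the 5-fold setup are purely illustrative assumptions, not part of the question itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import StratifiedKFold

# Illustrative data and model (assumptions for the sketch only)
X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

per_split_f1 = []
pooled_cm = np.zeros((2, 2), dtype=int)

for train_idx, val_idx in cv.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[val_idx])

    # Method 1: F1 on each validation split, averaged afterwards
    per_split_f1.append(f1_score(y[val_idx], y_pred))

    # Method 2: accumulate the confusion matrix over splits
    pooled_cm += confusion_matrix(y[val_idx], y_pred, labels=[0, 1])

f1_method1 = np.mean(per_split_f1)

tn, fp, fn, tp = pooled_cm.ravel()
f1_method2 = 2 * tp / (2 * tp + fp + fn)  # F1 from the pooled counts

print(f"Method 1 (averaged per-split F1): {f1_method1:.4f}")
print(f"Method 2 (F1 from pooled matrix): {f1_method2:.4f}")
```

The two numbers generally differ, which is exactly what the question is about.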
The second method is the only one possible with Leave-One-Out cross-validation: each validation fold contains a single sample, so F1 cannot be meaningfully computed per split.
Which method is preferable for k-fold cross-validation? Does the answer depend on k?
Links to theoretical research papers are welcome.
Update:
Let me reformulate the question:
If we compute a score based on the confusion matrix, is it preferable to:
- Calculate the score for each split separately and average it over the splits,
- Calculate a summary (pooled) confusion matrix over the splits and compute the score from it, or
- Not use the confusion matrix at all (and if so, what should be used instead)?