
Cross-validation with metrics such as F1 can be implemented in two ways:

  1. For each cross-validation split, calculate F1_split on the validation dataset.

    F1_result = average_by_splits(F1_split)

  2. For each cross-validation split, calculate confusion_matrix_split on the validation dataset.

    confusion_matrix_result = sum_over_splits(confusion_matrix_split)

    Calculate F1 from confusion_matrix_result.
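
For concreteness, here is a minimal sketch of both computations, assuming scikit-learn; the dataset and classifier are just placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score, confusion_matrix

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

per_split_f1 = []                        # method 1: F1 per split
total_cm = np.zeros((2, 2), dtype=int)   # method 2: summed confusion matrix

for train_idx, val_idx in cv.split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[val_idx])
    per_split_f1.append(f1_score(y[val_idx], y_pred))
    total_cm += confusion_matrix(y[val_idx], y_pred, labels=[0, 1])

# Method 1: average the per-split F1 scores
f1_method1 = np.mean(per_split_f1)

# Method 2: compute F1 from the pooled confusion matrix
tn, fp, fn, tp = total_cm.ravel()
f1_method2 = 2 * tp / (2 * tp + fp + fn)

print(f"method 1 (averaged F1):      {f1_method1:.4f}")
print(f"method 2 (pooled confusion): {f1_method2:.4f}")
```

The two numbers generally differ, because F1 is a nonlinear function of the confusion-matrix counts, and the gap tends to widen as the validation folds get smaller (i.e. as k grows).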

The second method is the only one possible when using leave-one-out cross-validation.
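
To illustrate: with leave-one-out each validation set contains a single sample, so a per-split F1 is degenerate, and the only workable approach is to pool the out-of-fold predictions and score once. A short sketch, again assuming scikit-learn with placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Each LOO validation set holds one sample, so per-split F1 is meaningless;
# collect the out-of-fold predictions over all n splits and score them once.
y_pred_loo = cross_val_predict(clf, X, y, cv=LeaveOneOut())
print("LOO F1 from pooled predictions:", f1_score(y, y_pred_loo))
```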

Which method is preferable when using k-fold cross-validation? Does it depend on k?

Links to theoretical research papers are welcome.

Updated:

I will reformulate this question:

If we compute some score based on the confusion matrix, is it preferable to

  1. Calculate this score for each split separately and average the scores over splits
  2. Calculate a summary confusion matrix and compute the resulting score from it
  3. Not use the confusion matrix at all (and what to use in that case?)
  • Preferable is not to use F1 at all, since it suffers from precisely the same problems as accuracy. Rather, use probabilistic predictions, and assess these using proper scoring rules. These can simply be evaluated in each CV validation set and averaged. – Stephan Kolassa Nov 17 '22 at 15:12
  • Updated question. Please, clarify, what metric I need to use? – Arseniy Maryin Nov 17 '22 at 15:42
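
A sketch of what the first comment suggests: predict class probabilities and assess them with a proper scoring rule, evaluated in each CV validation set and averaged. The Brier score below is only one common choice of proper scoring rule (log loss is another); dataset and classifier are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, val_idx in cv.split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    # Probabilistic predictions, assessed with a proper scoring rule
    p_val = clf.predict_proba(X[val_idx])[:, 1]
    fold_scores.append(brier_score_loss(y[val_idx], p_val))

# Lower is better; average the fold scores as the comment suggests
print("mean Brier score over folds:", np.mean(fold_scores))
```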
