
I would like to ask for a detailed explanation of how to compare several classifiers on an imbalanced dataset using the following metrics:

  • Area under the ROC curve, AUC
  • Area under the Precision-Recall curve, AUPRC
  • Recall
  • Precision
  • F1 Score

As my data are highly imbalanced (the minority class is ~0.02%), I don't focus much on the AUC, as it may not provide the full picture. I include AUPRC, Recall, Precision, and F1. I read a comparison of AUC and AUPRC here.
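Roughly, this is how I compute the metrics for each classifier with scikit-learn (a minimal sketch; `y_true`, `y_score`, and the 0.5 threshold are placeholders for the true labels, the predicted probabilities of the positive class, and whatever decision threshold each classifier actually uses):

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

def evaluate(y_true, y_score, threshold=0.5):
    """Compute the five metrics for one classifier.

    y_true  : true 0/1 labels
    y_score : predicted probabilities of the positive class
    """
    # Hard 0/1 predictions from the (assumed) decision threshold
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "AUC": roc_auc_score(y_true, y_score),              # threshold-free
        "AUPRC": average_precision_score(y_true, y_score),  # Average Precision
        "Recall": recall_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
    }
```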

For example, in the table below (bold values are the best ones):

\begin{array}{lccccc|c}
 & AUC & AUPRC & Recall & Precision & F1 & Rank\\
\hline
C1 & 0.933 & 0.745 & 0.581 & \textbf{0.908} & 0.680 & 3.8\\
C2 & \textbf{0.949} & \textbf{0.750} & 0.882 & 0.067 & 0.123 & 3.2\\
C3 & 0.944 & 0.750 & 0.876 & 0.062 & 0.113 & 4.4\\
C4 & 0.941 & 0.730 & \textbf{0.901} & 0.018 & 0.035 & 5.2\\
C5 & 0.940 & 0.637 & 0.502 & 0.777 & 0.501 & 5.6\\
C6 & 0.901 & 0.631 & 0.564 & 0.643 & 0.444 & 6.4\\
C7 & 0.942 & 0.723 & 0.803 & 0.500 & 0.583 & 4.2\\
C8 & 0.948 & 0.717 & 0.642 & 0.852 & \textbf{0.710} & 3.2\\
\hline
\end{array}

What I don't really get is this: since I focus on the detection rate of positive samples, I find that C5, C6, C7, and C8 are better balanced between Recall and Precision and hence achieve better F1 scores. However, their AUPRC values are lower than those of C1, C2, C3, and C4.

I compute the AUPRC via Average Precision using scikit-learn. The final Rank column is the average rank across all metrics.
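A minimal sketch of how the Rank column is obtained, e.g. with pandas; the only assumption is that the AUPRC tie between C2 and C3 is broken by listing order (pandas' `method="first"`), which reproduces the values above:

```python
import pandas as pd

# The metric values from the table above, one row per classifier.
metrics = pd.DataFrame(
    {
        "AUC":       [0.933, 0.949, 0.944, 0.941, 0.940, 0.901, 0.942, 0.948],
        "AUPRC":     [0.745, 0.750, 0.750, 0.730, 0.637, 0.631, 0.723, 0.717],
        "Recall":    [0.581, 0.882, 0.876, 0.901, 0.502, 0.564, 0.803, 0.642],
        "Precision": [0.908, 0.067, 0.062, 0.018, 0.777, 0.643, 0.500, 0.852],
        "F1":        [0.680, 0.123, 0.113, 0.035, 0.501, 0.444, 0.583, 0.710],
    },
    index=[f"C{i}" for i in range(1, 9)],
)

# Rank each metric column (1 = best, i.e. highest value; ties broken by order),
# then average the per-metric ranks for each classifier.
ranks = metrics.rank(ascending=False, method="first")
metrics["Rank"] = ranks.mean(axis=1)
print(metrics["Rank"])
```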
