So I just learned about AUROC. Reading this thread, it seems AUROC is not a great metric for imbalanced datasets; one answer even says it shouldn't be used to compare models.
However, I am confused because research papers use AUROC to evaluate models for MIMIC-III code prediction, which is a highly imbalanced dataset. The papers don't clearly explain why they chose this metric.
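To make the imbalance concern concrete, here is a toy sketch I put together (assuming numpy and scikit-learn; the 1% prevalence and the Gaussian score distributions are made-up for illustration, not actual MIMIC-III data). It shows how AUROC can look high on the same scores for which the precision-based AUPRC stays low:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# 1% positives, 99% negatives -- roughly like a rare ICD code.
n_pos, n_neg = 100, 9_900
y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])

# Scores: positives tend to rank higher, but overlap with negatives.
pos_scores = rng.normal(loc=2.0, scale=1.0, size=n_pos)
neg_scores = rng.normal(loc=0.0, scale=1.0, size=n_neg)
y_score = np.concatenate([pos_scores, neg_scores])

print(f"AUROC: {roc_auc_score(y_true, y_score):.3f}")            # looks high
print(f"AUPRC: {average_precision_score(y_true, y_score):.3f}")  # much lower
```

If I understand correctly, this happens because AUROC is computed over the huge pool of true negatives, so the false positives barely dent it, while AUPRC (like F1) is sensitive to precision on the rare positive class.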
My questions are:
- Why do you think the authors picked AUROC? Is there a benefit to using it that I am unaware of?
- Should the SOTA model be MSMN (highest AUROC) or RAC (highest F1)? This is confusing because the site is sorted by AUROC by default.