I would like to ask how to interpret the results of two different models using the AUROC and F1 metrics. As you know, AUROC is the area under the ROC curve, and the F1 score is the harmonic mean of precision and recall.
Both are used as classification metrics, but I am not sure how to interpret the prediction performance of the two models below.
model 1: AUROC: 72.28, F1: 60.89
model 2: AUROC: 87.44, F1: 46.11
My question is: looking only at the results above, is it possible to compare the two models? If so, what is the best explanation for these results?
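To make the two definitions concrete, here is a minimal sketch in plain Python of how I understand the metrics to be computed. The function names `auroc` and `f1_at_threshold`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made up for illustration; they are not the data or settings behind the numbers above. The sketch treats AUROC as the fraction of positive/negative pairs that are ranked correctly, and F1 as a value computed from hard predictions at one chosen threshold.

```python
def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive gets the
    higher score; ties count as half. Equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true, scores, threshold=0.5):
    """F1 (harmonic mean of precision and recall) after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two positives, four negatives.
y_true = [1, 1, 0, 0, 0, 0]
scores_a = [0.40, 0.30, 0.20, 0.10, 0.10, 0.05]  # ranks perfectly, but all scores < 0.5
scores_b = [0.90, 0.40, 0.60, 0.10, 0.20, 0.30]  # one negative outranks a positive

for name, scores in [("A", scores_a), ("B", scores_b)]:
    print(name, "AUROC:", auroc(y_true, scores), "F1@0.5:", f1_at_threshold(y_true, scores))
```

On this toy data, the scores in `scores_a` give the higher AUROC but the lower F1 at a 0.5 threshold, and lowering the threshold to 0.25 changes the F1 without changing the AUROC. I do not know whether the same thing is happening with my two models, since I have not checked which threshold was used for the reported F1.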