
I've trained a simple NN to perform binary classification with the goal of maximizing the area under the ROC curve. Right now the AUC is around 0.85. Out of curiosity, I checked which thresholds are best in terms of maximizing the F1 score. It turned out that the optimal thresholds are around 0.08 (below 0.1), corresponding to FPR = 0.22, TPR = 0.72 and an F1 score of 0.75. Note that the training and evaluation datasets are imbalanced, with ~90% negative and ~10% positive samples.

I am wondering what it means about the data or the model if the F1-optimal threshold is so low, and how I can use that knowledge to improve my model. My initial guess was that the low threshold is a result of the imbalanced classes: because only 10% of the samples are positive, it makes sense to be very sensitive and classify something as positive even with low certainty. But then I realized it should be the opposite.
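For concreteness, here is a minimal sketch of the kind of threshold search I mean. The data and the names (`y_true`, `y_score`) are synthetic placeholders rather than my actual pipeline; the sweep uses scikit-learn's `precision_recall_curve` to evaluate F1 at every candidate threshold.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve

# Synthetic placeholder data: y_true holds 0/1 labels (~10% positives),
# y_score holds the model's predicted probabilities for the positive class.
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.10).astype(int)
y_score = np.clip(0.1 * y_true + rng.normal(0.1, 0.1, 10_000), 0.0, 1.0)

print("ROC AUC:", roc_auc_score(y_true, y_score))

# Sweep candidate thresholds and pick the one that maximizes F1.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
best = np.argmax(f1[:-1])  # last precision/recall pair has no threshold
print("F1-optimal threshold:", thresholds[best], "F1:", f1[best])

# FPR/TPR at that threshold, for comparison with the ROC curve.
y_pred = (y_score >= thresholds[best]).astype(int)
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
print("TPR:", tp / (tp + fn), "FPR:", fp / (fp + tn))
```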

  • I think this is exactly what I discuss in my question here. – Dave Mar 20 '23 at 15:52
  • Why is maximising the F1 score appropriate for your application? – Dikran Marsupial Mar 20 '23 at 15:55
  • @DikranMarsupial It isn't; I just checked out of curiosity. I also tried minimizing the distance to the (0, 1) corner and got a similar threshold. In general, the task is from an old Kaggle competition, and the only metric they used to evaluate models was ROC AUC. – Brzoskwinia Mar 20 '23 at 16:01
  • What kind of FPR, TPR, and $F_1$ score do you get when you predict every observation to be the majority category? – Dave Mar 20 '23 at 16:03
  • @Dave The majority category in this case is the 0 label. Then there are no positive predictions, and FPR and TPR are both 0. And if I classify everything as 1, TPR and FPR are both 1, which seems wrong; I'll look for bugs in the code later. – Brzoskwinia Mar 20 '23 at 16:13
  • @Brzoskwinia Thanks for clarifying. I think a lot of the misinformation about imbalanced datasets is caused by not having a well-thought-out performance metric for the application. I would experiment with coming up with some misclassification costs for the task and see how to adjust the threshold to minimise the expected loss (a sketch of this idea follows below). IMHO, if you don't know the misclassification costs, you are better off avoiding performance metrics that depend on the threshold, as the threshold is really a statement about the misclassification costs. Just leaving it as AUROC or a proper scoring rule is better. – Dikran Marsupial Mar 20 '23 at 16:34
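Following up on the cost-based suggestion above, a minimal sketch of what choosing a threshold from assumed misclassification costs could look like. The 9:1 cost ratio below is purely illustrative and not stated anywhere in the question; with calibrated probabilities it happens to give a threshold of 0.1, in the same range the question reports.

```python
import numpy as np

# Illustrative (assumed) misclassification costs -- not from the question.
COST_FP = 1.0   # cost of a false positive
COST_FN = 9.0   # cost of a false negative, e.g. missing a positive is 9x worse

# With calibrated probabilities p = P(y=1 | x), predicting positive is cheaper
# in expectation when (1 - p) * COST_FP < p * COST_FN,
# i.e. when p exceeds COST_FP / (COST_FP + COST_FN).
threshold = COST_FP / (COST_FP + COST_FN)
print("Cost-derived threshold:", threshold)   # 0.1 for these costs

def expected_cost(y_true, y_score, t, cost_fp=COST_FP, cost_fn=COST_FN):
    """Average misclassification cost at threshold t (empirical check)."""
    y_pred = (y_score >= t).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return (cost_fp * fp + cost_fn * fn) / len(y_true)
```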

0 Answers