I've trained a simple NN to perform binary classification, with the goal of maximizing the area under the ROC curve. Right now the AUC is around 0.85. Out of curiosity, I checked which thresholds are best in terms of maximizing f1_score. It turned out that the optimal threshold is around 0.08 (below 0.1), corresponding to fpr = 0.22, tpr = 0.72, and f1_score = 0.75. Note that the training and evaluation datasets are imbalanced, with ~90% negative and ~10% positive samples.
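For reference, this is roughly how I'm sweeping thresholds (a minimal sketch with synthetic placeholder data; in my actual code `y_true` and `y_prob` come from the trained network):

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Placeholder data standing in for my real labels/predictions:
# ~10% positives, scores loosely correlated with the label.
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.10).astype(int)
y_prob = np.clip(rng.normal(0.10 + 0.25 * y_true, 0.15), 0.0, 1.0)

print("AUC:", round(roc_auc_score(y_true, y_prob), 3))

# Sweep candidate thresholds and pick the F1-maximizing one
thresholds = np.linspace(0.01, 0.99, 99)
f1s = [f1_score(y_true, y_prob >= t) for t in thresholds]
t_best = thresholds[int(np.argmax(f1s))]

# FPR / TPR at the chosen threshold
y_pred = y_prob >= t_best
tpr = (y_pred & (y_true == 1)).sum() / (y_true == 1).sum()
fpr = (y_pred & (y_true == 0)).sum() / (y_true == 0).sum()
print(f"t*={t_best:.2f}  F1={max(f1s):.3f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```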
I am wondering what it means about the data or the model if the F1-optimal threshold is so low, and how I can use that knowledge to improve my model. My initial guess was that the low threshold is a result of the imbalanced classes: because only 10% of the samples are positive, it makes sense to be very sensitive and classify something as positive even with low certainty. But then I realized it should be the opposite.
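To test that intuition, here is a small self-contained simulation (synthetic data, not my real model or features) suggesting that a roughly calibrated model on 90/10 data ends up with an F1-optimal threshold well below 0.5, simply because few samples get a posterior probability above 0.5 when positives are rare:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Synthetic imbalanced problem: ~10% positives, one informative feature.
rng = np.random.default_rng(1)
n = 20_000
y = (rng.random(n) < 0.10).astype(int)
x = rng.normal(y * 1.5, 1.0).reshape(-1, 1)

# Logistic regression gives roughly calibrated probabilities here.
clf = LogisticRegression().fit(x, y)
p = clf.predict_proba(x)[:, 1]

for t in (0.05, 0.10, 0.25, 0.50):
    print(f"t={t:.2f}  F1={f1_score(y, p >= t):.3f}")
# The best F1 lands well below 0.5: at t=0.5 recall collapses,
# since rare positives seldom reach a posterior above 0.5.
```

So in this toy setup the low threshold reflects the low base rate of positives rather than a defect in the model, though I'm not sure the same explanation applies to my data.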
If I classify everything as 0, then there are no positive predictions, and FPR and TPR are both 0. And if I classify everything as 1, TPR and FPR are both 1, which seems wrong; I'll look for bugs in the code later. – Brzoskwinia Mar 20 '23 at 16:13
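(A quick sanity check of the comment above, with toy arrays: an all-negative classifier gives TPR = FPR = 0 and an all-positive one gives TPR = FPR = 1. These are the expected (0, 0) and (1, 1) endpoints of the ROC curve, so that behavior by itself is not a bug.)

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])  # imbalanced toy labels

def rates(y_true, y_pred):
    # TPR = TP / P, FPR = FP / N
    tpr = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
    fpr = np.sum((y_pred == 1) & (y_true == 0)) / np.sum(y_true == 0)
    return tpr, fpr

print(rates(y_true, np.zeros_like(y_true)))  # (0.0, 0.0): all-negative
print(rates(y_true, np.ones_like(y_true)))   # (1.0, 1.0): all-positive
```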