
I'm working with data sets that have unbalanced classes, and I'm wondering what the best practice is for finding the best threshold cutoff value using a ROC curve.

I want the best cutoff to maximize my F1 score.

I already have a ROC curve; I'm wondering what the best practice is for deriving the best threshold cutoff from it.

1 Answer


For any input (threshold) you get one output (F1 score), so you can do a grid search: try every threshold from 0 to 1 on a grid (say, seq(0,1,by=0.01)) and see which value maximizes the F1 score.
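As a minimal sketch of that grid search, the R code below assumes you have a vector of true labels and a vector of predicted scores. The simulated data and the names y, score, and f1_at are made up for illustration and are not from the question.

```r
# Hypothetical toy data: roughly 10% positives, scores in (0, 1)
set.seed(1)
n <- 1000
y <- rbinom(n, 1, 0.1)
score <- plogis(-2 + 2.5 * y + rnorm(n))

# F1 score of the rule "predict positive when score >= threshold"
f1_at <- function(threshold, y, score) {
  pred <- as.integer(score >= threshold)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  if (tp == 0) return(0)  # define F1 as 0 when there are no true positives
  2 * tp / (2 * tp + fp + fn)
}

# Evaluate F1 at every threshold on the grid and keep the best one
grid <- seq(0, 1, by = 0.01)
f1 <- sapply(grid, f1_at, y = y, score = score)
best <- grid[which.max(f1)]
best_f1 <- max(f1)
```

Plotting f1 against grid is a quick way to check whether the maximum is a sharp peak or a flat plateau, which tells you how sensitive the result is to the exact cutoff.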

In addition, finding the best threshold can also be viewed as a one-dimensional optimization problem (without using gradients). You can try optimize in R. Details can be found here. The difference between grid search and optimize is that optimize searches in a "smarter" way: for example, if it sees worse results in one direction, it will not continue in that direction.
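A rough sketch of the optimize approach, again on made-up data (y, score, and f1_at are hypothetical names, not from the question). One caveat worth labeling as my own assumption about this setup: F1 is a step function of the threshold, so optimize may settle on a local maximum, and comparing its result against a grid search is a reasonable sanity check.

```r
# Hypothetical toy data: roughly 10% positives, scores in (0, 1)
set.seed(1)
n <- 1000
y <- rbinom(n, 1, 0.1)
score <- plogis(-2 + 2.5 * y + rnorm(n))

# F1 score of the rule "predict positive when score >= threshold"
f1_at <- function(threshold, y, score) {
  pred <- as.integer(score >= threshold)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  if (tp == 0) return(0)  # define F1 as 0 when there are no true positives
  2 * tp / (2 * tp + fp + fn)
}

# One-dimensional, derivative-free search over [0, 1];
# maximum = TRUE because optimize minimizes by default
opt <- optimize(f1_at, interval = c(0, 1), y = y, score = score,
                maximum = TRUE)
opt$maximum    # threshold found by the search
opt$objective  # F1 score at that threshold
```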

Haitao Du