What is the procedure for finding the decision threshold that maximizes the F1 score in an imbalanced classification problem? I'm using an XGBoost model. Your help is highly appreciated.
First, find a good probabilistic model (by optimizing a proper scoring rule). Then vary your threshold to maximize F1. This is a straightforward one-dimensional optimization problem, and bisection search will work well.
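A minimal sketch of both steps, assuming you already have out-of-sample predicted probabilities for the positive class on a validation set (with XGBoost, for example, from `XGBClassifier(objective="binary:logistic")`, whose objective is the log loss, followed by `predict_proba(X_val)[:, 1]`). The helper names `log_loss`, `f1_at_threshold` and `best_f1_threshold` are illustrative, not a library API. Instead of bisection, this sketch simply evaluates F1 at every distinct predicted probability: F1 only changes when the threshold crosses one of those values, so the sweep is exact, though its quadratic cost makes it suitable only for moderately sized validation sets.

```python
import math
from typing import Sequence, Tuple


def log_loss(y_true: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood, a proper scoring rule for judging the probabilistic model."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1.0 - eps)  # clip so log(0) cannot occur
        total += -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
    return total / len(y_true)


def f1_at_threshold(y_true: Sequence[int], p: Sequence[float], t: float) -> float:
    """F1 of the hard classifier that predicts 1 whenever p >= t."""
    tp = fp = fn = 0
    for y, q in zip(y_true, p):
        pred = q >= t
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif not pred and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true: Sequence[int], p: Sequence[float]) -> Tuple[float, float]:
    """Try every distinct predicted probability as a threshold; return (threshold, F1)."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(p)):
        f1 = f1_at_threshold(y_true, p, t)
        if f1 > best_f1:  # keep the lowest threshold among ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical validation labels and predicted probabilities, for illustration only.
y_val = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.70, 0.90]

print(f"log loss={log_loss(y_val, p_val):.3f}")
t, f1 = best_f1_threshold(y_val, p_val)
print(f"threshold={t:.2f}, F1={f1:.3f}")  # prints threshold=0.40, F1=0.857
```

Choose the threshold on data the model was not trained on; a threshold tuned on in-sample probabilities will usually be too optimistic.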
Think about whether optimizing F1 actually makes sense. See "Reduce Classification Probability Threshold" and "Profusion of threads on imbalanced data - can we merge/deem canonical any?" and the links there.
Stephan Kolassa