
Say I fit a logistic classifier on a supervised dataset with binary labels. If I select a decision threshold of 0.5, which assumption am I implicitly making? Is there any situation where 0.5 makes sense?

This page suggests that we should always tune the decision threshold to optimize some target metric of interest. So intuitively, my guess is that 0.5 only makes sense if the metric of interest is accuracy AND the two classes are balanced (equal priors) AND the misclassification costs are equal. In any other case, a threshold of 0.5 should not be used. Is this correct?

usual me

1 Answer


I would say that a threshold of 0.5 should not be blindly used. It can be optimal under certain circumstances, just as any other threshold in (0,1) can be optimal under other circumstances. There is no reason to tune a threshold but forbid the one value 0.5.
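To make "tune the threshold, and let 0.5 compete like any other value" concrete, here is a minimal sketch using scikit-learn on a synthetic imbalanced dataset. The choice of F1 as the target metric, the candidate grid, and the class imbalance are all illustrative assumptions, not part of the question:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem (illustrative choice)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_va)[:, 1]

# Evaluate a grid of candidate thresholds on validation data;
# note that 0.5 is one of the candidates, not excluded a priori.
thresholds = np.linspace(0.05, 0.95, 19)
scores = [f1_score(y_va, proba >= t) for t in thresholds]
best_t = thresholds[np.argmax(scores)]

print(f"best threshold: {best_t:.2f}, F1 at best: {max(scores):.3f}, "
      f"F1 at 0.5: {f1_score(y_va, proba >= 0.5):.3f}")
```

On imbalanced data like this, the F1-optimal threshold will typically come out below 0.5, but there is nothing stopping the search from returning 0.5 itself when that value happens to be optimal.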

(And of course, I maintain that it is very often the case that we don't have two possible courses of action, but more, even if there are two underlying classes. A clinical test that yields a low probability of me having a certain disease can lead to the action to do nothing. A medium probability could lead to doing more tests. A high probability could lead to treatment starting immediately. So it really makes no sense to train a single threshold.)
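The multi-action point above can be sketched as a simple decision rule with two cut-offs instead of one. The function name and the cut-off values 0.1 and 0.7 are purely hypothetical illustrations; in practice they would be set from the costs and benefits of each action:

```python
def recommend_action(p, low=0.1, high=0.7):
    """Map a predicted disease probability to one of three actions.

    The cut-offs `low` and `high` are hypothetical placeholders; real
    values would come from a cost-benefit analysis of each action.
    """
    if p < low:
        return "no action"
    if p < high:
        return "order more tests"
    return "start treatment"
```

A single trained threshold could only distinguish two of these actions; with three or more actions, the probability output itself is what carries the useful information.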

Stephan Kolassa