Is decision threshold a hyperparameter in logistic regression?

Question

Predicted classes from (binary) logistic regression are determined by using a threshold on the class membership probabilities generated by the model. As I understand it, typically 0.5 is used by default.

But varying the threshold will change the predicted classifications. Does this mean the threshold is a hyperparameter? If so, why is it (for example) not possible to easily search over a grid of thresholds using scikit-learn's GridSearchCV (as you would do for the regularisation parameter C)?

"As I understand it, typically 0.5 is used by default." Depends on the meaning of the word "typical". In practice, no one should be doing this. — Matthew Drury, Jan 31 '19 at 17:27
Strictly you don't mean logistic regression, you mean using one logistic regressor with a threshold for binary classification (you could also train one regressor for each of the two classes, with a little seeded randomness or weighting to avoid them being linearly dependent). — smci, Jan 31 '19 at 19:53

Matthew Drury · Answer 1 · 2019-02-01T16:52:57.640

But varying the threshold will change the predicted classifications. Does this mean the threshold is a hyperparameter?

Yup, it does, sorta. It's a hyperparameter of your decision rule, but not of the underlying regression.

If so, why is it (for example) not possible to easily search over a grid of thresholds using scikit-learn's GridSearchCV (as you would do for the regularisation parameter C)?

This is a design error in sklearn. The best practice for most classification scenarios is to fit the underlying model (which predicts probabilities) using some measure of the quality of these probabilities (like the log-loss in a logistic regression). Afterwards, a decision threshold on these probabilities should be tuned to optimize some business objective of your classification rule. The library should make it easy to optimize the decision threshold based on some measure of quality, but I don't believe it does that well.

I think this is one of the places sklearn got it wrong. The library includes a method, predict, on all classification models that thresholds at 0.5. This method is useless, and I strongly advocate for not ever invoking it. It's unfortunate that sklearn is not encouraging a better workflow.
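To make the second step of that workflow concrete, here is a minimal sketch in plain Python of tuning the cutoff after the probability model has been fit. It assumes you already have held-out predicted probabilities for the positive class (for example from a fitted model's predict_proba) and the matching true labels. The helper names confusion_counts, tune_threshold and example_utility, the 5:1 cost ratio and the toy data are all made up for illustration; none of them are part of sklearn.

```python
def confusion_counts(probs, labels, threshold):
    """Return (tp, fp, tn, fn) when predicting class 1 for probs >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def tune_threshold(probs, labels, utility, grid=None):
    """Return (threshold, utility) for the grid point with the highest utility.

    Ties are broken in favour of the smallest threshold in the grid.
    """
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    best_t, best_u = None, float("-inf")
    for t in grid:
        u = utility(*confusion_counts(probs, labels, t))
        if u > best_u:
            best_t, best_u = t, u
    return best_t, best_u


def example_utility(tp, fp, tn, fn):
    """Hypothetical business objective: a missed positive costs 5, a false alarm costs 1."""
    return -(5 * fn + 1 * fp)


# Toy held-out probabilities and labels, invented for this example.
probs = [0.05, 0.20, 0.30, 0.35, 0.60, 0.80]
labels = [0, 0, 1, 0, 1, 1]

best_t, best_u = tune_threshold(probs, labels, example_utility)
default_u = example_utility(*confusion_counts(probs, labels, 0.5))
print(best_t, best_u, default_u)
```

On this toy data the 0.5 cutoff misses the positive scored at 0.30 and gets a utility of -5, while any cutoff just above 0.20 and up to 0.30 gets -1. In practice the threshold should be chosen on data not used to fit the model (or inside cross-validation), for the same reasons as any other hyperparameter.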

I also share your skepticism of the predict method's default choice of 0.5 as a cutoff, but GridSearchCV accepts scorer objects which can tune models with respect to out-of-sample cross-entropy loss. Am I missing your point? — Sycorax, Jan 31 '19 at 17:32
Right, agreed that is best practice, but it doesn't encourage users to tune decision thresholds. — Matthew Drury, Jan 31 '19 at 17:32
