8

How can one incorporate the costs of false positive, false negative, true positive, and true negative responses into a logit model when these costs differ? Is it possible to do that at the level of the likelihood function?

Edit: I see now that the likelihood function could be modified quite easily to incorporate costs, but then the likelihood function becomes discontinuous:

y_i == 1 and f(x_i*B) > 0.5 : cost = cost11
y_i == 1 and f(x_i*B) < 0.5 : cost = cost10
y_i == 0 and f(x_i*B) > 0.5 : cost = cost01
y_i == 0 and f(x_i*B) < 0.5 : cost = cost00

Unmodified likelihood function:
f(x_i*B)^y_i * ( 1 - f(x_i*B) ) ^ (1 - y_i)

Modified likelihood function:

(cost11*positive(f(x_i*B)-0.5) + cost10*negative(f(x_i*B)-0.5) ) ^ y_i
*
(cost00*negative(f(x_i*B)-0.5) + cost01*positive(f(x_i*B)-0.5) ) ^ (1 - y_i)

where: positive(x) = 1 if x > 0
       positive(x) = 0 if x < 0

       negative(x) = 1 if x < 0
       negative(x) = 0 if x > 0
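For concreteness, here is a minimal sketch of this modified likelihood on the log scale (Python/NumPy; the names cost11, cost10, cost01, cost00 and the logistic link f come from the formulas above, everything else is illustrative, and all costs are assumed strictly positive). The step functions make the objective piecewise constant in B, which is exactly the discontinuity problem:

```python
import numpy as np

def modified_log_likelihood(B, X, y, cost11, cost10, cost01, cost00):
    """Cost-weighted likelihood from the formulas above, on the log scale."""
    p = 1.0 / (1.0 + np.exp(-X @ B))        # f(x_i*B), logistic link
    above = (p > 0.5).astype(float)         # positive(f(x_i*B) - 0.5)
    below = (p < 0.5).astype(float)         # negative(f(x_i*B) - 0.5)
    term_pos = cost11 * above + cost10 * below   # factor when y_i == 1
    term_neg = cost00 * below + cost01 * above   # factor when y_i == 0
    # The step functions make this piecewise constant in B, so
    # gradient-based maximization cannot be applied to it directly.
    return np.sum(y * np.log(term_pos) + (1 - y) * np.log(term_neg))
```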

Qbik
  • 1,707

2 Answers

5

This is best thought of, in my opinion, not as a change to the likelihood function, but as a way to translate estimated risk into an optimum decision. We maximize the (standard or standard penalized) likelihood for a reason: to get optimal models. Then optimal decisions are made one individual at a time based on that individual's loss function. Since loss functions typically vary from subject to subject, the final decision has to be deferred and cannot be made by an analyst. Most commonly, the loss function is not articulated but is used implicitly by the subject to make her own decision. It will depend on deeply held beliefs as well as prevailing conditions (e.g., resource availability).
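For illustration only (the cost names mirror the question; the specific numbers are made up), the separation described above can be sketched as: keep the ordinary maximum-likelihood risk estimate, then let each subject's own cost matrix pick the action that minimizes expected loss:

```python
def optimal_decision(p, cost_tp, cost_fn, cost_fp, cost_tn):
    """Pick the action with the smaller expected loss, given the model's
    estimated risk p = P(y = 1 | x) and one subject's cost matrix."""
    expected_loss_positive = p * cost_tp + (1 - p) * cost_fp  # act as if y = 1
    expected_loss_negative = p * cost_fn + (1 - p) * cost_tn  # act as if y = 0
    return 1 if expected_loss_positive < expected_loss_negative else 0

# Example: a subject who finds a missed positive five times as costly as a
# false alarm acts positively at a much lower estimated risk than 0.5.
print(optimal_decision(0.25, cost_tp=0, cost_fn=5, cost_fp=1, cost_tn=0))  # -> 1
```

With these illustrative costs the implied cut-off is 1/6 rather than 0.5: the cut-off falls out of the subject's loss function, not out of the estimation step.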

Frank Harrell
  • 91,879
  • 6
  • 178
  • 397
3

There are a few approaches that might get you moving in the right direction. The first is to train your logistic regression via iterative reweighting, which keeps the solution confined to using the likelihood function to estimate the regression parameters. Estimate the logistic regression and look at the errors it makes. If you want to improve the false positive rate, assign a higher weight to the observations that the model incorrectly identified as belonging to the positive class, then rerun the logistic regression with the weighted observations. Repeat this until the false positive rate satisfies your objective. See the post on Case weighted logistic regression for a discussion of implementing case-weighted logistic regression in R. To be honest, I am not sure how well this approach works in general, and it is not really an efficient optimization method, but it is simple to implement and easy to try.
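A rough sketch of such a reweighting loop, assuming scikit-learn's LogisticRegression and its sample_weight argument (the boost factor, target rate, and stopping rule are arbitrary illustrative choices, not part of any standard recipe):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighted_fit(X, y, target_fpr=0.05, boost=1.5, max_rounds=20):
    """Iteratively refit, upweighting observations the current model turns
    into false positives, until the false positive rate is low enough."""
    y = np.asarray(y)
    w = np.ones(len(y))
    model = LogisticRegression()
    for _ in range(max_rounds):
        model.fit(X, y, sample_weight=w)
        pred = model.predict(X)
        false_pos = (pred == 1) & (y == 0)
        fpr = false_pos.sum() / max((y == 0).sum(), 1)
        if fpr <= target_fpr:
            break
        w[false_pos] *= boost   # penalize the errors we care about most
    return model
```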

If you are willing to move beyond relying solely on the likelihood function to control classifications, you could instead optimize the threshold used to partition observations into the negative and positive classes. This threshold usually defaults to 0.5, with any output below 0.5 classified as negative and any output above 0.5 classified as positive. After the logistic regression coefficients are estimated, you can optimize this threshold using the true negative rate, true positive rate, precision, recall, F-measure, etc. as your objective function. Practical Neural Network Recipes in C++ has a decent discussion of adjusting the classification threshold to control different error rates (the discussion is about neural nets, but it applies to logistic regression models as well). A variety of optimization techniques could be used to solve this problem.
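A small sketch of that threshold search, again assuming scikit-learn and using the F-measure as the objective (any of the criteria listed above, or the expected cost itself, could be swapped in):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(model, X, y, grid=np.linspace(0.05, 0.95, 19)):
    """Sweep candidate cut-offs on the predicted probabilities and keep
    the one with the best F-measure on the given data."""
    p = model.predict_proba(X)[:, 1]          # estimated P(y = 1 | x)
    scores = [f1_score(y, (p >= t).astype(int)) for t in grid]
    return grid[int(np.argmax(scores))]
```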

  • 2
    This weighted refitting approach causes a number of problems. Maximum likelihood estimation is there for a reason. And the artificial threshold approach yields suboptimal decisions. Threshold parameters are not optimized; they are dictated by the cost/loss/utility function for the decision problem. – Frank Harrell Dec 06 '15 at 13:09