In R, how do you modify a logistic regression when the cost of misclassifying one of the classes is much higher than the other?

Question

Say I have the following in R:

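rm(list=ls())
set.seed(1000)
n <- 20

# First class (type 2, drawn in colour 2 = red): centred at (0.5, 0.5)
x <- rnorm(n, 0.5, 1)
y <- rnorm(n, 0.5, 1)
type <- rep(2, n)
df1 <- data.frame(x, y, type)

# Second class (type 5, drawn in colour 5): centred at (-0.5, -0.5)
x <- rnorm(n, -0.5, 1)
y <- rnorm(n, -0.5, 1)
type <- rep(5, n)
df0 <- data.frame(x, y, type)

# Stack the two classes row-wise; rbind() appends rows directly,
# whereas merge(..., all=T) performs an outer join on every shared column
df <- rbind(df0, df1)
plot(df$x, df$y, col = df$type)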

As you can see, the two classes overlap. Say the cost of misclassifying the red class is 10 times the cost of misclassifying the blue class.

Say I want to use a logistic regression. How would I incorporate the cost into my logistic regression model?

Why would you incorporate the cost in the model at all? Doesn't cost relate to the question of sample design when you are prospectively contemplating obtaining data? If not, then could you elaborate on what these costs actually reflect? — whuber, May 20 '16 at 20:11
With a logistic regression classifier, usually one chooses a threshold to dichotomize predictions into positive and negative. Are you asking for a way to incorporate this cost into finding an optimal threshold or are you hoping to obtain another binary classifier altogether? — AdamO, May 20 '16 at 20:18
@whuber, I guess that was my question -- is the cost function solely employed on the probabilities the model spits out when scoring? — user1172468, May 21 '16 at 23:21
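
The thresholding idea raised in the comments can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive answer: it assumes the red class is type 2, that the 10:1 cost ratio applies to misclassification costs, and that the fitted probabilities are reasonably calibrated. The names is_red, cost_miss_red, cost_miss_blue and total_cost are made up for this example. The model itself is an ordinary glm() fit; the costs enter only when the predicted probabilities are turned into class labels.

```r
# Simulate two overlapping classes in the same spirit as the question.
set.seed(1000)
n <- 20
red  <- data.frame(x = rnorm(n,  0.5, 1), y = rnorm(n,  0.5, 1), type = 2)
blue <- data.frame(x = rnorm(n, -0.5, 1), y = rnorm(n, -0.5, 1), type = 5)
df <- rbind(red, blue)

# Response: 1 for the red class (type 2), 0 for the other class.
df$is_red <- as.integer(df$type == 2)

# Plain, unweighted logistic regression.
fit <- glm(is_red ~ x + y, data = df, family = binomial)
p_red <- predict(fit, type = "response")  # estimated P(red | x, y)

# Assumed costs (hypothetical): misclassifying a red point costs 10,
# misclassifying a blue point costs 1.
cost_miss_red  <- 10
cost_miss_blue <- 1

# Predict "red" when the expected cost of predicting "blue" is larger:
#   p * cost_miss_red > (1 - p) * cost_miss_blue
# which rearranges to p > cost_miss_blue / (cost_miss_blue + cost_miss_red).
threshold <- cost_miss_blue / (cost_miss_blue + cost_miss_red)  # 1/11

pred_cost    <- as.integer(p_red > threshold)  # cost-based cutoff
pred_default <- as.integer(p_red > 0.5)        # conventional 0.5 cutoff

# Total misclassification cost of a set of predictions on this sample.
total_cost <- function(pred, truth) {
  sum(cost_miss_red  * (truth == 1 & pred == 0) +
      cost_miss_blue * (truth == 0 & pred == 1))
}

print(table(truth = df$is_red, predicted = pred_cost))
print(total_cost(pred_cost, df$is_red))
print(total_cost(pred_default, df$is_red))
```

With a 10:1 ratio the cutoff drops from 0.5 to 1/11, so more points are labelled red. Passing case weights to glm() would be a different way of bringing the costs in, but it changes the fitted probabilities themselves rather than only the decision rule.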
