Say I have the following in R:

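rm(list = ls())
set.seed(1000)
n <- 20

# Class drawn with col = 2 (red), centered at (0.5, 0.5)
x <- rnorm(n, 0.5, 1)
y <- rnorm(n, 0.5, 1)
type <- rep(2, n)
df1 <- data.frame(x, y, type)

# Class drawn with col = 5 (the blue class below), centered at (-0.5, -0.5)
x <- rnorm(n, -0.5, 1)
y <- rnorm(n, -0.5, 1)
type <- rep(5, n)
df0 <- data.frame(x, y, type)

# Stack the two classes (rbind is the usual way to append rows) and plot by type
df <- rbind(df0, df1)
plot(df$x, df$y, col = df$type)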

As you can see, the two classes overlap. Say the cost of misclassifying the red class is 10 times the cost of misclassifying the blue class.

Say I want to use logistic regression. How would I incorporate this cost into my logistic regression model?

Sycorax
  • 90,934
user1172468
  • 2,035
  • Why would you incorporate the cost in the model at all? Doesn't cost relate to the question of sample design when you are prospectively contemplating obtaining data? If not, then could you elaborate on what these costs actually reflect? – whuber May 20 '16 at 20:11
  • With a logistic regression classifier, usually one chooses a threshold to dichotomize predictions into positive and negative. Are you asking for a way to incorporate this cost into finding an optimal threshold or are you hoping to obtain another binary classifier altogether? – AdamO May 20 '16 at 20:18
  • @whuber, I guess that was my question -- is the cost function solely employed on the probabilities the model spits out when scoring? – user1172468 May 21 '16 at 23:21
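
The threshold idea raised in the comments can be sketched roughly as follows. This is only an illustration, not an answer from the thread: it assumes type 2 is the red class, that the 10:1 ratio is the full cost structure, and that the fitted probabilities can be taken at face value. The names red, cost_fn, cost_fp and total_cost are made up for the example. Under those assumptions, predicting red is the cheaper choice whenever cost_fn * p > cost_fp * (1 - p), which gives a cutoff of 1/11 rather than 0.5.

```r
# Sketch: fit an ordinary logistic regression, then apply the costs to the
# predicted probabilities when choosing the classification threshold.
set.seed(1000)
n <- 20
df1 <- data.frame(x = rnorm(n, 0.5, 1), y = rnorm(n, 0.5, 1), type = 2)
df0 <- data.frame(x = rnorm(n, -0.5, 1), y = rnorm(n, -0.5, 1), type = 5)
df <- rbind(df0, df1)

# Assumption: type 2 is the red (expensive-to-miss) class
df$red <- as.integer(df$type == 2)

fit <- glm(red ~ x + y, family = binomial, data = df)
p_red <- predict(fit, type = "response")  # estimated P(red | x, y)

cost_fn <- 10  # hypothetical cost of calling a red point blue
cost_fp <- 1   # hypothetical cost of calling a blue point red

# Predict red when cost_fn * p > cost_fp * (1 - p),
# i.e. when p > cost_fp / (cost_fp + cost_fn)
threshold <- cost_fp / (cost_fp + cost_fn)
pred_red <- as.integer(p_red > threshold)

# Total misclassification cost of a set of predictions
total_cost <- function(pred, truth) {
  sum(cost_fn * (truth == 1 & pred == 0) + cost_fp * (truth == 0 & pred == 1))
}

cost_default <- total_cost(as.integer(p_red > 0.5), df$red)
cost_weighted <- total_cost(pred_red, df$red)

print(table(truth = df$red, predicted = pred_red))
print(c(default = cost_default, cost_sensitive = cost_weighted))
```

In this sketch the model itself is fitted without reference to the costs; they only enter when the probabilities are turned into class labels, which is the reading suggested in the last comment.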

0 Answers