
I have a logistic regression model fitted with the following R call:

glmfit <- glm(formula, data = data, family = binomial)

For classifying observations with the fitted model (i.e., building a confusion matrix), a cutoff of 0.2 gives a reasonable classification, rather than the commonly used 0.5.
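To make the effect of the cutoff concrete, here is a minimal sketch on simulated data. The data frame df, the predictor x, the outcome y and the helper confusion() are all hypothetical. It cross-tabulates observed outcomes against predicted classes at cutoffs 0.5 and 0.2. Lowering the cutoff can only move observations from "predicted 0" to "predicted 1".

```r
# Hypothetical simulated data: one predictor x and a binary outcome y
set.seed(1)
n <- 500
df <- data.frame(x = rnorm(n))
df$y <- rbinom(n, size = 1, prob = plogis(-1.5 + df$x))

glmfit <- glm(y ~ x, data = df, family = binomial)
p_hat <- fitted(glmfit)  # predicted probabilities P(y = 1 | x)

# Hypothetical helper: classify as 1 when p > cutoff, then cross-tabulate
confusion <- function(y, p, cutoff) {
  table(observed  = factor(y, levels = c(0, 1)),
        predicted = factor(as.integer(p > cutoff), levels = c(0, 1)))
}

cm_05 <- confusion(df$y, p_hat, 0.5)
cm_02 <- confusion(df$y, p_hat, 0.2)
print(cm_05)
print(cm_02)
```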

I want to use the cv.glm function (from the boot package) with the fitted model:

cv.glm(data, glmfit, cost, K)
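For reference, here is a minimal runnable sketch of this call on simulated data. The data frame df, the predictor x and the outcome y are hypothetical, and the boot package is assumed to be installed. With no cost argument, cv.glm uses its default cost, the mean squared difference between the observed response and the predicted probability. The first element of delta is the raw K-fold estimate and the second is the bias-adjusted one.

```r
library(boot)  # cv.glm is provided by the boot package

# Hypothetical simulated data
set.seed(2)
n <- 300
df <- data.frame(x = rnorm(n))
df$y <- rbinom(n, size = 1, prob = plogis(-1 + df$x))
glmfit <- glm(y ~ x, data = df, family = binomial)

# Default cost: mean((y - yhat)^2) on the probability scale
cv_default <- cv.glm(df, glmfit, K = 10)
cv_default$delta  # raw and adjusted 10-fold CV estimates
```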

Since the response in the fitted model is binary, an appropriate cost function is (taken from the "Examples" section of ?cv.glm):

cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
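A small check with made-up values shows what this cost function computes. With a 0.5 cutoff it is the misclassification rate, i.e. the share of observations whose observed class differs from the predicted class pi > 0.5. The probabilities are named p here to avoid masking R's built-in constant pi.

```r
# Cost function from the Examples section of ?cv.glm
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

r <- c(0, 0, 1, 1)          # observed outcomes (made-up values)
p <- c(0.1, 0.6, 0.4, 0.9)  # predicted probabilities (made-up values)

cost(r, p)              # 2 of 4 misclassified -> 0.5
mean(r != (p > 0.5))    # same quantity, written as observed != predicted class
```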

Since my cutoff is 0.2, can I still apply this standard cost function, or should I define a different one? If so, how?

Thank you very much in advance.

Sycorax

2 Answers


You can simply do:

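cost <- function(r, pi = 0) mean(r != (pi > 0.2))

The logic follows:

  1. If your cutoff is 0.2, then predict an outcome of 1 if pi is greater than 0.2, so the predicted class is the logical vector pi > 0.2.
  2. Therefore, the number of times you are wrong is given by summing the logical vector

    r != (pi > 0.2)

    We can arrive at this by looking at both cases where the prediction is wrong:

    if r = 0 and pi > 0.2
    if r = 1 and pi <= 0.2

    In both cases, r != (pi > 0.2) returns TRUE, and it returns FALSE for every correct prediction, so its mean is the misclassification rate.
  3. Simply changing the threshold in the ?cv.glm example to abs(r - pi) > 0.2 does not work: for r = 1 it is TRUE whenever pi < 0.8, so correct predictions with 0.2 < pi < 0.8 would also be counted as errors. The example's abs(r - pi) > 0.5 works only because a 0.5 cutoff is symmetric.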
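The two error cases can be checked numerically. This minimal sketch uses made-up r and pi values (the probabilities are named p to avoid masking R's built-in pi). It compares abs(r - pi) > 0.2 with a direct comparison of the observed class against the predicted class pi > 0.2. For r = 1 the abs() rule is TRUE whenever pi < 0.8. As a result, the observation with r = 1 and pi = 0.5, which a 0.2 cutoff classifies correctly, is counted as an error.

```r
cutoff <- 0.2
r <- c(0,   0,   1,   1,   1)    # observed outcomes (made-up values)
p <- c(0.1, 0.3, 0.1, 0.5, 0.9)  # predicted probabilities (made-up values)

# Predicted classes at cutoff 0.2: 0, 1, 0, 1, 1 -> true errors at positions 2 and 3
abs_rule   <- mean(abs(r - p) > cutoff)  # also flags position 4 (r = 1, p = 0.5)
class_rule <- mean(r != (p > cutoff))    # counts only the true misclassifications

abs_rule    # 3 of 5 flagged
class_rule  # 2 of 5 misclassified
```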

Alex
  • The cutoff comes from the cost function, not vice versa. And the only way a cutoff exists is for the cost function to be identical across all units. – Frank Harrell Sep 20 '19 at 11:43
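The point that the cutoff follows from the costs can be made concrete with the standard expected-cost argument. It assumes the predicted probabilities are calibrated and that the misclassification costs are the same for every unit. Predicting 1 has expected cost (1 - p) * cost_fp, while predicting 0 has expected cost p * cost_fn. So 1 is the cheaper prediction when p > cost_fp / (cost_fp + cost_fn). The helper implied_cutoff() is hypothetical.

```r
# Hypothetical helper: cost-minimising cutoff for a calibrated probability p,
# where cost_fp = cost of predicting 1 when the truth is 0 (false positive)
# and   cost_fn = cost of predicting 0 when the truth is 1 (false negative).
implied_cutoff <- function(cost_fp, cost_fn) cost_fp / (cost_fp + cost_fn)

implied_cutoff(cost_fp = 1, cost_fn = 1)  # 0.5: equal costs
implied_cutoff(cost_fp = 1, cost_fn = 4)  # 0.2: missing a true 1 is 4x as costly
```

Under these assumptions, a cutoff of 0.2 is the cost-minimising choice only if a false negative is four times as costly as a false positive.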

OK, there have been no answers to my post, but I think I have found the answer. All credit goes to @Feng Mai, who wrote a post here: "What is the cost function in cv.glm in R's boot package?" Thanks to it, here is my answer to my own question:

For a cutoff value of 0.2, I think I could apply the following cost function:

 mycost <- function(r, pi) {
   weight1 <- 1  # cost of getting a 1 wrong (false negative)
   weight0 <- 1  # cost of getting a 0 wrong (false positive)
   c1 <- (r == 1) & (pi <= 0.2)  # logical vector - TRUE if actual 1 but predicted 0
   c0 <- (r == 0) & (pi > 0.2)   # logical vector - TRUE if actual 0 but predicted 1
   mean(weight1 * c1 + weight0 * c0)
 }
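As a quick check with made-up values, this sketch shows that when both weights equal 1, mycost is simply the misclassification rate at cutoff 0.2. It uses pi <= 0.2 in c1 so that a probability exactly at the cutoff counts as a predicted 0, matching the rule "predict 1 if pi > 0.2". The vectors r and p are illustrative only.

```r
mycost <- function(r, pi) {
  weight1 <- 1  # cost of getting a 1 wrong (false negative)
  weight0 <- 1  # cost of getting a 0 wrong (false positive)
  c1 <- (r == 1) & (pi <= 0.2)  # actual 1, predicted 0
  c0 <- (r == 0) & (pi > 0.2)   # actual 0, predicted 1
  mean(weight1 * c1 + weight0 * c0)
}

r <- c(0,   0,   1,   1,   1)    # observed outcomes (made-up values)
p <- c(0.1, 0.3, 0.2, 0.5, 0.9)  # p = 0.2 sits exactly on the cutoff

mycost(r, p)          # errors at positions 2 and 3 -> 0.4
mean(r != (p > 0.2))  # same misclassification rate
```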

Then I would call cv.glm with the fitted model and the mycost function:

cv.glm(data, glmfit, cost=mycost, K)
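Here is a minimal end-to-end sketch of this call on simulated data. The data frame df, the predictor x and the outcome y are hypothetical, and the boot package is assumed to be installed. For brevity, mycost is written in a compact equal-weights form, assumed here for illustration. cv.glm calls the cost function with the observed responses first and the predicted probabilities second, so r and pi line up with mycost's arguments.

```r
library(boot)  # cv.glm is provided by the boot package

# Hypothetical simulated data
set.seed(3)
n <- 400
df <- data.frame(x = rnorm(n))
df$y <- rbinom(n, size = 1, prob = plogis(-1.5 + df$x))
glmfit <- glm(y ~ x, data = df, family = binomial)

# Equal-weights misclassification rate at cutoff 0.2 (predict 1 if pi > 0.2)
mycost <- function(r, pi) mean(r != (pi > 0.2))

cv_02 <- cv.glm(df, glmfit, cost = mycost, K = 10)
cv_02$delta[1]  # raw 10-fold CV misclassification rate at cutoff 0.2
```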

Hopefully this works. Am I right?

  • I think that it is not proper to do this unless the cost function has been specified by subject matter experts. It is not a statistical quantity, and often varies with subjects. – Frank Harrell Jan 30 '14 at 13:19