
I am training several models (RF, SVM, LR) and I want to evaluate all of them at a fixed PPV of 0.7.

This question and this question helped me write my code:

#Run RF
library(caret)

data_ctrl_null <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                               summaryFunction = twoClassSummary,
                               savePredictions = TRUE, sampling = NULL)

rf_model <- train(outcome ~ ., data = htn_df, ntree = 2000,
                  tuneGrid = data.frame(mtry = 69),
                  trControl = data_ctrl_null, method = "rf",
                  preProc = c("center", "scale"),
                  metric = "ROC",  # twoClassSummary reports "ROC", not "AUC"
                  importance = TRUE)

#Create an ROC
myRoc <- roc(predictor = rf_model$pred$affirmatory, response = rf_model$pred$obs, positive = 'affirmatory')

#Find threshold for PPV=0.7
coordinates <- coords(myRoc, x = "all", input = "threshold", ret = c("threshold", "ppv"))
plot(t(coordinates))

But the problem is that there is no threshold that allows me to get a PPV = 0.7; the lowest my PPV goes is 0.7417722 at threshold = Inf, and the next lowest is 0.7436548 at threshold = 0.7915000. Below is my plot.

Can someone explain to me conceptually what is going on?

[Plot of PPV against threshold]

EDIT

From user Calimo's explanation below, I determined that the issue is the way my code assigns positive and negative cases: it should be positive = affirmatory and negative = negatory (control), but that is not what I'm seeing:

myRoc
Call:
roc.default(response = rf_model$pred$obs, predictor = rf_model$pred$affirmatory)

Data: rf_model$pred$affirmatory in 102 controls (rf_model$pred$obs affirmatory) > 293 cases (rf_model$pred$obs negatory).
Area under the curve: 0.9008

I saw that the pROC package actually allows you to set cases and controls explicitly, so I modified myRoc:

myRoc_new <- roc(controls = rf_model$pred$affirmatory[rf_model$pred$obs == "negatory"],
                 cases = rf_model$pred$affirmatory[rf_model$pred$obs == "affirmatory"])
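An equivalent fix, sketched under the assumption that the outcome factor levels are exactly "negatory" and "affirmatory", is to keep the response/predictor form and tell pROC which level is the control via its levels argument (first element = controls, second = cases), together with direction = "<" so that higher predicted probabilities indicate cases:

```r
library(pROC)

# levels = c(controls, cases); direction = "<" means controls are
# expected to have lower predictor values than cases.
myRoc_new <- roc(response = rf_model$pred$obs,
                 predictor = rf_model$pred$affirmatory,
                 levels = c("negatory", "affirmatory"),
                 direction = "<")
```

Pinning both levels and direction also silences pROC's automatic-direction guessing, which is what silently flipped cases and controls in the original call.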

Now I can see that my model supports PPVs from 0.25 to 1 at various thresholds. However, coords only gives me nearby thresholds, i.e. threshold = 0.4277500 gives PPV = 0.6981132 and threshold = 0.4312500 gives PPV = 0.7047619. This is very close to PPV = 0.7, but can I get the exact threshold?
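One option for this follow-up (a sketch, not a built-in pROC feature) is to linearly interpolate the threshold between the two coordinates that bracket PPV = 0.7, using base R's approx(). With the two points reported above:

```r
# The two (threshold, PPV) coordinates bracketing PPV = 0.7,
# as reported by coords() on myRoc_new:
thr <- c(0.4277500, 0.4312500)
ppv <- c(0.6981132, 0.7047619)

# Linear interpolation of threshold as a function of PPV.
approx(x = ppv, y = thr, xout = 0.7)$y  # ~0.4287
```

Strictly speaking, no threshold may achieve exactly PPV = 0.7: PPV jumps discontinuously as each observation crosses the threshold, so the interpolated value is only an approximation between the two achievable PPVs.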

  • Note that the positive = 'affirmatory' argument to roc will be ignored unless I'm very mistaken. – Calimo May 13 '20 at 07:06
  • Good to know, I just added that because I've seen it for confusion matrices and wanted the roc to note what was positive – PleaseHelp May 13 '20 at 17:05
  • that's a different question really, so you should ask it separately. Stack Overflow would probably be a better fit for that one... – Calimo May 14 '20 at 06:45
  • Sure, I can do that. Thanks for your help with the original question! – PleaseHelp May 14 '20 at 16:02

1 Answer


So first, I assume that since you get the lowest PPV at threshold = Inf, your classifier assigns lower values to positive instances.

Look at the formula for the positive predictive value:

$\mathrm{PPV} = \frac {\mathrm{TP}} {\mathrm{TP} + \mathrm{FP}}$

When threshold = Inf, you classify all your values as positive. So $\mathrm{TP} + \mathrm{FP}$ = the total sample size and $\mathrm{TP}$ = the total number of positives, or:

$\mathrm{PPV} = \frac {\mathrm{Positives}} {\mathrm{Total}}$

which is basically the fraction of actual positives in the dataset. The PPV can never go lower than that.
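Plugging in the counts from the question makes this concrete (102 affirmatory and 293 negatory observations, 395 total):

```r
# With everything classified positive, PPV equals the prevalence:
102 / (102 + 293)   # 0.2582278 -- the lowest achievable PPV
# With cases and controls swapped, the "positives" are the 293
# negatory instances, which reproduces the reported floor:
293 / (293 + 102)   # 0.7417722
```

The second value matches the reported minimum PPV of 0.7417722 exactly, which is what pointed to the cases/controls swap.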

Calimo
  • Thanks for the robust explanation, that made a lot of sense! I realized I was experiencing the issue of my controls and cases being switched, so while my expected PPV at threshold = Inf should have been 0.258, my code was giving me a lowest possible PPV = 0.742. I rewrote my roc code with cases/controls instead of predictor/response and now I'm seeing PPVs in the right range. However, can I find the exact threshold for a PPV = 0.7 (see edited question)? – PleaseHelp May 13 '20 at 23:55