I am trying to understand sklearn's function for computing the ROC curve, roc_curve. If I understand correctly, one needs the TPR and FPR to compute the ROC curve. However, sklearn's function takes y_true and y_score as its main inputs.

How does one compute the TPR from y_score when it is a probability estimate of the positive class? It doesn't tell us what the model predicted (y_pred), and without that, one shouldn't be able to compute the TPR.
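One way to see it: y_pred does not have to be a separate input, because it can be derived from y_score once a threshold t is chosen. The sketch below is illustrative only. `tpr_fpr_at_threshold` is a hypothetical helper (not a scikit-learn function), and it assumes the `score >= t` rule, which is the convention `roc_curve` uses for its thresholds. It reuses the arrays from the scikit-learn example quoted in this question.

```python
import numpy as np

def tpr_fpr_at_threshold(y_true, y_score, t, pos_label=1):
    """Hypothetical helper: TPR and FPR after cutting the scores at threshold t."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    is_pos = y_true == pos_label   # ground-truth positives
    y_pred = y_score >= t          # predicted positive when score >= t
    tp = np.sum(y_pred & is_pos)   # true positives
    fp = np.sum(y_pred & ~is_pos)  # false positives
    tpr = tp / np.sum(is_pos)      # TP / (TP + FN)
    fpr = fp / np.sum(~is_pos)     # FP / (FP + TN)
    return float(tpr), float(fpr)

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
tpr, fpr = tpr_fpr_at_threshold(y, scores, t=0.5, pos_label=2)
print(tpr, fpr)  # 0.5 0.0
```

At t = 0.5 only the score 0.8 is called positive, which gives a single (FPR, TPR) pair; choosing a different t gives a different pair.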

Here is a concrete example from the scikit-learn documentation:

import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
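A minimal sketch of what `roc_curve` does with these inputs, assuming the same data: every distinct score is treated as a candidate threshold, the scores are turned into hard predictions at each one, and one (FPR, TPR) pair is recorded per threshold. The manual loop is an illustration, not scikit-learn's actual implementation; `drop_intermediate=False` is passed so that sklearn keeps one point per distinct score for the comparison.

```python
import numpy as np
from sklearn import metrics

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
pos_label = 2

# Candidate thresholds: every distinct score, from highest to lowest.
thresholds = np.unique(scores)[::-1]
is_pos = y == pos_label

manual_fpr, manual_tpr = [], []
for t in thresholds:
    y_pred = scores >= t  # hard predictions at this threshold
    manual_tpr.append(np.sum(y_pred & is_pos) / np.sum(is_pos))
    manual_fpr.append(np.sum(y_pred & ~is_pos) / np.sum(~is_pos))

# drop_intermediate=False keeps one point per distinct score.
fpr, tpr, sk_thresholds = metrics.roc_curve(
    y, scores, pos_label=pos_label, drop_intermediate=False
)

# sklearn prepends an extra (0, 0) point with a threshold above every score,
# so the comparison skips the first element.
print(np.allclose(fpr[1:], manual_fpr))  # True
print(np.allclose(tpr[1:], manual_tpr))  # True

for t, f, r in zip(thresholds, manual_fpr, manual_tpr):
    print(f"t={t:.2f}  FPR={f:.2f}  TPR={r:.2f}")
```

The loop prints one row per threshold: (0.00, 0.50) at t = 0.80, (0.50, 0.50) at 0.40, (0.50, 1.00) at 0.35 and (1.00, 1.00) at 0.10. Joining these points, starting from (0, 0), traces the ROC curve.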

As you can see, the scores here are probabilities. How does one know from them whether the model predicted the positive or the negative class?

  • Did you read the proposed duplicate? What remains unclear? – Dave Nov 15 '23 at 14:18
  • The proposed duplicate has the prediction values, whereas I don't. "For example, in the validation dataset, I have the true value for the dependent variable, retention (1 = retained; 0 = not retained), as well as a predicted retention status for each observation generated by my regression analysis using a model that was built using the training set (this will range from 0 to 1)." Sorry if I'm missing something. – desert_ranger Nov 15 '23 at 14:21
  • The scores array just contains probabilities. Those probabilities could be used to predict either class. – desert_ranger Nov 15 '23 at 14:26
  • So how do you go from the predictions on a continuum to categorical predictions for which you can calculate sensitivity and specificity? – Dave Nov 15 '23 at 14:26
  • y_score is a prediction. As the duplicate explains, you construct a ROC curve by computing TPR and FPR for all thresholds $t$; it is a parametric curve in FPR($t$), TPR($t$). If you only had the predicted class at a single threshold, you wouldn't have a curve, you'd have three points: (0,0), (FPR($t$),TPR($t$)), (1,1). – Sycorax Nov 15 '23 at 14:26
  • @Dave We typically use a threshold. So, if the probability > 0.5, the model predicts the positive class; otherwise, it predicts the negative class. – desert_ranger Nov 15 '23 at 14:28
  • @Sycorax Oh, wait. I think I am beginning to understand. I think I should do more reading. – desert_ranger Nov 15 '23 at 14:29
  • @Dave I am beginning to understand what Sycorax was telling me. I think I need to do more reading. – desert_ranger Nov 15 '23 at 14:30
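To illustrate Sycorax's point, here is a sketch that assumes the single 0.5 cutoff mentioned in the comments. If only the hard 0/1 predictions are passed to `roc_curve`, there is just one non-trivial threshold, so the result has exactly three points: (0, 0), (FPR(t), TPR(t)) and (1, 1).

```python
import numpy as np
from sklearn import metrics

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Hard class predictions from a single threshold of 0.5 (1 = predicted positive).
y_pred = (scores > 0.5).astype(int)  # [0, 0, 0, 1]

# Feeding hard predictions instead of scores collapses the curve.
fpr, tpr, thresholds = metrics.roc_curve(y, y_pred, pos_label=2)

for f, r in zip(fpr, tpr):
    print(f"FPR={f:.1f}  TPR={r:.1f}")
```

This prints (0.0, 0.0), (0.0, 0.5) and (1.0, 1.0). The middle point is the same pair obtained by thresholding the scores at 0.5; the rest of the curve is lost because the continuous scores were discarded.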
