I am trying to understand sklearn's roc_curve function. If I understand correctly, one needs the TPR and FPR to compute a ROC curve. However, sklearn's function takes y_true and y_score as input.
How does one compute the TPR from y_score, when it is a probability estimate of the positive class? It doesn't say what the model predicted (y_pred), and without that one shouldn't be able to compute the TPR.
Here is a concrete example from their documentation:
import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
As you can see, the scores here are probabilities. How does one know from them whether the model predicted the positive or the negative class?
scores array just contains probabilities. Those probabilities could be used to predict either class. – desert_ranger Nov 15 '23 at 14:26
y_score is a prediction. As the duplicate explains, you construct a ROC curve by computing TPR and FPR for all thresholds $t$; it is a parametric curve in FPR($t$), TPR($t$). If you only had the predicted class at a single threshold, you wouldn't have a curve, you'd have three points: (0,0), (FPR($t$),TPR($t$)), (1,1). – Sycorax Nov 15 '23 at 14:26
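To make the threshold sweep concrete, here is a minimal sketch that reuses the arrays from the question and computes FPR and TPR by hand at each threshold. It assumes a sample is predicted positive when its score is greater than or equal to the threshold; the helper name rates_at and the explicit threshold list are illustrative choices, not part of sklearn.

```python
import numpy as np

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
pos_label = 2

def rates_at(threshold):
    """Hypothetical helper: (FPR, TPR) when predicting positive iff score >= threshold."""
    pred_pos = scores >= threshold      # this is the "y_pred" implied by the threshold
    is_pos = y == pos_label
    tp = np.sum(pred_pos & is_pos)      # positives correctly flagged
    fp = np.sum(pred_pos & ~is_pos)     # negatives wrongly flagged
    return float(fp / np.sum(~is_pos)), float(tp / np.sum(is_pos))

# Assumed sweep: infinity (nothing predicted positive), then each distinct score, descending.
thresholds = [np.inf, 0.8, 0.4, 0.35, 0.1]
points = [rates_at(t) for t in thresholds]

for t, (fpr, tpr) in zip(thresholds, points):
    print(f"t={t}: FPR={fpr}, TPR={tpr}")
```

Each threshold turns the scores into one set of class predictions, and therefore into one (FPR, TPR) point. Sweeping the threshold from high to low traces the curve from (0,0) to (1,1), which should be the same set of points that metrics.roc_curve returns as fpr and tpr for this input.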