I am trying to draw an ROC curve for a classifier and was wondering whether there is any measure of the classifier's performance other than the AUC. Also, is there any free software I can use to draw the histograms or the probability density functions of the noise and the (signal + noise) distributions in one graph?
-
Overlaying different graphs in R is trivial: you just call par(new = TRUE) before you add anything to the plotting area. – JenSCDC Nov 28 '14 at 17:31
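For instance, a minimal R sketch of the overlay the question asks about (the means, standard deviations and sample sizes below are made up purely for illustration):

    set.seed(1)
    noise  <- rnorm(1000, mean = 0, sd = 1)    # noise-only scores
    signal <- rnorm(1000, mean = 1.5, sd = 1)  # (signal + noise) scores

    # Plot the first density, then add the second to the same axes
    plot(density(noise), xlim = c(-4, 6), main = "Score distributions",
         xlab = "classifier score")
    lines(density(signal), lty = 2)
    legend("topright", legend = c("noise", "signal + noise"), lty = 1:2)

hist(noise) followed by hist(signal, add = TRUE) does the same for histograms; the par(new = TRUE) route from the comment works too, provided both plots share the same axis limits.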
3 Answers
You cannot make an ROC curve for a classifier. By definition a classifier returns only two values when $Y$ is binary. You can create an ROC curve for a continuous predicted value. But to your question, there are many approaches, such as:
- a smooth nonparametric calibration curve to show absolute predictive accuracy
- a generalized $R^2$ based on the log-likelihood
- the Brier score - the quadratic error score, which is a proper scoring rule, like measures based on the log-likelihood (see the sketch below)
IMHO these are all more useful than sensitivity and specificity, which are improper accuracy scoring rules that make up the ROC curve.
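As a rough illustration of the Brier score mentioned above: it is simply the mean squared difference between the predicted probabilities and the 0/1 outcomes. A minimal R sketch with simulated data standing in for a real model (the probabilities below are made up, not fitted):

    set.seed(42)
    y <- rbinom(200, size = 1, prob = 0.3)   # observed 0/1 outcomes
    p <- plogis(-1 + 2 * y + rnorm(200))     # hypothetical predicted probabilities

    brier   <- mean((p - y)^2)                            # proper quadratic scoring rule (lower is better)
    logloss <- -mean(y * log(p) + (1 - y) * log(1 - p))   # log-likelihood-based counterpart
    c(Brier = brier, log_loss = logloss)

If memory serves, val.prob in Frank Harrell's rms package plots a smooth calibration curve and reports the Brier score and related indexes in a single call.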
-
Why? As far as I know, an ROC curve can be constructed using a rating method with five discrete points, and that is what many papers have done. Once the curve is constructed, you can calculate the AUC or even the distance to the upper left corner as an alternative to the AUC. I am trying to figure out whether there are any other measures of classifier performance based on the ROC curve. – Ali Sultan Nov 26 '14 at 23:59
-
If you are saying that $Y$ has five levels and is ordinal, then you can compute an ROC curve, although it will be a crude step function. But if $Y$ is ordinal, an ordinal regression approach using maximum likelihood methods is best. For ordinal $Y$ the generalization of ROC area is Somers' $D_{xy}$ rank correlation between the linear predictor and $Y$, and this is a fine measure of pure discrimination. – Frank Harrell Nov 27 '14 at 13:02
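To make the $D_{xy}$ idea concrete, here is a small base-R sketch that computes it as the pairwise rank correlation between a predictor and an ordinal outcome (just one way to compute it; Hmisc's somers2/rcorr.cens report it directly, if I recall correctly). The simulated data are purely illustrative:

    # Somers' Dxy between a continuous predictor x and an ordinal outcome y:
    # (concordant pairs - discordant pairs) / (pairs with differing y)
    somers_dxy <- function(x, y) {
      cc <- 0; dd <- 0; nn <- 0
      n <- length(x)
      for (i in 1:(n - 1)) for (j in (i + 1):n) {
        if (y[i] != y[j]) {
          nn <- nn + 1
          s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
          if (s > 0) cc <- cc + 1 else if (s < 0) dd <- dd + 1
        }
      }
      (cc - dd) / nn
    }

    set.seed(1)
    y <- sample(1:5, 100, replace = TRUE)   # ordinal outcome with five levels
    x <- y + rnorm(100)                     # hypothetical linear predictor
    somers_dxy(x, y)                        # equals 2 * AUC - 1 when y is binary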
There are several alternative ways to evaluate the performance of classification methods across different thresholds, e.g.:
- The ROC gap: the largest vertical distance between the ROC curve and the angle bisector (the diagonal).
- The Lorenz curve and the corresponding Gini coefficient.
- The divergence, measured by $ \frac{(m_A - m_B)^2}{\sigma^2_A + \sigma^2_B} $, where $m$ denotes the mean and $\sigma^2$ the variance of group $A$ and $B$, respectively.
- The area where the score distributions of the two classes overlap is another measure of discriminatory power.
- One can also use a two-sample t-test or nonparametric alternatives such as the Mann-Whitney U test, the Kolmogorov-Smirnov test and the Wilcoxon rank-sum test. Here, $H_0$ states that there is no difference between the two distributions; hence the scores of the better classifier will tend to yield lower p-values. (A short sketch of a few of these follows this list.)
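A rough R sketch of a few of these measures, assuming we have classifier scores for two groups (the simulated data are purely illustrative); it also shows the Gini coefficient via its relation to the AUC, as discussed in the comments below:

    set.seed(7)
    a <- rnorm(300, mean = 1, sd = 1)   # scores of group A (e.g. positives)
    b <- rnorm(300, mean = 0, sd = 1)   # scores of group B (e.g. negatives)

    # Divergence: (m_A - m_B)^2 / (sigma_A^2 + sigma_B^2)
    divergence <- (mean(a) - mean(b))^2 / (var(a) + var(b))

    # Two-sample tests: H0 = no difference between the two score distributions
    t.test(a, b)        # two-sample t-test
    wilcox.test(a, b)   # Mann-Whitney U / Wilcoxon rank-sum test
    ks.test(a, b)       # Kolmogorov-Smirnov test; its statistic is closely related to the maximal ROC gap

    # AUC as the probability that a random A-score exceeds a random B-score,
    # and the Gini coefficient via Gini = 2 * (AUC - 0.5)
    auc  <- mean(outer(a, b, ">")) + 0.5 * mean(outer(a, b, "=="))
    gini <- 2 * (auc - 0.5)
    c(divergence = divergence, auc = auc, gini = gini)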
-
How are the Lorenz curve and the Gini coefficient related to the ROC curve? – Ali Sultan Nov 30 '14 at 01:21
-
Please have a look at page four of this paper: https://eldorado.tu-dortmund.de/bitstream/2003/21648/1/tr30-05.pdf
There you can see how ROC and Lorenz curves are related. Moreover, once you know the area under the ROC curve, you can compute the Gini coefficient as $Gini = 2(AUROC - 0.5)$. – random_guy Nov 30 '14 at 10:16
-
I am not sure exactly how they are related, but it all boils down to the use of TP, FP, FN and TN. Hence, I think it must be possible to show it with some algebra. ;) – random_guy Nov 30 '14 at 13:34
-
The curve on page 4 of this paper is not an ROC curve; ROC curves start at (0,0) and end at (1,1). – Ali Sultan Nov 30 '14 at 17:07
Scoring rules are very useful in evaluating categorical models. I would rank them higher than AUC because AUC is open to subjectivity in where you want to cut off the threshold values for the ROC.
For binary models, I prefer the logarithmic rule, mainly because it's the easiest to explain to PHB types, although the Brier score isn't far behind.
There are also precision-recall curves and the area under them; these are used instead of ROC curves when the class distribution is highly skewed.
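For what it's worth, a short R sketch of the logarithmic score and a precision-recall curve; the latter uses the ROCR package, assuming it is installed (PRROC is another option). The data and model below are made up:

    library(ROCR)   # assumed installed; provides prediction() / performance()

    set.seed(3)
    y <- rbinom(500, 1, 0.1)                 # skewed class distribution
    p <- plogis(-2 + 3 * y + rnorm(500))     # hypothetical predicted probabilities

    # Logarithmic scoring rule (lower is better)
    logscore <- -mean(y * log(p) + (1 - y) * log(1 - p))

    # Precision-recall curve, with an ROC curve for comparison
    pred <- prediction(p, y)
    plot(performance(pred, "prec", "rec"), main = "Precision-recall curve")
    plot(performance(pred, "tpr", "fpr"), main = "ROC curve")
    performance(pred, "auc")@y.values[[1]]   # area under the ROC curve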
-
There's no subjectivity in the calculation of AUC because it takes into account all possible thresholds. Its disadvantage is that, as @FrankHarrell says, it only measures discrimination, ignoring calibration entirely: a perfect AUC of 1 can be obtained when a model predicts a higher probability of "success" for any observed "success" than for any "failure", regardless of the accuracy of those probability estimates. – Scortchi - Reinstate Monica Nov 28 '14 at 17:56
-
Let's say we have a binary model where the probability of success is 0.04. How do threshold values above, say 0.5, say anything about the model? Allowing all threshold values to be included is itself a subjective call. – JenSCDC Nov 28 '14 at 19:53
-
(1) Your example isn't clear at all - is it the predicted or true probabilities that are supposed to be 0.04, & all equal? What does that show? (2) The sense in which AUC takes into account all possible thresholds is that it depends only on how the predicted probabilities order "successes" & "failures": it's equivalent to a rank correlation, & despite often being explained in terms of thresholds doesn't impose any obligation on you to apply thresholds to categorize; it's a scoring rule (albeit improper). – Scortchi - Reinstate Monica Nov 30 '14 at 12:44
-
(3) You could say the choice to measure a model's predictive performance using AUC is subjective, but why more so than the choice between logarithmic & quadratic scoring rules? For some applications ranking is more important than accurate prediction & that's when AUC tends to be used. – Scortchi - Reinstate Monica Nov 30 '14 at 12:47
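To illustrate the rank-correlation point: the AUC depends only on how the predictions order the outcomes, so any monotone (even badly calibrated) transformation of the probabilities leaves it unchanged, while a proper score such as the Brier score does not. A quick base-R check on simulated data:

    set.seed(5)
    y <- rbinom(1000, 1, 0.3)
    p <- plogis(-1 + 2 * y + rnorm(1000))   # hypothetical predicted probabilities

    # Concordance-based AUC: P(score of a random "success" > score of a random "failure")
    auc <- function(p, y) {
      mean(outer(p[y == 1], p[y == 0], ">")) +
        0.5 * mean(outer(p[y == 1], p[y == 0], "=="))
    }

    auc(p, y)           # AUC of the original probabilities
    auc(p^10, y)        # identical AUC: p^10 is monotone but badly miscalibrated
    mean((p - y)^2)     # the Brier score, by contrast, ...
    mean((p^10 - y)^2)  # ... changes, penalising the miscalibration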
-
Good point. I didn't think of the subjectivity in choosing a scoring rule. – JenSCDC Nov 30 '14 at 13:06
-
Never used it, but a glance at the manual shows it provides a range of performance measures, some of which use cut-offs, whereas "the output of auc is cutoff-independent". – Scortchi - Reinstate Monica Nov 30 '14 at 13:13
Should probably add that I'm not recommending the use of ill-calibrated models even when discrimination is of primary importance in application. The real value of AUC is in assessing how well a model does its job when applied to new data for which calibration is neither desired nor expected. – Scortchi - Reinstate Monica Nov 30 '14 at 13:29