Can someone explain what area under the curve means for someone with absolutely no stats knowledge? For example, if a model claims an AUC of 0.9, does that mean that it makes an accurate prediction 90% of the time?
4 Answers
AUC is difficult to understand and interpret even with statistical knowledge. Without such knowledge I'd stick to the following stylized facts:
- An AUC close to 0.5 means the model performed no better than chance: it did no better than a random number generator at marking samples as positive or negative.
- AUC is used by some to compare models.
- Higher AUC suggests better demonstrated performance in classification.
- AUC is a noisy metric
- Max AUC is 1, for a classification model that is never wrong
- Although technically the minimum AUC is 0, it makes little sense to have an AUC below 0.5: an AUC of zero means that by simply swapping the positive and negative labels you get a perfect classifier. (The quick simulation after this list illustrates the first and last points.)
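A minimal simulation, not part of the original answer, that illustrates two of these facts; it assumes numpy and scikit-learn are available, and all scores are invented for illustration:

```
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)               # true 0/1 labels, roughly balanced

# Scores unrelated to the labels: AUC hovers around 0.5 (the "random guessing" line).
random_scores = rng.random(size=y.shape)
print(roc_auc_score(y, random_scores))            # ~0.5

# For any scorer, reversing the ranking (e.g. negating the scores, which is the same
# as swapping which label we call "positive") turns an AUC of a into 1 - a.
informative_scores = y + rng.normal(scale=1.0, size=y.shape)
auc = roc_auc_score(y, informative_scores)
print(auc, roc_auc_score(y, -informative_scores)) # prints a and 1 - a
```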
To keep things reasonably simple, an AUC of 0.9 means that if you randomly picked one person/thing from each class of outcome (e.g., one person with the disease and one without), there is a 90% chance that the one from the class of interest (the group being modelled, here those with the disease) has the higher value (or the lower value, if the thing of interest is associated with the reference or default class).
So if the AUC for predicting "being male" versus "being female" using height was 0.9, this would mean that if you took a random male and a random female, 90% of the time, the male would be taller.
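To make this concrete, here is a rough sketch (not from the original answer; it assumes numpy and scikit-learn, and the height distributions are invented for illustration) showing that the "probability a random positive scores higher than a random negative" matches the AUC a standard library reports:

```
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
# Invented height distributions: males a bit taller than females on average.
male_height = rng.normal(178, 7, size=5_000)
female_height = rng.normal(165, 7, size=5_000)

heights = np.concatenate([male_height, female_height])
is_male = np.concatenate([np.ones(5_000), np.zeros(5_000)])

# AUC of "predict male from height", as reported by the library.
print(roc_auc_score(is_male, heights))

# The same quantity read as a probability: draw random male/female pairs and
# count how often the male is the taller one.
m = rng.choice(male_height, size=100_000)
f = rng.choice(female_height, size=100_000)
print(np.mean(m > f))  # agrees with the AUC up to sampling noise
```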
- Came here to give this answer. Sure, it's an area under a curve, but you don't need to understand that curve to give this explanation. I'd only add that the scale of AUC is practically from 0.5 to 1: if you can't even get a 50% chance then your model is worse than random guessing. – Michael Lugo Oct 13 '21 at 14:30
- Are you sure you’re not talking about 1-specificity? AUC is difficult to pin to ratios and percentages – Aksakal Oct 13 '21 at 19:11
- @Aksakal A proof of this relationship, more complex than was asked for, is given here – user215517 Oct 13 '21 at 20:44
- @user215517 The answer talks about AUC being proportional. Also, when constructing the ROC you run through all thresholds, so the meaning of “90% of the time” needs to be defined – Aksakal Oct 13 '21 at 20:51
- @Aksakal Perhaps we're reading things differently, but from the question linked to: "The second [interpretation] is that the AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example, i.e. P(score(x+)>score(x−))." which is what I was (hoping I was) saying above. "90% of the time" has a standard frequentist interpretation here. The answer that says "AUC is proportional to the number of correctly ordered pairs" is saying the same thing. The proportion of correctly ordered pairs is the AUC. – user215517 Oct 13 '21 at 21:09
- @Aksakal The probabilistic interpretation is standard. It follows from a simple change of variable. – Hasse1987 Oct 14 '21 at 00:00
- @MichaelLugo If your AUC is less than 0.5, then you should pursue the Constanza method. – Acccumulation Oct 14 '21 at 19:51
A classifier is a criterion for assigning an individual to a category ("positive" or "negative") depending on some of its characteristics.
Some classifiers will give each individual a number between $0$ and $1$, with $0$ being "totally sure it's negative" and $1$ being "totally sure it's positive". We usually take $0.5$ as the threshold between what we take as "positive" and what we take as "negative", but this is not always the case.
Taking a low threshold will result in more true positives, but also more false ones. Taking a higher threshold will reduce the number of false positives, but we'll also leave as negative some of the cases that were actually positive (thus fewer true positives as well). So in the end, since no classifier is perfect, it will be a compromise between the two.
Each point on the ROC curve represents the rates of true and false positives for one of the possible thresholds we could choose. The AUC is the area below that curve. A high AUC indicates that the model can achieve a low FPR (false positive rate) without losing too much TPR (true positive rate) and vice versa. (Note that the area below the ROC curve will be large if you already get a high TPR for an FPR close to 0.)
SIMPLIFIED EXAMPLE: let's say you want to use a person's height to determine whether they're a man or a woman. Your classifier will choose some height $X$ and predict that everyone above height $X$ is male and everyone below it is female.
If you choose a very high $X$, like $1.90$m, you will hardly ever mislabel a woman as male, but you will also "miss" many men. On the other hand, if you pick a low $X$ like $1.50$m, you will correctly identify almost all men, but you will also classify a lot of women as male. For each $X$ you can choose, you'll get different true and false positive ratios, but it's ultimately a somewhat arbitrary choice depending on what type of error worries you the most.
In this context, we could plot the ROC curve with the different TPRs and FPRs, then the AUC would give us an idea of how good of a classifier we can hope to get using height (as opposed to some other classifier we could have thought of using something like weight, age, blood pressure...). (See user215517's answer)
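For readers who want to see the threshold sweep as code, here is a rough sketch (not part of the original answer; it assumes numpy and scikit-learn, and the height numbers are made up) of how each threshold $X$ produces one (FPR, TPR) point and how the area under those points is the AUC:

```
import numpy as np
from sklearn.metrics import auc

rng = np.random.default_rng(2)
male = rng.normal(1.78, 0.07, size=2_000)      # heights in metres, invented numbers
female = rng.normal(1.65, 0.07, size=2_000)

heights = np.concatenate([male, female])
label = np.concatenate([np.ones_like(male), np.zeros_like(female)])  # 1 = male

# Sweep thresholds from strictest (tallest) to most lenient (shortest).
tpr, fpr = [0.0], [0.0]
for x in np.sort(np.unique(heights))[::-1]:
    pred_male = heights >= x                    # "everyone at least X tall is male"
    tpr.append(np.mean(pred_male[label == 1]))  # true positive rate at this threshold
    fpr.append(np.mean(pred_male[label == 0]))  # false positive rate at this threshold

# The ROC curve is the set of (FPR, TPR) points; its area is the AUC.
print(auc(fpr, tpr))
```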
Following up on the comment from @Nuclear Hoagie, the ROC curve for a model is generated by evaluating classifiers using a sequence of thresholds for declaring positive or negative. The AUC represents the area under the curve over the entire range of possible thresholds. Often, only a restricted range of thresholds is really of interest. When this is the case, AUC may not be the best way to compare models.
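One possible way to restrict attention to low false positive rates (a sketch, not something this answer prescribes; the data below is made up) is the `max_fpr` argument of scikit-learn's `roc_auc_score`, which reports a standardized partial AUC over that region only:

```
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=5_000)
scores = y + rng.normal(scale=1.5, size=y.shape)  # a deliberately noisy scorer

print(roc_auc_score(y, scores))                   # ordinary AUC over all thresholds
print(roc_auc_score(y, scores, max_fpr=0.1))      # standardized partial AUC, FPR <= 0.1
```

Two models can rank differently under the full AUC and the partial AUC, which is why the restricted range can matter when comparing them.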
- This is only true for balanced data. For imbalanced data - i.e., most datasets - a classifier with an AUC of 0.5 can be much better than random. Consider data with prevalence (proportion of positive class) of 0.7. A classifier which assigns probability > 0.5 to each sample has an AUROC of 0.5. Yet it has an accuracy of 70%, much better than the random classifier, which has 50% accuracy by definition. – ljubomir Oct 19 '21 at 23:32
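The numbers in this comment are easy to reproduce (a sketch assuming numpy and scikit-learn; the 70/30 split is the example prevalence from the comment):

```
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(4)
y = (rng.random(10_000) < 0.7).astype(int)           # ~70% positive class (imbalanced)

constant_scores = np.full(y.shape, 0.9)               # "always predict positive"
print(roc_auc_score(y, constant_scores))              # 0.5: no ranking information at all
print(accuracy_score(y, (constant_scores > 0.5).astype(int)))  # ~0.70

coin_flip = rng.integers(0, 2, size=y.shape)          # random classifier
print(accuracy_score(y, coin_flip))                   # ~0.50
```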