Relationship between AUC and U Mann-Whitney statistic

Question

Recently I learned about the relationship between the area under the ROC curve (AUC) and the $U$ statistic of the Wilcoxon-Mann-Whitney test. It is supposed to follow this rule (taken from this nice post on Quora: https://www.quora.com/How-is-statistical-significance-determined-for-ROC-curves-and-AUC-values):

$$AUC = \frac{U}{n_1n_2}$$

It looks convincing, but I ran some checks on real data in R and found that, while there is indeed a functional relationship between $U$ and $AUC$, it has a slightly different form:

$$AUC = 1 - \frac{U}{n_1n_2}$$

Unfortunately I cannot share the real data I used, but here is a simple simulation that illustrates the point:

library(PredictABEL)
set.seed(303)
x1 <- rnorm(40, 20, 4)
x2 <- rnorm(50, 30, 10)
y <- factor(c(rep("a", 40), rep("b", 50)))  # explicit factor: R >= 4.0 no longer converts strings in data.frame()
df <- data.frame(x=c(x1, x2), y=y)
mod <- glm(y ~ x, data=df, family=binomial)
plotROC(df, 2, mod$fitted.values)       # AUC = 0.81
auc <- 0.81
utest <- wilcox.test(x ~ y, data=df)
utest$statistic / prod(table(df$y))     #  = 0.19
1 - utest$statistic / prod(table(df$y)) #  = 0.81 = AUC

So, as you can see, I am a bit confused. I am pretty sure I am simply overlooking something important, so I would be really thankful if someone could shed some light on it for me.

EDIT: So the question is: which of the two formulas is correct? Every source I check claims it is the first one, but the data I checked suggest the second one.

Welcome to CV. I think it would help your chances for getting a useful response if you were to stipulate your precise question(s), rather than assuming that readers will know what you're asking. — user78229, Apr 12 '16 at 17:08
Hey, this is my first question here so thanks for feedback. Hope that the edit made my question more clear. — sztal, Apr 12 '16 at 17:35

score 12 · Accepted Answer · answered Apr 13 '16 at 09:49


OK, I found the answer and, as I expected, it is trivial. The value of the $U$ test statistic depends on which group it is calculated for (this does not affect the test result in any way). In the code I wrote, the test statistic was computed as a measure of support for the hypothesis that the group with the smaller mean dominates the group with the higher mean, which is of course not true, so that is why $U$ was small.
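To make the dependence on the group explicit, here is a short sketch of the counting argument. The symbols $U_a$ and $U_b$ are notation introduced only for this illustration (they do not appear in the question), $a_i$ and $b_j$ are the observations in the two groups of sizes $n_1$ and $n_2$, and ties are assumed to be absent:

$$U_a = \#\{(i,j) : a_i > b_j\}, \qquad U_b = \#\{(i,j) : b_j > a_i\}, \qquad U_a + U_b = n_1n_2$$

If higher values indicate group $b$, the AUC is the share of pairs in which the $b$ observation is the larger one, so

$$AUC = \frac{U_b}{n_1n_2} = 1 - \frac{U_a}{n_1n_2}$$

Both formulas from the question are therefore the same statement, written for the two possible choices of reference group.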

So after switching the direction of the comparison, so that the Wilcoxon-Mann-Whitney test checks whether the group with the higher mean dominates the one with the lower mean (which is true), I got the correct relationship between $U$ and $AUC$, that is $AUC = \frac{U}{n_1n_2}$. So everything is correct.
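A minimal sketch of what switching the direction can look like in R, reusing the simulated data from the question. The names `u_a`, `u_b`, `n1n2` and `auc_direct` are made up for this illustration. It also assumes that the AUC is taken with respect to `x` itself, which should match the AUC of the fitted logistic model as long as the fitted probabilities increase with `x`:

```r
set.seed(303)
x1 <- rnorm(40, 20, 4)    # group "a": lower values on average
x2 <- rnorm(50, 30, 10)   # group "b": higher values on average

# wilcox.test(x, y) reports W = number of (x, y) pairs with x > y
u_a <- unname(wilcox.test(x1, x2)$statistic)  # "a" vs "b": the small U
u_b <- unname(wilcox.test(x2, x1)$statistic)  # "b" vs "a": the large U

n1n2 <- length(x1) * length(x2)

# AUC computed directly as the share of (b, a) pairs in which b is larger
auc_direct <- mean(outer(x2, x1, ">"))

u_a + u_b == n1n2                      # the two statistics sum to n1 * n2
all.equal(u_b / n1n2, auc_direct)      # AUC = U / (n1 * n2), "b" first
all.equal(1 - u_a / n1n2, auc_direct)  # AUC = 1 - U / (n1 * n2), "a" first
```

With the formula interface used in the question, the same switch can be made by reordering the factor levels before the test, for example `df$y <- factor(df$y, levels = c("b", "a"))`.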


sztal


Fab! This is exactly what I was looking for. Many thanks for your result. – Arnold Klein Dec 01 '17 at 11:46
Hi, can you share the code with the switching direction? – lola Oct 28 '22 at 13:18
Hi, just change the order of comparison, i.e. instead of group A vs group B, do group B vs group A. – sztal Nov 06 '22 at 14:19
