
I recently learned about the relationship between the area under the ROC curve (AUC) and the $U$ statistic of the Wilcoxon-Mann-Whitney test. It is supposed to obey the following rule (taken from this nice post on Quora: https://www.quora.com/How-is-statistical-significance-determined-for-ROC-curves-and-AUC-values):

$$AUC = \frac{U}{n_1n_2}$$

It looks convincing, but when I checked it on real data in R I found that there is indeed a functional relationship between $U$ and $AUC$, only in a slightly different form:

$$AUC = 1 - \frac{U}{n_1n_2}$$

Unfortunately, I cannot share the real data I used, but here is a simple simulation that illustrates the point:

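library(PredictABEL)  # provides plotROC()
set.seed(303)
# Two groups: "a" has the lower mean, "b" the higher mean
x1 <- rnorm(40, 20, 4)
x2 <- rnorm(50, 30, 10)
y <- c(rep("a", 40), rep("b", 50))
df <- data.frame(x=c(x1, x2), y=y)
# Logistic regression of group membership on x; its fitted values feed the ROC curve
mod <- glm(y ~ x, data=df, family=binomial)
plotROC(df, 2, mod$fitted.values)       # AUC = 0.81
auc <- 0.81
# Wilcoxon-Mann-Whitney test; U is then divided by the number of (a, b) pairs
utest <- wilcox.test(x ~ y, data=df)
utest$statistic / prod(table(df$y))     #  = 0.19
1 - utest$statistic / prod(table(df$y)) #  = 0.81 = AUC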

So, as you can see, I am a bit confused. I am fairly sure I am overlooking something important, so I would be really thankful if someone could shed some light on this.

EDIT: So the question is: which of the two formulas is correct? Every source I have checked gives the first one, but the data I checked suggest the second.

sztal
  • Welcome to CV. I think it would help your chances for getting a useful response if you were to stipulate your precise question(s), rather than assuming that readers will know what you're asking. – user78229 Apr 12 '16 at 17:08
  • Hey, this is my first question here so thanks for feedback. Hope that the edit made my question more clear. – sztal Apr 12 '16 at 17:35

1 Answer


OK, I found the answer and, as I expected, it is trivial. The value of the $U$ statistic depends on which group it is calculated for (this does not affect the test result in any way). In the code I wrote, the test statistic was computed for group "a", so it measured support for the hypothesis that the group with the smaller mean dominates the group with the higher mean. That is of course not true, which is why $U$ was small.
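The dependence on the group can be written out explicitly. The subscripted symbols $U_a$ and $U_b$ below are notation introduced here for illustration, not taken from the sources above: $U_a$ counts the pairs in which the observation from group "a" is larger, and $U_b$ the pairs in which the observation from group "b" is larger. Assuming no ties (reasonable for continuous data such as the simulation), every one of the $n_1n_2$ pairs is counted in exactly one of the two, so

$$U_a + U_b = n_1n_2$$

and, since the AUC here is the proportion of pairs in which the observation from "b" is larger,

$$AUC = \frac{U_b}{n_1n_2} = 1 - \frac{U_a}{n_1n_2}$$

If ties do occur and each tied pair counts one half toward both statistics, the same identity still holds.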

After switching the direction of the comparison, so that the Wilcoxon-Mann-Whitney test checks whether the group with the higher mean dominates the one with the lower mean (which is true), I got the expected relationship between $U$ and $AUC$, that is, $AUC = \frac{U}{n_1n_2}$. So both formulas are consistent; they differ only in which group $U$ is computed for.
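To check this without any extra packages, here is a minimal sketch in base R that reuses the simulated data from the question. The variable names `u_a`, `u_b`, and `auc_direct` are made up for this illustration, and the AUC is computed by brute force as the proportion of (b, a) pairs in which the value from "b" is larger, rather than with `plotROC()`.

```r
set.seed(303)
x1 <- rnorm(40, 20, 4)    # group "a": lower mean
x2 <- rnorm(50, 30, 10)   # group "b": higher mean
n1 <- length(x1)
n2 <- length(x2)

# U computed for each direction of the comparison.
# wilcox.test(x, y)$statistic counts pairs with x > y (ties count one half).
u_a <- unname(wilcox.test(x1, x2)$statistic)  # pairs where "a" is larger
u_b <- unname(wilcox.test(x2, x1)$statistic)  # pairs where "b" is larger

# AUC by brute force over all n1 * n2 pairs
diffs <- outer(x2, x1, "-")
auc_direct <- mean(diffs > 0) + 0.5 * mean(diffs == 0)

cat("U_a / (n1 * n2)     =", u_a / (n1 * n2), "\n")
cat("U_b / (n1 * n2)     =", u_b / (n1 * n2), "\n")
cat("1 - U_a / (n1 * n2) =", 1 - u_a / (n1 * n2), "\n")
cat("AUC (direct)        =", auc_direct, "\n")
```

With the formula interface used in the question, `wilcox.test(x ~ y, data=df)` reports the statistic for the first level of `y` (here "a"), which corresponds to `u_a` in this sketch.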

sztal