
I've been learning about the relationship between the Mann-Whitney U statistic and the area under the ROC curve (AUC).

Supposedly, the area under the ROC curve should be $\text{AUC} = \frac{U}{n_0 n_1}$, where $U$ is the Mann-Whitney statistic, $n_0$ is the number of negative examples, and $n_1$ is the number of positive examples.
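As a sanity check of this identity, here is a minimal pure-Python sketch on made-up toy data (the labels and scores are illustrative assumptions, not the data from this question). It computes $U$ for the positive class by direct pair counting, with tied pairs counted as one half, and then divides by $n_0 n_1$. The helper name `pairwise_u` is hypothetical.

```python
def pairwise_u(pos_scores, neg_scores):
    """Mann-Whitney U for the positive group by direct pair counting.

    Each (positive, negative) pair contributes 1 if the positive score is
    larger, 0.5 if the two scores are tied, and 0 otherwise.
    """
    u = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                u += 1.0
            elif p == q:
                u += 0.5
    return u


# Toy data (hypothetical, for illustration only)
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.35, 0.7]

# Split the scores by class: U compares the two groups of scores
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]

u_pos = pairwise_u(pos, neg)
auc = u_pos / (len(pos) * len(neg))
print(u_pos, auc)
```

The double loop costs $n_0 n_1$ comparisons, so it is only meant for small examples; on large data a rank-based formula is the practical route.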

I tried to test this using Python's scipy and scikit-learn libraries and found a discrepancy I can't explain.

Unfortunately, I can't share the data, but here's the code and output.

from scipy.stats import mannwhitneyu
from sklearn.metrics import roc_auc_score

U = mannwhitneyu(preds['score'], preds['truth'])[0]
vc = preds['truth'].value_counts()
n0n1 = vc.loc[0] * vc.loc[1]

print('U: %d' % U)
print('n0n1: %d' % n0n1)
print('U/n0n1: %0.3f' % (U/n0n1))
print('AUC: %0.3f' % roc_auc_score(preds['truth'], preds['score']))

output:

U: 26899093155
n0n1: 40496604804
U/n0n1: 0.664
AUC: 0.674

However, using the implementation described at the link above:

def calc_U(y_true, y_score):
    n1 = np.sum(y_true==1)
    n0 = len(y_score)-n1

    ## Calculate the rank for each observation
    # Get the order: The index of the score at each rank from 0 to n
    order = np.argsort(y_score)
    # Get the rank: The rank of each score at the indices from 0 to n
    rank = np.argsort(order)
    # Python starts at 0, but statistical ranks at 1, so add 1 to every rank
    rank += 1

    # If the rank for target observations is higher than expected for a random model,
    # then a possible reason could be that our model ranks target observations higher
    U1 = np.sum(rank[y_true == 1]) - n1*(n1+1)/2
    U0 = np.sum(rank[y_true == 0]) - n0*(n0+1)/2

    # Formula for the relation between AUC and the U statistic
    AUC1 = U1/ (n1*n0)
    AUC0 = U0/ (n1*n0)

    return U1, AUC1, U0, AUC0

gives me the correct equality $\text{AUC} = \frac{U}{n_0 n_1}$.

I've tried applying the solution described here, but it does not resolve my issue. I'm wondering whether the discrepancy has something to do with how these particular functions are implemented.

Sycorax
LogCapy
    Not sure whether the open issues on Mann-Whitney address the problem you are having; it is difficult to say without data for a reproducible example. Can you provide mocked or simulated data that illustrate it? You may also find it helpful to check the open issues on the scipy GitHub repo to see whether they affect you (I'd guess they do, but I can't be sure without an MRE). GitHub link: https://github.com/scipy/scipy/issues?q=is%3Aissue+is%3Aopen+mann+whitney+ – Lucas Roberts Apr 28 '20 at 03:15

2 Answers


I believe Lucas Roberts is correct that this is an open issue in scipy. The current implementation reverses the definitions of U1 and U2.

The problem is in these lines:

u1 = n1*n2 + (n1*(n1+1))/2.0 - np.sum(rankx, axis=0)  # calc U for x
u2 = n1*n2 - u1  # remainder is U for y

When we substitute the definition of u1 into u2, we see that the formula for u2 used here is in fact the formula for u1 from Wikipedia:

u2 = n1*n2 - (n1*n2 + (n1*(n1+1))/2.0 - np.sum(rankx))
# which simplifies to: u2 = np.sum(rankx) - (n1*(n1+1))/2.0

There is a complete revision in the works, but the pull request seems to be stuck for now.
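To make the swap concrete, here is a small pure-Python sketch under stated assumptions: the two samples are made up and contain no ties, so plain 1-based ranks are enough. It evaluates both the Wikipedia formula for the U of `x` and the `u1` expression quoted above, and compares them with a direct count of pairs where the `x` value is larger. The variable names `u_wikipedia` and `u_quoted` are mine, not scipy's.

```python
# Hypothetical samples with no tied values, so plain 1-based ranks suffice
x = [0.35, 0.65, 0.9, 0.7]   # e.g. scores of the positive class
y = [0.1, 0.4, 0.8, 0.3]     # e.g. scores of the negative class
n1, n2 = len(x), len(y)

# Rank every value within the pooled sample (1 = smallest)
pooled = sorted(x + y)
rank_of = {value: position for position, value in enumerate(pooled, start=1)}
rank_sum_x = sum(rank_of[value] for value in x)

# Wikipedia's U for x: rank sum of x minus its minimum possible value
u_wikipedia = rank_sum_x - n1 * (n1 + 1) / 2.0
# The expression assigned to `u1` in the lines quoted above
u_quoted = n1 * n2 + n1 * (n1 + 1) / 2.0 - rank_sum_x

# Direct count of (x, y) pairs in which the x value is larger
pairs_x_greater = sum(1 for a in x for b in y if a > b)

print(u_wikipedia, u_quoted, pairs_x_greater)
print(u_wikipedia / (n1 * n2), u_quoted / (n1 * n2))
```

In this toy case the two statistics sum to `n1 * n2`, so dividing the complementary one by `n1 * n2` gives one minus the ratio obtained from the other.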

Johannes

Please check whether your data contain tied values, and how each implementation ranks them.

=======================================

Assign numeric ranks to all the observations (put the observations from both groups to one set), beginning with 1 for the smallest value. Where there are groups of tied values, assign a rank equal to the midpoint of unadjusted rankings. E.g., the ranks of (3, 5, 5, 5, 5, 8) are (1, 3.5, 3.5, 3.5, 3.5, 6) (the unadjusted rank would be (1, 2, 3, 4, 5, 6)).

from https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Calculations
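The quoted rule can be sketched in a few lines of pure Python. This is only an illustration: `ordinal_ranks` and `midranks` are hypothetical helper names, and `ordinal_ranks` breaks ties by position, whereas an argsort-based ranking may break them in some other arbitrary order. In scipy, `scipy.stats.rankdata` with its default `method='average'` should give the same midranks.

```python
def ordinal_ranks(values):
    """1-based ranks with ties broken by position (no tie adjustment)."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def midranks(values):
    """1-based ranks where tied values share the mean of their ordinal ranks."""
    ranks = []
    for v in values:
        smaller = sum(1 for w in values if w < v)
        equal = sum(1 for w in values if w == v)
        # Tied values occupy ordinal ranks smaller+1 .. smaller+equal,
        # whose mean is smaller + (equal + 1) / 2
        ranks.append(smaller + (equal + 1) / 2.0)
    return ranks


data = [3, 5, 5, 5, 5, 8]
print(ordinal_ranks(data))  # [1, 2, 3, 4, 5, 6]
print(midranks(data))       # [1.0, 3.5, 3.5, 3.5, 3.5, 6.0]
```

If tied scores fall in different classes, the two ranking schemes can produce different rank sums and therefore different values of U.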