
Much like the two-sample (equal variance, not Welch) t-test is a special case of ANOVA, and ANOVA is a special case of linear regression, the Wilcoxon-Mann-Whitney U test is a special case of the Kruskal-Wallis test, and Kruskal-Wallis is a special case of proportional odds logistic regression. Let's test that claim about Wilcoxon, Kruskal-Wallis, and the proportional odds model by simulation.

library(rms)
library(MASS)

set.seed(2021)
N <- 25
B <- 100
p1 <- p2 <- p3 <- p4 <- rep(NA, B)

for (i in 1:B){
  a <- rnorm(N)
  b <- rnorm(N, 1)
  y <- c(a, b)
  x <- c(rep(0, length(a)), rep(1, length(b)))
  L <- rms::orm(y ~ x)  # Proportional odds model from Frank Harrell's rms package

  p1[i] <- wilcox.test(a, b)$p.value
  p2[i] <- L$stats[7]   # First of two p-values in the output: likelihood ratio test
  p3[i] <- L$stats[9]   # Second of two p-values in the output: score test
  p4[i] <- kruskal.test(y, x)$p.value
}

d <- data.frame(Wilcoxon = p1, ORM_1 = p2, ORM_2 = p3, KW = p4)
plot(d)
cor(d)

           Wilcoxon     ORM_1     ORM_2        KW
Wilcoxon  1.0000000 0.9996366 0.9997652 0.9999543
ORM_1     0.9996366 1.0000000 0.9999797 0.9994056
ORM_2     0.9997652 0.9999797 1.0000000 0.9995917
KW        0.9999543 0.9994056 0.9995917 1.0000000

Au contraire!

While the differences in p-values are small, they are not small enough for me to attribute them to floating point arithmetic.

What is going on? For instance, is the equivalence only asymptotic?

Dave

1 Answer


Some of this is just a matter of using the correct arguments to the tests:

> wilcox.test(a, b, exact = FALSE, correct = FALSE)$p.value
[1] 0.0001737215
> kruskal.test(y, x)$p.value
[1] 0.0001737215
> kruskal.test(y, x)$p.value - wilcox.test(a, b, exact = FALSE, correct = FALSE)$p.value
[1] 1.653408e-18

By default, wilcox.test uses the exact null distribution (feasible here because the samples are small and have no ties), while kruskal.test uses a chi-squared approximation, i.e. the square of a Normal approximation, to the null distribution. That is, the test statistics are equivalent, but the two functions compare them to different reference distributions.
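To see the equivalence of the statistics directly, here is a minimal sketch (my own illustration, assuming two groups and no ties): the Kruskal-Wallis chi-squared statistic is exactly the square of the standardized Wilcoxon rank-sum statistic.

set.seed(2021)
a <- rnorm(10); b <- rnorm(12, 1)
y <- c(a, b)
g <- rep(1:2, c(10, 12))          # group labels
n1 <- 10; n2 <- 12; n <- n1 + n2

R1 <- sum(rank(y)[g == 1])        # rank sum for group 1 (the rank-sum statistic, up to a shift)
Z  <- (R1 - n1 * (n + 1) / 2) / sqrt(n1 * n2 * (n + 1) / 12)  # standardized rank sum

Z^2                               # squared standardized rank-sum statistic...
kruskal.test(y, g)$statistic      # ...equals the Kruskal-Wallis statistic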

For the proportional odds model, the score test is exactly the Wilcoxon rank-sum test, but again there will be computational details; orm is presumably using a different estimator of the variance of the test statistic.
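As a quick empirical check (my own sketch, regenerating the data the same way as in the question and using the stats[9] slot the question identified, which is the score test p-value), the score-test p-value from orm lands very close to the Normal-approximation Wilcoxon p-value:

library(rms)
set.seed(2021)
a <- rnorm(25); b <- rnorm(25, 1)
y <- c(a, b)
x <- rep(0:1, each = 25)

L <- rms::orm(y ~ x)
unname(L$stats[9])                                          # score test p-value from orm
wilcox.test(a, b, exact = FALSE, correct = FALSE)$p.value   # Normal-approximation Wilcoxon

Any residual gap between the two reflects the computational details mentioned above, such as the variance estimator used for the score statistic.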

So, the tests are identical (in finite samples) in a theoretical sense. If they were all compared to their exact null sampling distributions the p-values would be the same. But the exact null sampling distribution is intractable for proportional odds models in general, and probably a bit of a pain for the Kruskal-Wallis test, so the implementations use approximations to the null sampling distribution that are asymptotically equivalent.
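Here is one way to see that in a sketch (my own illustration): approximate the exact null distribution by permuting the group labels, and score the Wilcoxon and Kruskal-Wallis statistics against the same set of permutations. Because for two groups each statistic is a monotone function of the other, the permutation p-values come out exactly equal.

set.seed(2021)
a <- rnorm(10); b <- rnorm(10, 1)
y <- c(a, b)
x <- rep(0:1, each = 10)

perms <- replicate(5000, sample(x))    # one shared set of label permutations

perm_p <- function(stat, y, x, perms) {
  obs  <- stat(y, x)
  null <- apply(perms, 2, function(xp) stat(y, xp))
  mean(null >= obs)                    # permutation p-value
}

kw_stat <- function(y, x) unname(kruskal.test(y, x)$statistic)
# Two-sided rank-sum statistic: distance of the group-0 rank sum from its null mean
w_stat  <- function(y, x) abs(sum(rank(y)[x == 0]) - sum(x == 0) * (length(y) + 1) / 2)

perm_p(kw_stat, y, x, perms)           # these two p-values are exactly equal
perm_p(w_stat,  y, x, perms)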

Thomas Lumley
  • "Score" test meaning as opposed to something like Wald or likelihood ratio test? – Dave May 20 '21 at 21:06
  • Yes. The score test statistic is the derivative of the log likelihood with respect to the parameters, evaluated at the null hypothesis. – Thomas Lumley May 20 '21 at 23:06
  • Is there a reason for doing the score test instead of a Wald or likelihood ratio test? It has always seemed that, of the big three tests in statistical theory class, the score test got forgotten in favor of Wald and likelihood ratio testing. (I might be posting this as a new question.) – Dave Aug 27 '21 at 14:27
  • In this case, the fact that you only need to fit the null model is a real advantage. In e.g. logistic regression model building it actually isn't -- writing reasonably general code for score tests is surprisingly annoying compared to LRT and Wald -- but here the score test is way easier to compute. (One of my jobs for this weekend is debugging a student's score-test code for GLMs under complex sampling.) – Thomas Lumley Aug 28 '21 at 00:12