Mann-Whitney (2 groups) contradicted by Kruskal-Wallis (3 groups)

Question

We have two groups that are significantly different (tested with Mann-Whitney). When a third group is added, the Kruskal-Wallis test gives a non-significant result.

Does this imply that two or more of the groups are not significantly different from each other, hence the non-significant result?

Or does it imply that we do not have enough power to detect an effect (a type II error), and that the third group is likely not different from the other two groups, so the result is inconclusive?

Let's flag the elementary but fundamental principle: Plot the data to see what is going on. Further: why not show us the data? — Nick Cox, Apr 06 '19 at 08:15
There are multiple possible explanations; we can't guess which one(s) will be the case for you. (Most of the possible explanations are explored in answers to other questions already on site) — Glen_b, Apr 06 '19 at 08:29
You are using the incorrect post hoc test. Mann-Whitney (a) does not use the same rankings as the K-W, so no surprise you are getting strange results, and (b) does not use the pooled variance assumed by the K-W null. Try Dunn's test or, even better, the Conover-Iman test. — Alexis, Apr 15 '23 at 20:10

score 2 · Accepted Answer · edited Apr 06 '19 at 08:16

One possibility is that the third sample introduces extra variability or violates assumptions for the Kruskal-Wallis test, so that it can find no differences among the three samples. Here is an example (using R):

# generate & display fake data
set.seed(1234)  # for reproducibility 
x1 = runif(5, 1, 4);  x2 = rnorm(5, 1, 1);  x3 = rnorm(5, 1, 5)
x123 = c(x1, x2, x3);  g123 = rep(1:3, each=5)
x12 = c(x1, x2);  g12 = rep(1:2, each=5)
stripchart(x123 ~ g123, pch="|", ylim=c(.5,3.5))
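As a numerical companion to the plot, here is a minimal sketch that summarizes the spread of each group. It regenerates the same fake data so that it runs on its own; the names `spread` and `ranges` are introduced here for illustration only. If the third sample really is much more variable, its SD and range should stand out.

```r
# regenerate the fake data so this block is self-contained
set.seed(1234)
x1 = runif(5, 1, 4);  x2 = rnorm(5, 1, 1);  x3 = rnorm(5, 1, 5)
x123 = c(x1, x2, x3);  g123 = rep(1:3, each=5)

# sample standard deviation and min/max within each group
spread = tapply(x123, g123, sd)
ranges = tapply(x123, g123, range)

print(round(spread, 2))
print(ranges)
```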

The two-sample Mann-Whitney-Wilcoxon test and the Kruskal-Wallis test both find a significant difference between x1 and x2 at the 5% level, with P-values about 3%:

wilcox.test(x12 ~ g12)

        Wilcoxon rank sum test

data:  x12 by g12
W = 23, p-value = 0.03175
alternative hypothesis: true location shift is not equal to 0


kruskal.test(x12 ~ g12)

        Kruskal-Wallis rank sum test

data:  x12 by g12
Kruskal-Wallis chi-squared = 4.8109, df = 1, p-value = 0.02828
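A side note on why the two P-values above differ slightly (0.03175 vs. 0.02828): with small samples and no ties, `wilcox.test` reports an exact P-value by default, whereas `kruskal.test` uses a chi-squared approximation. The sketch below, which assumes the same fake data as above, asks `wilcox.test` for its normal approximation without continuity correction. With two groups that should agree with the Kruskal-Wallis P-value, because the Kruskal-Wallis statistic is then the square of the Wilcoxon z statistic. The names `p.exact`, `p.normal`, and `p.kw` are illustrative.

```r
# regenerate the two-group fake data so this block is self-contained
set.seed(1234)
x1 = runif(5, 1, 4);  x2 = rnorm(5, 1, 1)
x12 = c(x1, x2);  g12 = rep(1:2, each=5)

# exact P-value (the default here: small samples, no ties)
p.exact  = wilcox.test(x12 ~ g12)$p.value
# normal approximation, no continuity correction
p.normal = wilcox.test(x12 ~ g12, exact=FALSE, correct=FALSE)$p.value
# Kruskal-Wallis chi-squared approximation with df = 1
p.kw     = kruskal.test(x12 ~ g12)$p.value

print(c(exact=p.exact, normal=p.normal, kruskal=p.kw))
```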

However, the Kruskal-Wallis test does not find any significant differences among x1, x2, and x3 at the 5% level:

kruskal.test(x123 ~ g123)

        Kruskal-Wallis rank sum test

data:  x123 by g123
Kruskal-Wallis chi-squared = 5.58, df = 2, p-value = 0.06142
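To see where the three-group result comes from, the following sketch computes the Kruskal-Wallis statistic by hand from the joint ranks of all 15 observations, using the no-ties formula (an assumption that holds for continuous data like these). The result should reproduce the statistic reported by `kruskal.test` above. As a follow-up in the spirit of the comment recommending Dunn's test, it also computes Dunn-type pairwise z statistics from the same joint ranks. This is a hand-rolled illustration with a Holm adjustment chosen here for the example, not the output of a dedicated package. All names other than `x123` and `g123` are introduced for illustration.

```r
# regenerate the fake data so this block is self-contained
set.seed(1234)
x1 = runif(5, 1, 4);  x2 = rnorm(5, 1, 1);  x3 = rnorm(5, 1, 5)
x123 = c(x1, x2, x3);  g123 = rep(1:3, each=5)

r    = rank(x123)                          # joint ranks over all groups
N    = length(x123)
n    = as.numeric(tapply(r, g123, length)) # group sizes
Rbar = as.numeric(tapply(r, g123, mean))   # mean joint rank per group

# Kruskal-Wallis H, no-ties formula
H = 12 / (N * (N + 1)) * sum(n * (Rbar - (N + 1) / 2)^2)
p.H = pchisq(H, df = length(n) - 1, lower.tail = FALSE)

# Dunn-type pairwise z statistics based on the same joint ranks
pairs = combn(3, 2)
z = apply(pairs, 2, function(ij) {
  i = ij[1];  j = ij[2]
  (Rbar[i] - Rbar[j]) / sqrt(N * (N + 1) / 12 * (1 / n[i] + 1 / n[j]))
})
p.raw = 2 * pnorm(-abs(z))                 # two-sided P-values
p.adj = p.adjust(p.raw, method = "holm")   # multiplicity adjustment
names(p.adj) = apply(pairs, 2, paste, collapse = " vs ")

print(c(H = H, p = p.H))
print(round(p.adj, 4))
```

Unlike a separate Mann-Whitney test on x1 and x2 alone, these pairwise comparisons use the ranks from the pooled three-group ranking, so a widely scattered third sample changes the ranks that x1 and x2 receive.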
