The Wilcoxon-Mann-Whitney test requires that two distributions are symmetrical. How can I check this assumption using a hypothesis test, and how do I apply it in R? If this assumption is not met, what test should I use instead of the Wilcoxon-Mann-Whitney test?
-
This post has extensive discussion of the various assumptions of the MWU test and I think is relevant here. – COOLSerdash Jan 15 '23 at 09:53
1 Answer
The Wilcoxon-Mann-Whitney test requires that two distributions are symmetrical
No, it doesn't require symmetry of both distributions.
(What makes you think this is necessary?)
It requires exchangeability of the ranks under H0 (but not under H1); the most typical way to get that would be if the two distributions had the same shape when H0 is true. They don't have to have the same shape when it's false.
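To make the role of exchangeability concrete, here is a minimal sketch in Python (standard library only) of a permutation version of the test. If the group labels are exchangeable under H0, reshuffling them reproduces the null distribution of the U statistic, and no symmetry is needed anywhere. The function names and the toy data are illustrative assumptions, not something taken from the question.

```python
import random

def mann_whitney_u(x, y):
    """U statistic for x: count of (x_i, y_j) pairs with x_i > y_j, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the U statistic.

    The only thing this relies on is exchangeability of the group labels under H0.
    """
    rng = random.Random(seed)
    n_x = len(x)
    centre = n_x * len(y) / 2.0              # expected value of U under H0
    observed = abs(mann_whitney_u(x, y) - centre)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                  # reassign group labels at random
        u = mann_whitney_u(pooled[:n_x], pooled[n_x:])
        if abs(u - centre) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)         # add-one correction keeps p > 0

# Hypothetical toy data, purely for illustration
control = [0.12, 0.25, 0.31, 0.40, 0.18, 0.22]
treated = [0.45, 0.52, 0.38, 0.61, 0.70, 0.49]
u_obs = mann_whitney_u(treated, control)
p_val = permutation_p_value(treated, control)
```

Note that only the ordering of the pooled values enters the calculation, so any strictly increasing transformation of the data gives the same U and the same p-value.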
How can I check this assumption by using a hypothesis test
Even if it were a necessary assumption, testing it on the samples would not be especially relevant, since you'd only require it under H0, which you have no good reason to think is true; indeed, with an equality null, it's almost certainly false. The assumption therefore applies to a situation that the data don't typically tell you much about.
Edit: Let me give an explicit example.
Illustrative example
Imagine for the sake of argument that I have two samples, one notionally a sort of 'control' group and one a sort of 'treatment' group. There's a claim that the treatment performed on the treatment group is highly valuable (has a substantive effect on the measure of interest) but we don't really think the treatment does anything at all.
Now imagine that each population has a beta distribution (under both H0 and H1). If the treatment does nothing, the parameters of the distribution would be the same (so that indeed the Wilcoxon-Mann-Whitney would have exchangeable ranks under H0).
If the treatment does something that corresponds to the claim we would expect the parameters to change.
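For example, if the alternative is true we could see something like this:
[Figure: density curves of the two beta distributions under the alternative, with the control distribution drawn in black]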
while if the treatment did nothing at all, we would expect to see the black distribution for both populations.
If $H_1$ were true and we had samples from these two distributions, checking the assumption of the same shape under $H_0$ by looking at the data would be completely misleading: in a reasonable-sized sample we would be highly likely to conclude that the shapes differ.
If we then conclude that there's a problem with the test, we are making a grave error. We're actually in a situation that the test is designed for!
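A small simulation along these lines may help. This is only a sketch in Python (standard library only): the beta parameters (2, 5) for the control group and (5, 5) for the treatment group, the sample size, and the seed are my own assumptions for illustration, not values from the example above. It draws samples under such an alternative, compares their sample skewness, and computes the usual large-sample z-score for the U statistic.

```python
import math
import random

def skewness(v):
    """Sample skewness (third standardised moment, population form)."""
    n = len(v)
    mean = sum(v) / n
    m2 = sum((a - mean) ** 2 for a in v) / n
    m3 = sum((a - mean) ** 3 for a in v) / n
    return m3 / m2 ** 1.5

def wmw_z(x, y):
    """Normal-approximation z-score of the Mann-Whitney U statistic (no tie correction)."""
    n, m = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n * m / 2.0                           # mean of U under H0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)   # sd of U under H0
    return (u - mean_u) / sd_u

rng = random.Random(1)
n = 500
# Assumed parameters: same second parameter, larger first parameter under treatment
control = [rng.betavariate(2.0, 5.0) for _ in range(n)]
treated = [rng.betavariate(5.0, 5.0) for _ in range(n)]

skew_control = skewness(control)   # population skewness is about 0.6 for beta(2, 5)
skew_treated = skewness(treated)   # population skewness is 0 for beta(5, 5)
z = wmw_z(treated, control)
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
```

In a run like this the two samples look differently shaped (one clearly right-skewed, one roughly symmetric), yet the test is doing exactly what it should: the identical-shape condition was only ever needed under H0, and here H0 is false.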
-
I agree with the answer, however the H0 should at least be relevant for the situation in question; why would anybody want to test it otherwise? I'm not saying it should literally be true, but it should at least potentially fit the data well enough to not be rejected! It's about compatibility, not truth, and therefore I disagree with "the data don't typically tell you much about it" - it's the very point of the test that they do. – Christian Hennig Jan 12 '23 at 12:08
-
Agreed, naturally. Another way to explain this is that WMW works with the ranks and there is absolutely no information in the ranks about symmetry or asymmetry. You can easily get the same ranks from quite different distribution shapes. – Nick Cox Jan 12 '23 at 12:22
-
@Glen_b I found this assumption when I researched this test on Google (If the test is used as a test of dominance, it has no distributional assumptions. If it is used to compare medians, the two distributions must be identical apart from their locations. If it is used to compare means, those two distributions must also be symmetrical.) and I want to use this test to compare means. – Statistical scientist Jan 12 '23 at 12:30
-
@Nick Nevertheless there is information about asymmetry in the original data. Some investigation of that is worthwhile in any case: if the test leads you to reject the null hypothesis, consider whether the explanation might be asymmetry rather than a true difference in location; and when it does not lead you to reject the null, consider whether an asymmetry might be making the test less powerful than expected. – whuber Jan 12 '23 at 15:28
-
@whuber Indeed, and as almost always we agree. Either way being aware of asymmetry and its consequences is essential too. – Nick Cox Jan 12 '23 at 15:35
-
Harold Jeffreys in his often difficult but always brilliant book Theory of Probability (utterly mistitled) has a key section on the dire influence of wishful thinking in statistical analysis. Wanting WMW to be a test of means can be a prime example of wishful thinking. What is the real scientific question here? Why not answer it quantitatively with say a bootstrapped confidence interval for the difference in means? – Nick Cox Jan 12 '23 at 15:38
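For reference, a bootstrapped confidence interval for the difference in means could be sketched as below. This is a minimal percentile bootstrap in Python (standard library only), resampling each group independently; the function name, the number of resamples, and the toy data are assumptions made for illustration, and other bootstrap intervals (for example BCa) may be preferable in practice.

```python
import random

def bootstrap_mean_diff_ci(x, y, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(x) - mean(y), resampling each group independently."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        bx = rng.choices(x, k=len(x))        # resample x with replacement
        by = rng.choices(y, k=len(y))        # resample y with replacement
        diffs.append(sum(bx) / len(bx) - sum(by) / len(by))
    diffs.sort()
    k = int(round((1.0 - level) / 2.0 * n_boot))   # resamples cut from each tail
    return diffs[k], diffs[n_boot - 1 - k]

# Hypothetical toy data, purely for illustration
control = [0.12, 0.25, 0.31, 0.40, 0.18, 0.22]
treated = [0.45, 0.52, 0.38, 0.61, 0.70, 0.49]
observed_diff = sum(treated) / len(treated) - sum(control) / len(control)
lo, hi = bootstrap_mean_diff_ci(treated, control)
```

The interval answers the question about means directly, in the units of the measurement, rather than through a rank-based test.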
-
@ChristianHennig Hi Christian, thanks for your comment. I have added an illustrative example of the point I am making at the end to clarify the situation for H0 and a potential one for H1 and more discussion of why looking at the data may then not be relevant for the assumptions under H0. I suspect we don't disagree about much that's substantive but if you have a problem with what's there, please let me know. – Glen_b Jan 13 '23 at 04:12
-
@Statisticalscientist I fear I disagree with several parts of what you found. See, for example, the illustration I added at the end in response to Christian. That example comes from a sequence of alternatives that differ in mean: e.g. under the null we have a beta($\alpha_0,\beta$) and under the alternative we have beta($\alpha_i,\beta$) for some $\alpha_i>\alpha_0$; indeed, consider ... – Glen_b Jan 13 '23 at 04:29
-
... a sequence of possible $\alpha$s, $\alpha_0<\alpha_1<\alpha_2<\cdots$. Each alternative in the sequence has a larger mean than the ones before it (with lower index). The shapes are all different from one another, and only one of them is symmetric. Clearly, then, the test is sensitive to changes in mean in cases where you have neither symmetry nor identical shapes in the population distributions the samples were drawn from. – Glen_b Jan 13 '23 at 04:29
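The statement about the means follows from the standard formula for the mean of a beta($\alpha,\beta$) distribution, holding $\beta$ fixed:

$$
\mathrm{E}[X] = \frac{\alpha}{\alpha+\beta},
\qquad
\frac{\partial}{\partial \alpha}\,\frac{\alpha}{\alpha+\beta} = \frac{\beta}{(\alpha+\beta)^2} > 0 ,
$$

so the mean is strictly increasing in $\alpha$. The density is symmetric about $1/2$ only when $\alpha=\beta$, which at most one member of a strictly increasing sequence of $\alpha$s can satisfy.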
-
@Glen_b Fine addition, I like it. I learnt $P\prec Q$ (stochastic ordering) as the alternative to the $P=Q$ null hypothesis; it holds in your example and, as you correctly say, doesn't require equality of distributional shapes. Harder situations to discuss are those where $P\neq Q$ but neither distribution is stochastically larger than the other. It is not always clear in these cases whether the test should reject $H_0$ or not, and I still think it's valuable to diagnose differences in distributional shape to know what's going on. – Christian Hennig Jan 13 '23 at 11:21
