14

I am doing my dissertation, and I am conducting a number of tests. After using a Kruskal–Wallis test, I usually report the result like this:

There is a significant difference $(\chi^2_{(2)}=7.448, p=.024)$ between the means of...

But now I have conducted a Mann–Whitney test, and I am not sure which values to present. SPSS gives me a Mann–Whitney $U$, a Wilcoxon $W$, a $Z$ score, and a $p$-value. Do I present all four of these values? Or are some irrelevant?

edited by Nick Stauner
asked by dissertationhelp

1 Answer

13

Wikipedia appears to have your answers. Here's an excerpt from the example statement of results:

In reporting the results of a Mann–Whitney test, it is important to state:

  • A measure of the central tendencies of the two groups (means or medians; since the Mann–Whitney is an ordinal test, medians are usually recommended)
  • The value of U
  • The sample sizes
  • The significance level.

In practice some of this information may already have been supplied and common sense should be used in deciding whether to repeat it. A typical report might run,

"Median latencies in groups E and C were 153 and 247 ms; the distributions in the two groups differed significantly (Mann–Whitney U = 10.5, n1 = n2 = 8, P < 0.05 two-tailed)."

The Wilcoxon signed-rank test is appropriate for paired samples, whereas the Mann–Whitney test assumes independent samples, so the $W$ in your Mann–Whitney output is not the signed-rank statistic; it is the Wilcoxon rank-sum $W$, the sum of the ranks in one of your groups. According to Field (2000), it is "a different version of this statistic, which can be converted into a Z score and can, therefore, be compared against critical values of the normal distribution." That explains your $Z$ score too!
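
Concretely, here's a rough sketch (Python, assuming no ties) of how the three numbers relate to one another. Note that SPSS additionally applies a tie correction (and sometimes a continuity correction) to $Z$, and its choice of which group's rank sum to report as $W$ may differ, so its output can vary slightly from this:

```python
# W is the rank sum of one group, U = W - n1(n1 + 1)/2, and Z standardizes U
# using its mean and SD under the null hypothesis (no-ties formulas).
import numpy as np
from scipy.stats import rankdata

def mann_whitney_by_hand(x, y):
    n1, n2 = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))       # ranks in the pooled sample
    w = ranks[:n1].sum()                           # Wilcoxon rank-sum W for group x
    u = w - n1 * (n1 + 1) / 2                      # Mann-Whitney U for group x
    mu_u = n1 * n2 / 2                             # mean of U under the null
    sd_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # SD of U under the null (no ties)
    z = (u - mu_u) / sd_u                          # Z from the normal approximation
    return w, u, z

# Same hypothetical latencies as above:
group_e = [120, 135, 148, 153, 160, 172, 185, 210]
group_c = [155, 198, 215, 230, 247, 255, 260, 290]
w, u, z = mann_whitney_by_hand(group_e, group_c)
print(f"W = {w:g}, U = {u:g}, Z = {z:.3f}")
```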

FYI, Wikipedia adds that, for large samples, $U$ is approximately normally distributed. Given all these values, one can also calculate the effect size $\eta^2$, which in the case of Wikipedia's example is 0.319 (a calculator is implemented in section 11 here). However, this transformation of the test statistic depends on the approximate normality of $U$, so it might be inaccurate with $n_1 = n_2 = 8$ (Fritz et al., 2012).
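
If you want to check that figure by hand, the arithmetic needs only $U$ and the sample sizes, using $r = Z/\sqrt{N}$ and $\eta^2 = r^2$ (the conversion Fritz et al., 2012, describe):

```python
# Worked check of the 0.319 figure from Wikipedia's example, using only
# U and the sample sizes. Relies on the normal approximation of U, hence
# the caveat above about small samples.
import math

u, n1, n2 = 10.5, 8, 8                              # values from Wikipedia's example
n = n1 + n2
mu_u = n1 * n2 / 2                                  # 32.0
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)      # ~9.522
z = (u - mu_u) / sd_u                               # ~-2.258
r = z / math.sqrt(n)                                # ~-0.564 (Fritz et al., 2012)
print(f"Z = {z:.3f}, r = {r:.3f}, eta^2 = {r**2:.3f}")  # eta^2 ~ 0.319
```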

P.S. The Kruskal–Wallis test's results should not be interpreted as revealing differences between means except under special circumstances. See @Glen_b's answer to another question, "Difference Between ANOVA and Kruskal-Wallis test" for details.

References

Field, A. (2000). Mann–Whitney test (§3.1). In Research Methods 1: SPSS for Windows part 3: Nonparametric tests. Retrieved from http://www.statisticshell.com/docs/nonparametric.pdf
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18. PDF available via ResearchGate.

answered by Nick Stauner
  • What is the point of reporting the value of U in the example above? What do I, as a reader, gain from knowing that U was 10.5? – amoeba Feb 21 '14 at 10:59
  • In the example above, you gain the ability to calculate the exact $p$, which is not given and may be useful for effect size estimation, meta-analysis, or checking for $p$-hacking. A friend and colleague of mine, @rpierce, has also advised me to report test statistics to assure readers that I'm doing things properly in general, as he's caught many published articles doing it wrong via misreported test statistics and associated $df$. – Nick Stauner Feb 21 '14 at 11:15
  • Interesting. I guess this issue might be worthy of a separate question, which I might ask here at some point. Still: if one wants exact p-values, then one can report exact p-values! In fact, the usual advice is to report exact p-values unless they are very small, like p < 0.0001; but in this case p-hacking is unlikely. And effect size should be reported separately anyway, e.g. "median latencies in groups E and C were 153 and 247 ms" in your quote from wiki. – amoeba Feb 21 '14 at 11:21
  • I fully agree that effect sizes should be reported, and generally would've agreed with the usual advice on exact $p$ values...but @rpierce has argued that this encourages readers to misinterpret $p$ values in all the myriad ways they do, and to use $p$ as a proxy for effect size instead of A) demanding the real deal or even B) using it when it's there. Some of that separate question has been discussed in my answer here and our comments, but seems far from settled...Regardless, his point about error-checking persuades me to report my test stats. – Nick Stauner Feb 21 '14 at 11:33
  • Hmmm. Is his point about error-checking expressed somewhere online or in print? I am curious about examples of articles that he caught. – amoeba Feb 21 '14 at 11:45
  • Secret (i.e., non-searchable) Facebook group for our alma mater's psych grads; no examples included. I'm curious too though. Maybe you can get him to share some if you tag him in a separate question :) I bet others here would have plenty of examples to chime in with too – that is, if the question doesn't get closed down as somehow off-topic! Certainly a more basic question like, "Why is it important to report test statistics in addition to $p$ and effect size?" would be on-topic though, I think...Check for duplicates under the [tag:reporting] tag first though, if you really want to be safe... – Nick Stauner Feb 21 '14 at 11:47
  • Where the sample sizes are not so small that it will be misinterpreted, I'd lean toward reporting a standardized U or W (standardized, they're identical) as a Z-value ($Z_U$ say), because readers will have an intuitive sense of what that means -- though then it becomes necessary to be clear when you have the exact p-value if you do, not one based off the Z score for the statistic. – Glen_b Feb 21 '14 at 13:47
  • @Glen_b: but doesn't the p-value already provide readers with an intuitive sense, so much so that the value of the statistic itself becomes irrelevant? Especially when the statistic is such that only a very rare reader will have an intuitive feeling about its values (e.g., U). – amoeba Feb 21 '14 at 13:58
  • In which case, why ever report any test statistic but a p-value? For some people a p-value is fine - but I've found that for many, giving something interpretable as a Z or a t, even if only approximately, conveys a better understanding. – Glen_b Feb 21 '14 at 14:01
  • amoeba, be careful that you don't feed @rpierce further ammo for shooting down exact $p$ values by appealing to the accuracy of stats consumers' intuitive senses about $p$ values ;) That's an uphill battle! – Nick Stauner Feb 21 '14 at 14:02
  • Thank you for all your help. I decided to leave all the values in the tables but to include only (U=123, p=.001) in my discussion. – dissertationhelp Feb 21 '14 at 16:29
  • Another helpful clarification raised in a new comment by Glen_b... – Nick Stauner May 14 '14 at 23:33
  • @amoeba: I found another point of reporting $U$ (and the ns) in large samples: effect size calculation! – Nick Stauner May 17 '18 at 19:35