Clarification on the Wilcoxon Signed Rank Test Procedure

Question

The Wilcoxon Signed Rank Test is immensely intuitive and cool in its simplicity, so much so that any question about it seems out of place. However...

I have found different procedures described for actually computing the test statistic. Naturally, I went to what I believe is the original paper, by none other than Frank Wilcoxon, where he describes a wheat seed-treatment experiment laid out as a randomized block design with eight replications of treatments A and B. The data are:

block <- 1:8
A <- c(209, 200, 177, 169, 159, 169, 187, 198)   # treatment A
B <- c(151, 168, 147, 164, 166, 163, 176, 188)   # treatment B
Diff <- A - B                                    # paired differences
sig.rnk <- sign(Diff) * rank(abs(Diff))          # ranks of |Diff|, carrying the sign of Diff
print(data.frame(block, A, B, Diff, sig.rnk), row.names = FALSE)

 block   A   B Diff sig.rnk
     1 209 151   58       8
     2 200 168   32       7
     3 177 147   30       6
     4 169 164    5       1
     5 159 166   -7      -3
     6 169 163    6       2
     7 187 176   11       5
     8 198 188   10       4

He goes on to state that "The sum of the negative rank numbers is $-3$. Table II shows that the total 3 indicates a probability between 0.024 and 0.055 that these treatments do not differ." So he is saying that the p-value is significant and that treatments A and B differ. The point, though, is that his test statistic (TS) is $-3$, which he then looks up (as 3) in his Table II.

I'm sure he wouldn't mind the expression: $W = \displaystyle \sum_{i=1}^{N_r}R_i^{-} = -3$,

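where $N_r$ is the reduced sample size after discarding zero differences, $R_i$ is the rank of the absolute difference for the $i$-th pair, and the superscript $-$ restricts the sum to pairs with a negative difference (with the minus sign attached).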
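A minimal sketch of this definition in R, assuming the convention of discarding zero differences before ranking (which is what $N_r$ refers to). The helper name `signed_ranks` is hypothetical, not part of any package:

```r
# Hypothetical helper: signed ranks of the paired differences x - y,
# after discarding zero differences (so the result has length N_r).
signed_ranks <- function(x, y) {
  d <- x - y
  d <- d[d != 0]            # drop zero differences
  sign(d) * rank(abs(d))    # tied |d| values would get average ranks
}

A <- c(209, 200, 177, 169, 159, 169, 187, 198)
B <- c(151, 168, 147, 164, 166, 163, 176, 188)
r <- signed_ranks(A, B)
r                # 8 7 6 1 -3 2 5 4, matching sig.rnk above
sum(r[r < 0])    # Wilcoxon's sum of negative ranks: -3
```

With the wheat data nothing is dropped, since no difference is zero, so $N_r = n = 8$.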

R instead reports a test statistic $V = 33$, which makes sense by complementarity: the sum of all ranks is here $n(n + 1)/2 = 8 \cdot 9/2 = 36$, and R uses the sum of the positive ranks, i.e. the total minus the absolute sum of the negative ranks, $36 - 3 = 33$:

sum(sig.rnk[sig.rnk > 0])
[1] 33

# Compare this result to the built-in function:
wilcox.test(A, B, paired = TRUE, correct = FALSE)

Wilcoxon signed rank test

data:  A and B
V = 33, p-value = 0.03906
alternative hypothesis: true location shift is not equal to 0

So we would have: $W = \displaystyle \sum_{i=1}^{N_r}R_i^{+} = 33$.
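As a sanity check, base R exposes the exact null distribution of the signed rank statistic through `psignrank(q, n)`, which treats the statistic as the sum of positive ranks for `n` pairs. Because that distribution is symmetric about $n(n+1)/4$, the lower tail at Wilcoxon's 3 and the upper tail at R's 33 have the same probability, and doubling either one should reproduce the p-value printed by `wilcox.test`:

```r
n <- 8
# P(V <= 3): lower tail, using Wilcoxon's |sum of negative ranks| = 3
p_low <- psignrank(3, n)
# P(V >= 33): lower.tail = FALSE gives P(V > q), hence q = 32
p_high <- psignrank(32, n, lower.tail = FALSE)
c(p_low, p_high)   # both 5/256 = 0.01953125
2 * p_low          # 0.0390625, the two-sided p-value reported above
```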

Finally, there is a third procedure (I'm sure it's all the same...) on Wikipedia:

$W = \displaystyle \sum_{i=1}^{N_r}\,[\,\operatorname{sgn}(x_{2,i}-x_{1,i})\cdot R_i]$, which in our case (taking $x_{2,i}$ from A and $x_{1,i}$ from B) amounts to:

sum(sig.rnk)
[1] 30

So $W = 30$.

QUESTION: Why are there different procedures giving three different $W$ values ($-3$, $33$ and $30$)? They all come down to the same idea, but can we quickly "see" that they are equivalent? And does any of them have advantages over the others?

score 2 · Accepted Answer · answered Nov 12 '15 at 08:02

[For the sake of simpler discussion, assume the data are continuous (no tied ranks anywhere).]

All of the test statistics are equivalent in that they will lead to the same p-values. Indeed, the test statistics are closely related; some may be slightly more convenient in terms of the amount of hand calculation involved, but in the era of computers that matters not in the least.

To avoid calling three different statistics "W", I'll label each statistic in a way that indicates what is summed.

Further, let the sum of the ranks (without consideration for sign) be $R$; $R=\frac{n(n+1)}{2}$.

Let the sum of the negative ranks be $R^-$ (which is negative) and the sum of the positive ranks be $R^+$. Clearly $R=R^+-R^-$; therefore

(i) the sum of the positive ranks $R^+=R+R^-=\frac{n(n+1)}{2} + R^-$.

(ii) the sum of the signed ranks (which I'll call $R^\pm$) has $R^\pm=R^++R^-= R+2R^- = \frac{n(n+1)}{2} + 2R^-$.

Therefore all three statistics are of the form $a+bR^-$ for some $a,b$ with $b\neq 0$: $(a,b)=(0,1)$ for $R^-$, $(R,1)$ for $R^+$ and $(R,2)$ for $R^\pm$, so in fact $b>0$ in every case.
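Plugging in the wheat data from the question makes these relations concrete. A minimal check in R, recomputing the signed ranks so the snippet stands alone:

```r
A <- c(209, 200, 177, 169, 159, 169, 187, 198)
B <- c(151, 168, 147, 164, 166, 163, 176, 188)
sr <- sign(A - B) * rank(abs(A - B))
n  <- length(sr)

R       <- n * (n + 1) / 2    # 36: sum of the unsigned ranks
R_minus <- sum(sr[sr < 0])    # -3: Wilcoxon's statistic
R_plus  <- sum(sr[sr > 0])    # 33: V from wilcox.test
R_pm    <- sum(sr)            # 30: sum of the signed ranks

# Each statistic has the form a + b * R_minus
stopifnot(R_plus == R + 1 * R_minus,
          R_pm   == R + 2 * R_minus)
```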

Consequently, these statistics order the possible samples in the same fashion (if one sample has a more extreme test statistic under one test, it will be more extreme under the others), so we will always place the same sample arrangements in a critical region of a given size under each test, and so we will always get identical p-values.
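The ordering argument can be checked by brute force. Under the null hypothesis (continuous data, no ties) each of the $2^n$ sign patterns on the ranks $1,\dots,n$ is equally likely, so the sketch below enumerates all of them, computes each statistic, and compares two-sided p-values at the observed values $-3$, $33$ and $30$. This is only an illustration of the argument, not how `wilcox.test` is implemented; "two-sided" here is taken to mean doubling the smaller tail probability, capped at 1.

```r
n <- 8
ranks <- 1:n
# All 2^n sign patterns: each row is one equally likely arrangement under H0
signs  <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))
signed <- sweep(signs, 2, ranks, `*`)

null_minus <- rowSums(signed * (signed < 0))   # sum of negative ranks
null_plus  <- rowSums(signed * (signed > 0))   # sum of positive ranks
null_pm    <- rowSums(signed)                  # sum of signed ranks

# Two-sided p-value: double the smaller tail probability, capped at 1
p_two_sided <- function(null, obs) {
  min(1, 2 * min(mean(null <= obs), mean(null >= obs)))
}

p <- c(R_minus = p_two_sided(null_minus, -3),
       R_plus  = p_two_sided(null_plus, 33),
       R_pm    = p_two_sided(null_pm, 30))
p   # all three equal 10/256 = 0.0390625
```

The common value agrees with the exact p-value that `wilcox.test` printed in the question.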

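Further, since they're linear functions of each other, their normal approximations (which differ only in location and scale, in a now obvious way) are exactly as accurate as one another: once each statistic is standardized by its null mean and standard deviation, all three give the same $z$-score.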
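A minimal sketch of that standardization, assuming the usual no-ties null moments $\mathrm{E}[R^+]=n(n+1)/4$ and $\mathrm{Var}(R^+)=n(n+1)(2n+1)/24$, with no continuity correction. The moments of $R^-$ and $R^\pm$ follow from $R^- = R^+ - R$ and $R^\pm = 2R^+ - R$:

```r
n <- 8
R <- n * (n + 1) / 2
mu_plus  <- n * (n + 1) / 4                  # 18
var_plus <- n * (n + 1) * (2 * n + 1) / 24   # 51

# Shift and scale the moments through the linear relations
mu_minus  <- mu_plus - R        # R_minus = R_plus - R
var_minus <- var_plus
mu_pm     <- 2 * mu_plus - R    # R_pm = 2 * R_plus - R
var_pm    <- 4 * var_plus

z <- c(R_minus = (-3 - mu_minus) / sqrt(var_minus),
       R_plus  = (33 - mu_plus)  / sqrt(var_plus),
       R_pm    = (30 - mu_pm)    / sqrt(var_pm))
z                        # each is 15 / sqrt(51), about 2.10
2 * pnorm(-abs(z))       # the same approximate two-sided p-value three times
```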

So it doesn't matter which you use and you can freely convert between them.
