The Wilcoxon Signed Rank Test is immensely intuitive and cool in its simplicity. So much so that any questions seem out of place. However,...

I have found different procedures described for the actual computation of the test statistic. Naturally, I went to what I believe is the original paper, by none other than Frank Wilcoxon, where he details an experiment with seed treatment on wheat, corresponding to a randomized block experiment with eight replications of treatments A and B. The data are:

block <- 1:8
A <- c(209, 200, 177, 169, 159, 169, 187, 198)
B <- c(151, 168, 147, 164, 166, 163, 176, 188)
Diff <- A - B
# Signed ranks: rank the absolute differences, then reattach the signs
sig.rnk <- sign(Diff) * rank(abs(Diff))
print(data.frame(block, A, B, Diff, sig.rnk), row.names = FALSE)

 block   A   B Diff sig.rnk
     1 209 151   58       8
     2 200 168   32       7
     3 177 147   30       6
     4 169 164    5       1
     5 159 166   -7      -3
     6 169 163    6       2
     7 187 176   11       5
     8 198 188   10       4

He goes on to state that "The sum of the negative rank numbers is $-3$. Table II shows that the total 3 indicates a probability between 0.024 and 0.055 that these treatments do not differ." So he is saying that the result is significant, and that treatments A and B differ. The point, though, is that his test statistic (TS) is $-3$. He then looks this value up in his Table II (the excerpt is not reproduced here).

I'm sure he wouldn't mind the expression: $W = \displaystyle \sum_{i=1}^{N_r}R_i^{-} = -3$,

$N_r$ being the reduced sample size after throwing out zero differences. $R_i$ are the particular ranks for each paired observation.

[R] instead reports a test statistic $V = 33$, which makes sense by complementarity: the sum of all ranks is in this case $n\,(n + 1)/2 = 8 \times (8 + 1) / 2 = 36$, and [R] opts for the sum of the positive ranks, i.e. the total minus the magnitude of the negative ranks, $36 - 3 = 33$:

sum(sig.rnk[sig.rnk > 0])
[1] 33

# Compare this result to the built-in function:
wilcox.test(A, B, paired = TRUE, correct = FALSE)

	Wilcoxon signed rank test

data:  A and B
V = 33, p-value = 0.03906
alternative hypothesis: true location shift is not equal to 0

So we would have: $W = \displaystyle \sum_{i=1}^{N_r}R_i^{+} = 33$.

Finally, there is a third procedure (I'm sure it's all the same...) in Wikipedia:

$W = \displaystyle \sum_{i=1}^{N_r}\,[\,\operatorname{sgn}(x_{2,i}-x_{1,i})\cdot R_i]$, which in our case (taking the differences as A − B, as above) amounts to:

sum(sig.rnk)
[1] 30

So $W = 30$.

QUESTION: Why different procedures with three different $W$ values ($-3$, $33$ and $30$)? They all come down to the same idea, but can we quickly "see" that they are one and the same? And, any advantages of one versus the other?

1 Answer

[For the sake of simpler discussion, assume the data are continuous (no tied ranks anywhere).]

All of the test statistics are equivalent in that they will lead to the same p-values. Indeed, the test statistics are closely related; some may be slightly more convenient in terms of the amount of hand calculation involved -- but in the era of computers it matters not in the least.

To avoid calling three different statistics "W", I'll label each statistic in a way that indicates what is summed.

Further, let the sum of the ranks (without consideration for sign) be $R$; $R=\frac{n(n+1)}{2}$.

Let the sum of the negative ranks be $R^-$ (which is negative) and the sum of the positive ranks be $R^+$. Clearly $R=R^+-R^-$; therefore

(i) the sum of the positive ranks $R^+=R+R^-=\frac{n(n+1)}{2} + R^-$.

(ii) the sum of the ranks (which I'll call $R^\pm$) has $R^\pm=R^++R^-= R+2R^- = \frac{n(n+1)}{2} + 2R^-$.

Therefore all three statistics are of the form $a+bR^-$ for some $a,b$ (with $b\neq 0$).
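As a quick numerical check of (i) and (ii), here is a minimal sketch using the data from the question. The variable names (`R_minus`, `R_plus`, `R_pm`) are just illustrative labels for $R^-$, $R^+$ and $R^\pm$, and it assumes no ties among the absolute differences.

```r
# Data from the question (Wilcoxon's wheat example).
A <- c(209, 200, 177, 169, 159, 169, 187, 198)
B <- c(151, 168, 147, 164, 166, 163, 176, 188)

d  <- A - B
d  <- d[d != 0]                 # drop zero differences (there are none here)
n  <- length(d)
sr <- sign(d) * rank(abs(d))    # signed ranks

R       <- n * (n + 1) / 2      # sum of the unsigned ranks
R_minus <- sum(sr[sr < 0])      # sum of negative ranks (a negative number)
R_plus  <- sum(sr[sr > 0])      # sum of positive ranks (what R reports as V)
R_pm    <- sum(sr)              # sum of all signed ranks

print(c(R = R, R_minus = R_minus, R_plus = R_plus, R_pm = R_pm))

# Relations (i) and (ii): both statistics are linear in R_minus.
stopifnot(R_plus == R + R_minus,
          R_pm   == R + 2 * R_minus)
```

With these data the four values are 36, -3, 33 and 30, so $33 = 36 + (-3)$ and $30 = 36 + 2(-3)$.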

Consequently, these statistics order the possible samples in the same fashion: a sample that is more extreme under one statistic is more extreme under the others. A critical region of a given size therefore always contains the same sample arrangements, whichever statistic is used, and so we will always have identical p-values.
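The identical p-values can be confirmed by brute force for $n = 8$. The sketch below enumerates all $2^8$ equally likely sign assignments under the null and computes a two-sided exact p-value for each statistic as the proportion of assignments at least as far from the centre of the null distribution as the observed value. The helper `exact_p` is a hypothetical name introduced for illustration, not a function from any package, and the sketch assumes no ties and no zero differences.

```r
n     <- 8
ranks <- 1:n

# All 2^n sign assignments, one per row; each is equally likely under the null.
signs  <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))
signed <- sweep(signs, 2, ranks, `*`)        # signed ranks for each assignment

# Null distributions of the three statistics.
R_minus <- rowSums(signed * (signed < 0))    # sum of negative ranks
R_plus  <- rowSums(signed * (signed > 0))    # sum of positive ranks
R_pm    <- rowSums(signed)                   # sum of all signed ranks

# Hypothetical helper: two-sided exact p-value for a symmetric null distribution.
exact_p <- function(null_values, observed) {
  centre <- mean(null_values)
  mean(abs(null_values - centre) >= abs(observed - centre))
}

p_values <- c(R_minus = exact_p(R_minus, -3),
              R_plus  = exact_p(R_plus, 33),
              R_pm    = exact_p(R_pm, 30))
print(p_values)
```

All three come out as $10/256 = 0.0390625$, which agrees with the `p-value = 0.03906` reported by `wilcox.test` in the question.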

Further, since they're linear functions of each other, the normal approximations for each (which differ only in location and scale, in a now obvious way) will become accurate at exactly the same rate.
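To make the location-and-scale point concrete, here is a small sketch that standardizes each of the three observed statistics with its own null mean and variance. It uses the standard no-ties results: $R^+$ has mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$, and the other two follow from the linear relations above. No continuity correction is applied, and the variable names are illustrative.

```r
n <- 8
R <- n * (n + 1) / 2                     # sum of the unsigned ranks (36)
v <- n * (n + 1) * (2 * n + 1) / 24      # null variance of R_plus (no ties)

# Observed values from the question, each standardized by its own null moments.
z_minus <- (-3 - (-R / 2)) / sqrt(v)     # R_minus: mean -R/2, variance v
z_plus  <- (33 - R / 2)    / sqrt(v)     # R_plus:  mean  R/2, variance v
z_pm    <- (30 - 0)        / sqrt(4 * v) # R_pm:    mean  0,   variance 4v

z <- c(R_minus = z_minus, R_plus = z_plus, R_pm = z_pm)
print(z)

# Two-sided normal-approximation p-value (no continuity correction).
p_normal <- 2 * pnorm(-abs(z))
print(p_normal)
```

The three z-scores coincide (each equals $15/\sqrt{51}$), so the approximate p-values coincide as well.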

So it doesn't matter which you use and you can freely convert between them.

Glen_b