Central Limit Theorem for Wilcoxon signed rank tests

Question

Let $X_i$, $i=1,2,\ldots,n$, be a set of iid observations, assumed symmetric about $\mu$. Let $R_i$ be the rank of the absolute deviations from some $\mu_0$, i.e. $R_i=\text{rank}(|X_i-\mu_0|)$. Let $Z_i=\text{sign}(X_i-\mu_0)$. Under the null hypothesis, $\mu_0=\mu$.

If I take only one sample each time and let $X$ be the resulting random variable, the expected value $E(X)$ is 0.5, isn't it? And the variance of $X$ is 0, isn't it? So how can I apply the CLT to the signed rank test?

Considering the CLT in this context is a complete waste of time IMHO, because we have exact results for any sample size. And the ranks are not independent so usual theory doesn't apply. — Frank Harrell, Jan 25 '16 at 04:14
@FrankHarrell That's what I thought before: the ranks are not independent, so how can I use the CLT? But the textbook is using the CLT, so I have to assume I'm wrong — whoisit, Jan 25 '16 at 04:20
@Glen_b It's an online course. This is the link: https://onlinecourses.science.psu.edu/stat414/print/book/export/html/232 Maybe it will help you understand my question. My question is that I can't apply the CLT to the rank test if I just take one sample each time. If it were a single Bernoulli variable, I could apply the CLT to it — whoisit, Jan 25 '16 at 04:22
There certainly is a version of the CLT for the signed rank test; however, as I mentioned before, it's perhaps slightly more involved than you expect. The page you link to details the calculation of the expectation and variance of $\sum Z_iR_i$, but (as I expected) completely glosses over establishing asymptotic normality. — Glen_b, Jan 25 '16 at 04:37
Please add the self-study tag and read its tag wiki, modifying your question to more clearly show your reasoning. I have added a few sentences at the start to define the terms, as I requested. This is still not sufficient. I suggest you remove all the images and explain in words and algebra what you're doing. — Glen_b, Jan 25 '16 at 04:45
@Glen_b, I have rephrased my question; I hope it makes sense this time — whoisit, Jan 25 '16 at 05:01
@Frank there are results showing asymptotic normality of the signed rank statistic but they don't assume that the $Z_iR_i$ are simultaneously independent and identically distributed. — Glen_b, Jan 25 '16 at 05:17
@Glen_b Could you please show me $E(Z_{i}R_{i})$ and $\sigma(Z_{i}R_{i})$? — whoisit, Jan 25 '16 at 05:35
@Glen_b If I can get those two values, then I can apply CLT by $\frac{\frac{\sum Z_{i}R_{i}}{n}-E(Z_{i}R_{i})}{\frac{\sigma(Z_{i}R_{i})}{n}}$ — whoisit, Jan 25 '16 at 05:46
No, it looks to me like that's pretty much the entire point of the exercise -- there would be essentially nothing left for you to actually do. I believe I have outlined an easy approach in my answer (one I presume they're after). Are you really saying that you can happily work out $E(Z_i)$ but cannot find $E(2Z_i)$? Note also that it's not sufficient to say the equivalent of "I have a mean of random variables and I've subtracted the expectation and divided by the standard error, so yo CLT, done". Which theorem are you applying and are its conditions satisfied? — Glen_b, Jan 25 '16 at 07:25
@Glen_b Looks like I have been fundamentally wrong about the CLT. As you mentioned, I was doing something like "I have a mean of random variables and I've subtracted the expectation and divided by the standard error, so yo CLT, done", but that's exactly how it works for a Bernoulli variable. The expectation of a Bernoulli variable is $p$ and the variance is $p(1-p)$; sum up $n$ variables, divide by $n$, subtract the expectation, and divide by the standard deviation divided by $n^{1/2}$, which gives $\frac{\overline{X}-p}{\sqrt{\frac{p(1-p)}{n}}}$ — whoisit, Jan 25 '16 at 08:39
Yes, but you satisfy conditions under which you could do that for the Bernoulli case (presumably you used the "classic" CLT -- where you have a CLT for standardized means of random variables that are independent, identically distributed, with finite variance ...). If you don't have iid (it doesn't look like it here), and you can't see a clever way to somehow make an iid CLT work, then you'd have to rely on a different CLT (as I pointed out in my answer; there are a number you might choose from). If you do rely on a different CLT, you have to make sure the conditions for the one you use apply. — Glen_b, Jan 25 '16 at 10:02
It looks to me like demonstrating the condition for Lyapunov is quite straightforward. In this case, you might like to try the $a_i=i$ version (i.e. just dealing with ranks, not more complicated functions of them). — Glen_b, Jan 25 '16 at 10:02

Glen_b · Answer 1 · 2016-01-25T05:28:26.467

I suppose that perhaps what you're expected to do is treat the set of ranks $\{R_i\}$ as fixed and the $Z_i$ as independent Bernoulli. Which is to say you have a set of scaled Bernoulli variates.

(e.g. one could relabel so that $R_i=i$ without changing the distribution of the sum $\sum_i R_iZ_i$. But rather than deal with $\sum_i iZ_i$ let's generalize a little.)

Forget about them being ranks for the moment and imagine instead you have a set of constants, $a_i$ (which obey certain conditions relating to how big they get as $n$ increases so that we can apply the limit theorems we need later) and a set of independent Bernoulli$(\frac12)$ variates and you want the mean and variance of $\sum_i a_iZ_i$.

To start with individual terms:

$\text{E}(a_iZ_i)$
$\text{Var}(a_iZ_i)$

These are easy to calculate!

You can then progress to the mean and variance of the sum. Of course for the variance of the sum you also need to worry about covariance terms. If you do that right you get exactly the mean and variance you need.
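As a way to check whatever formulas you derive, here is a minimal sketch that computes the mean and variance of $\sum_i a_iZ_i$ exactly by enumerating all $2^n$ outcomes. It assumes the $Z_i$ are independent 0/1 Bernoulli$(\frac12)$ indicators (so the sum is the sum of positively signed ranks) and takes $a_i=i$; the function name `exact_mean_var` and the closed forms $n(n+1)/4$ and $n(n+1)(2n+1)/24$ it compares against are not from the post above, they are the standard textbook values for that convention.

```python
from fractions import Fraction
from itertools import product


def exact_mean_var(weights):
    """Exact mean and variance of sum_i a_i * Z_i by brute-force enumeration.

    Assumption: the Z_i are independent 0/1 Bernoulli(1/2) indicators,
    so every one of the 2**n sign patterns is equally likely.
    """
    n = len(weights)
    total = Fraction(0)
    total_sq = Fraction(0)
    for z in product((0, 1), repeat=n):
        s = sum(a * zi for a, zi in zip(weights, z))
        total += s
        total_sq += s * s
    count = 2 ** n
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var


n = 8
ranks = list(range(1, n + 1))  # a_i = i, i.e. the ranks relabelled as 1..n
mean, var = exact_mean_var(ranks)

# Standard closed forms for the sum of positively signed ranks.
closed_mean = Fraction(n * (n + 1), 4)
closed_var = Fraction(n * (n + 1) * (2 * n + 1), 24)

print("enumerated:", mean, var)
print("closed form:", closed_mean, closed_var)
```

Enumeration is only feasible for small $n$, but it gives an exact target to compare a hand derivation against.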

Now if we treat our $a_i$ values as constant, the $a_iZ_i$ are independent but not identically distributed -- but there are certainly versions of the CLT for non-identically distributed variates, e.g. the Lyapunov and Lindeberg CLTs. If you check their conditions you might be able to apply one of them.
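To make "check their conditions" concrete, here is a sketch of how Lyapunov's condition could be verified for $a_i=i$. It assumes the $Z_i$ are independent 0/1 Bernoulli$(\frac12)$ variates and uses the fourth-moment version ($\delta=2$) of the condition; the symbols $Y_i$ and $s_n^2$ are introduced here for illustration. Write $Y_i=a_iZ_i$, so that $|Y_i-\text{E}(Y_i)|=a_i/2$ with probability one.

$$
\begin{aligned}
s_n^2 &= \sum_{i=1}^n \text{Var}(Y_i) = \frac14\sum_{i=1}^n a_i^2, \\
\frac{1}{s_n^{4}}\sum_{i=1}^n \text{E}\,|Y_i-\text{E}(Y_i)|^{4}
 &= \frac{\frac{1}{16}\sum_{i=1}^n a_i^4}{\frac{1}{16}\left(\sum_{i=1}^n a_i^2\right)^2}
 = \frac{\sum_{i=1}^n a_i^4}{\left(\sum_{i=1}^n a_i^2\right)^2}, \\
\text{and for } a_i=i:\quad
\frac{\sum_{i=1}^n i^4}{\left(\sum_{i=1}^n i^2\right)^2}
 &\le \frac{n^5}{\left(n^3/3\right)^2} = \frac{9}{n} \longrightarrow 0 .
\end{aligned}
$$

The last line uses the crude bounds $\sum i^4\le n\cdot n^4$ and $\sum i^2=n(n+1)(2n+1)/6\ge n^3/3$. Since the ratio tends to zero, the condition holds in this setting and the standardized sum is asymptotically normal.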

(Alternatively, if we were to regard the $a_i$ values as randomly selected without replacement from the set of ranks, then we have identically distributed $a_iZ_i$ but they're no longer independent. There are also limit theorems which can deal with that dependence. Indeed there's extensive literature on the asymptotic distribution of rank statistics.)
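A simulation is no substitute for checking a theorem's conditions, but it can show what the limit looks like. The sketch below draws standard normal samples (an arbitrary choice of a distribution symmetric about $\mu_0=0$), computes the sum of positively signed ranks, and standardizes it with the usual null mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$. The helper name `signed_rank_stat`, the sample size, and the seed are all assumptions made for illustration.

```python
import numpy as np


def signed_rank_stat(x, mu0=0.0):
    """Sum of the ranks of |x - mu0| over observations with x > mu0.

    Assumes continuous data, so there are no ties and no zero deviations.
    """
    d = x - mu0
    # Double argsort gives 0-based ranks; add 1 for ranks 1..n.
    ranks = np.argsort(np.argsort(np.abs(d))) + 1
    return ranks[d > 0].sum()


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n, reps = 50, 20000

stats = np.array(
    [signed_rank_stat(rng.standard_normal(n)) for _ in range(reps)]
)

# Standardize with the null mean and standard deviation.
mean0 = n * (n + 1) / 4
sd0 = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (stats - mean0) / sd0

print("mean of z:", z.mean())
print("sd of z:", z.std())
print("P(z <= 1.645):", np.mean(z <= 1.645))  # a standard normal gives about 0.95
```

If the normal approximation is reasonable at this $n$, the printed mean should be close to 0, the standard deviation close to 1, and the last proportion close to 0.95.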

score 2 · Answer 2 · edited Oct 18 '18 at 22:10

One way to show the asymptotic normality of the Wilcoxon signed rank statistic is through Hoeffding's U-statistics theorem. For this statistic you create a kernel based on Walsh averages (these are the averages $(X_i+X_j)/2$). U-statistics theory allows you to obtain asymptotic results even for dependent summands. Central to this result is the notion of projection, which approximates the sum of the dependent variables by a sum of independent random variables to which the usual CLT applies. For details see, for instance, the book by Ronald Randles and Douglas Wolfe, Introduction to the Theory of Nonparametric Statistics.

Taylor · Answer 3 · 2022-05-24T20:03:29.873

To add to the above answers (+1 to both), here's a good resource on deriving a CLT for a U-statistic when the ranks are assumed to be random (when they are nonrandom you can use Lyapunov's CLT--this is what @Glen_b is referring to). They mention the projections that @Michael R. Chernick is referring to. You can see it is involved.

Here's the thing I wanted to add: the signed-rank statistic isn't exactly a U-statistic, though; there's still a bit of a jump. The answer in this thread shows you can write the signed-rank statistic as a sum over "Walsh averages"; however, notice that the sum runs over indices where $i \le j$. In a U-statistic the sum runs over all $i < j$. Even though the signed-rank statistic has some extra terms corresponding to $i=j$, you can show that it still (after being properly rescaled and recentered) has the same asymptotic normal distribution. This is exercise 1.7 in Chapter 6 of this book.
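The Walsh-average representation can be checked numerically. The sketch below assumes $\mu_0=0$ and continuous data (no ties, no zero averages), and compares the sum of positively signed ranks with the number of positive Walsh averages over $i \le j$. The function names `w_plus_ranks` and `w_plus_walsh` are made up for this illustration.

```python
import numpy as np


def w_plus_ranks(x):
    """Sum of the ranks of |x_i| over the observations with x_i > 0."""
    ranks = np.argsort(np.argsort(np.abs(x))) + 1  # ranks 1..n, no ties assumed
    return int(ranks[x > 0].sum())


def w_plus_walsh(x):
    """Number of positive Walsh averages (x_i + x_j) / 2 over pairs i <= j."""
    i, j = np.triu_indices(len(x))  # upper triangle including the diagonal
    walsh = (x[i] + x[j]) / 2
    return int((walsh > 0).sum())


rng = np.random.default_rng(1)
x = rng.standard_normal(12)

from_ranks = w_plus_ranks(x)
from_walsh = w_plus_walsh(x)

# Split the Walsh count into the i == j terms and the i < j terms.
diag_terms = int((x > 0).sum())
off_diag_terms = from_walsh - diag_terms

print("rank form:", from_ranks, "Walsh form:", from_walsh)
print("i == j terms:", diag_terms, "i < j terms:", off_diag_terms)
```

The $i=j$ terms number at most $n$, while the $i<j$ part ranges over $n(n-1)/2$ pairs, which is the intuition for why the extra terms do not change the limiting distribution after rescaling.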
