Let $\vec{x}$ be 100 iid random variables such that $x_i \sim \mathcal{N}(0,1)$.

Let $\vec{y}$ be 100 iid random variables such that $y_i \sim \mathcal{N}(1,1)$.

For this example, the signed-rank test gives a p-value of $\approx 10^{-8}$.

I would like to construct a naive test that I call a binomial test. Let $z = (x < y)$, evaluated element-wise. I use the binomial distribution to estimate the probability that $z = \text{True}$ and end up with $\hat{p} \approx 0.25$ for the above example.

My question is: why is my test so much worse than the signed-rank test? As far as I know, the latter does not make any use of the exact magnitudes of the differences, only of their ranks. I do not need a very rigorous proof, just an intuition for how it achieves such high significance.

  • I think you are not doing a binomial test. In your example, z should be true for about 75 observations out of 100. If you conduct a binomial test in this case, you should get a *p*-value around $10^{-7}$, and in any case almost always < 0.01, depending on the original random sample. The binomial test compares the observed proportion (either about 0.25 or about 0.75) to a null proportion of 0.50. – Sal Mangiafico Jan 21 '20 at 17:31
  • But as noted in the answer, the analogous test based only on a count of the signs of the differences would be the two-sample sign test. For this, too, for your example, the *p*-value is typically around $10^{-5}$, depending on the original random sample. – Sal Mangiafico Jan 21 '20 at 18:29
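
To make the comparison against a null proportion of 0.50 concrete, here is a minimal sketch of a two-sided exact binomial (sign) test using only the Python standard library. The function name `sign_test_pvalue` is hypothetical, and the doubling of one tail relies on the symmetry of the binomial distribution at $p = 0.5$; it is an illustration, not necessarily the procedure used by any particular statistics package.

```python
from math import comb


def sign_test_pvalue(k, n):
    """Two-sided exact binomial p-value for k 'successes' out of n under H0: p = 0.5.

    Hypothetical helper for illustration. Under p = 0.5 the distribution is
    symmetric, so the two-sided p-value is twice the tail probability on the
    more extreme side, capped at 1.
    """
    k_extreme = max(k, n - k)
    tail = sum(comb(n, j) for j in range(k_extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# A count of 25 and a count of 75 out of 100 are equally far from 50,
# so they produce the same p-value; a count of 50 gives no evidence at all.
for k in (25, 50, 75):
    print(k, sign_test_pvalue(k, 100))
```

Note that the observed proportion (0.25 or 0.75) is not itself a p-value; the p-value is the probability, under the null, of a count at least that far from 50.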

1 Answer

The signed-rank test is better in this situation because it takes signed ranks into account, while your binomial test only uses yes/no answers. The signed-rank test uses more information: it takes the magnitude of each difference into account, although only in the form of ranks, which your binomial test cannot do.

The sign test is your naive binomial test: it takes only the yes/no outcomes into account, so it should give you the same results as your test. However, the signed-rank test and the sign test are two different tests.
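
As a rough illustration of the difference, the sketch below computes the signed-rank statistic $W^+$ (the sum of the ranks of the positive differences) with a normal approximation for the p-value. The helper name `signed_rank` and both toy data sets are made up for this example, and the normal approximation is crude at $n = 8$, so the numbers should be read qualitatively. Both data sets have 6 positive signs out of 8, so a sign test cannot tell them apart.

```python
import math


def signed_rank(diffs):
    """Return (W+, two-sided p-value) for the Wilcoxon signed-rank test.

    Hypothetical helper: uses the normal approximation and assumes no zero
    differences and no ties among the absolute differences.
    """
    n = len(diffs)
    # Rank the absolute differences, rank 1 = smallest.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)).
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


# Same signs (6 positive, 2 negative), different placement of the negatives.
small_negatives = [-0.1, -0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
large_negatives = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, -0.7, -0.8]

print(signed_rank(small_negatives))
print(signed_rank(large_negatives))
```

When the two negative differences are the smallest in magnitude, $W^+$ is far from its null mean and the p-value is small; when they are the largest, the evidence is much weaker, even though the sign counts are identical.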

rep_ho
  • That is what I am puzzled about. One of the 3 following must be true: (1) Signed-rank uses more info than binomial, (2) Signed-rank somehow uses the same info more efficiently, or (3) They are exactly the same, but there is a bug in my test – Aleksejs Fomins Jan 21 '20 at 13:05
  • I think you misunderstood. The signed-rank test and the sign test are two different tests. The sign test is your binomial test; the signed-rank test uses more info (ranks) and thus can get better results. – rep_ho Jan 21 '20 at 13:07
  • Ok, I think I almost get it. The only question I don't get yet is why are ranks informative. It is trivial why signs are informative for this problem, but for ranks it is more tricky. I will think about it and get back to you – Aleksejs Fomins Jan 21 '20 at 13:19
  • Be careful when you say that ranks take magnitude into account, it is not exactly true. If X are uniform random numbers between 0 and 1, and Y are uniform random numbers between 2 and 3, the signed-rank test always returns the same answer, regardless of the actual values of X and Y (I've checked). So it is not quite true that it uses magnitude, it is a bit more subtle than that – Aleksejs Fomins Jan 21 '20 at 13:21
  • That's what I meant by the magnitude of the difference in the form of ranks. Say you have one pair 1, 1.1 and another pair 0, 1. For the sign test they are both the same, but for the signed-rank test the second pair has the bigger difference – rep_ho Jan 21 '20 at 13:26
  • Ok, I finally get it. Ranks are informative, and it is indeed a bit non-trivial. Wilcoxon test statistic multiplies signs by ranks. While ranks do not explicitly carry information about magnitude, they are sorted with respect to magnitude. So, getting a wrong sign for a high rank is more unlikely than getting a wrong sign for a low rank, because high rank variables are further apart than low-rank variables, even if we do not explicitly know by how much. It is actually kind of shocking how such a small tweak gains so much in statistical power – Aleksejs Fomins Jan 21 '20 at 13:43
  • Exactly. Another issue is that the binomial test does not take ties into account (I think); it is as if you just throw away the data with ties, whereas you won't get ties with signed rank. And of course, if you want even more information, you can test the magnitude of the differences directly using a paired t-test – rep_ho Jan 21 '20 at 14:02
  • As far as I know, ties are actually bad for signed-rank test due to its internal construction. t-test would of course be great, but the true distribution is not normal in my case – Aleksejs Fomins Jan 21 '20 at 15:23
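
The intuition reached in the comments (wrong-signed differences tend to sit at low ranks) can be checked with a small simulation. This is only a sketch under the question's setup, $x_i \sim \mathcal{N}(0,1)$ and $y_i \sim \mathcal{N}(1,1)$; the function name `mean_ranks_by_sign` and the seed are arbitrary choices for illustration.

```python
import random


def mean_ranks_by_sign(n, seed):
    """Simulate d_i = y_i - x_i and compare mean ranks of |d_i| by sign.

    Hypothetical helper. Returns (mean rank of negative differences,
    mean rank of positive differences, number of negative differences).
    """
    rng = random.Random(seed)
    d = [rng.gauss(1, 1) - rng.gauss(0, 1) for _ in range(n)]
    # Rank the absolute differences, rank 1 = smallest.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    neg, pos = [], []
    for rank, i in enumerate(order, start=1):
        (neg if d[i] < 0 else pos).append(rank)

    def mean(ranks):
        return sum(ranks) / len(ranks) if ranks else float("nan")

    return mean(neg), mean(pos), len(neg)


neg_mean, pos_mean, n_neg = mean_ranks_by_sign(100, seed=0)
print(f"{n_neg} negative differences out of 100")
print(f"mean rank of negatives: {neg_mean:.1f}, of positives: {pos_mean:.1f}")
```

Because the negative differences are typically small in magnitude, they take away little from the rank sum, which is why the signed-rank statistic ends up further from its null expectation than the bare count of signs.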