I was playing a bit with simulations to get a better picture of how the unpaired t-test and the Welch t-test compare on same-mean data with unequal variances. As a third test, I included the rank-sum test. I noticed that when I repeatedly compare groups of 60 and 30 normally distributed samples (mean 0, the first group having a stdev of 1, the second a stdev of 2), the rank-sum test tends to give false positives, returning p<0.05 in a fraction of 0.08-0.09 of the simulation repeats (a similar trend to the non-Welch t-test). When the two groups have the same size of 30 (with stdevs of 1 and 2 again), the fraction of false positives is just a shade over the expected 0.05.
Where does the high false positive rate of the rank-sum test come from in the case with uneven group sizes? I think I understand what the null hypothesis is, but still don't immediately see how the group size affects the test.
Simulation code for just the rank-sum test in Matlab below:
nRepeats = 1e4;
%% We measure fraction of false positives for data with uneven variance, with:
% a) same-size group
% b) first group twice as large as the second one
meanVal = 0;
sd1 = 1;
sd2 = 2;
nSamples = 30;
pRSsame = zeros(nRepeats, 1);
pRSdifferent = zeros(nRepeats, 1);
for iRepeat = 1:nRepeats
    % a) same-size groups
    data1 = randn(nSamples, 1) * sd1 + meanVal;
    data2 = randn(nSamples, 1) * sd2 + meanVal;
    pRSsame(iRepeat) = ranksum(data1, data2);
    % b) data1 twice as large as data2
    data1 = randn(2*nSamples, 1) * sd1 + meanVal;
    data2 = randn(nSamples, 1) * sd2 + meanVal;
    pRSdifferent(iRepeat) = ranksum(data1, data2);
end
fractionFPsame = sum(pRSsame < 0.05)/nRepeats % ~0.055
fractionFPdifferent = sum(pRSdifferent < 0.05)/nRepeats %~0.088
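For comparison, here is a minimal sketch of the uneven-size case that runs all three tests mentioned above on the same simulated data. It assumes the Statistics and Machine Learning Toolbox (`ttest2`, `ranksum`). The seed, the variable names and the `alpha` threshold are illustrative choices, not part of the simulation above.

```matlab
% Sketch: fraction of false positives for the pooled t-test, the Welch t-test
% and the rank-sum test on same-mean normal data with unequal variances.
rng(1);                 % arbitrary seed, only for reproducibility (assumption)
nRepeats = 1e4;
alpha = 0.05;
meanVal = 0;
sd1 = 1;
sd2 = 2;
n1 = 60;                % larger group has the smaller stdev
n2 = 30;

pPooled = zeros(nRepeats, 1);
pWelch = zeros(nRepeats, 1);
pRS = zeros(nRepeats, 1);
for iRepeat = 1:nRepeats
    data1 = randn(n1, 1) * sd1 + meanVal;
    data2 = randn(n2, 1) * sd2 + meanVal;
    % ttest2 assumes equal variances by default (pooled t-test)
    [~, pPooled(iRepeat)] = ttest2(data1, data2);
    % 'Vartype','unequal' switches ttest2 to the Welch t-test
    [~, pWelch(iRepeat)] = ttest2(data1, data2, 'Vartype', 'unequal');
    pRS(iRepeat) = ranksum(data1, data2);
end

% Order: pooled t-test, Welch t-test, rank-sum
fractionFP = [mean(pPooled < alpha), mean(pWelch < alpha), mean(pRS < alpha)];
fprintf('pooled t: %.3f, Welch t: %.3f, rank-sum: %.3f\n', fractionFP);
```

Swapping `n1` and `n2` (so that the smaller group has the smaller stdev) is a quick way to check whether the direction of the deviation from 0.05 depends on which group is larger.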
Thank you!
I guess the difference might be that with e.g. the t-test, people just say "don't use it" if the variances are unequal, but with the Wilcoxon rank-sum test, people use it even for data with very different distributions (which is not a problem, as you say, but it seems the interpretation is different)?
(also thank you for your responses - I removed the question probably while you were writing, sorry, because I realised it wasn't very clear with regards to what I meant).
– TJ27 Oct 09 '21 at 03:59
With power calculations I find it can be the most straightforward to simulate that anyway, based on how the data actually look (if we have an idea) and the test I plan to use...
– TJ27 Oct 09 '21 at 04:13
I think part of my confusion about how to phrase this was that I often see the test used for comparing distributions that are clearly not same-shape (e.g. left-skewed versus right-skewed) on the grounds of non-normality, so I think I was subconsciously looking for phrasing that would not require same shape & scale, but would preserve the nominal level.
– TJ27 Oct 21 '21 at 17:04