How can we determine the sample size needed (i.e., do a power analysis) for nonparametric approaches like the Mann-Whitney U test or the Kolmogorov-Smirnov test?
2 Answers
While nonparametric tests typically have the same type I error across a very wide class of distributions (in many cases, for all continuous distributions), they don't have the same power characteristics across all distributions.
So while the basic idea is the same -- you specify a particular alternative at which you want a particular amount of power and then you work out the sample size that will give you that rejection rate for that alternative -- to be able to compute the power, you need to specify precisely what the situation is.
If you do specify it precisely, then you don't necessarily have to be able to do the calculation algebraically (though sometimes it should be doable); simulation is generally sufficient.
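As a minimal sketch of that simulation idea, the code below estimates the power of a two-sided Mann-Whitney test for one fully specified alternative: two normal populations with unit variance whose means differ by `shift`. The function names, the choice of alternative, and the use of the large-sample normal approximation for the U statistic (no tie correction) are all assumptions made for illustration; in practice a library routine such as `scipy.stats.mannwhitneyu` would usually replace the hand-rolled test.

```python
import random
from statistics import NormalDist


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation to U (assumes no ties)."""
    n1, n2 = len(x), len(y)
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(rank for rank, (_, grp) in enumerate(pooled, start=1) if grp == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_power(n_per_group, shift, alpha=0.05, n_sim=2000, seed=0):
    """Rejection rate when X ~ N(0, 1) and Y ~ N(shift, 1) (assumed alternative)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        y = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        if mann_whitney_p(x, y) < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print(simulated_power(n_per_group=30, shift=0.5))
```

Setting `shift=0` gives a quick check that the rejection rate sits near `alpha`. Swapping the two sampling lines for any other pair of distributions changes the alternative without touching the rest.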
At the bottom of this answer I compare power curves for the Shapiro-Wilk and Lilliefors tests of normality against a sequence of (increasingly skewed) gamma distributions. Both of those tests leave the mean and variance of the hypothesized normal unspecified; for the Kolmogorov-Smirnov test you'd specify those as well. Otherwise the calculations are the same.
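For the Kolmogorov-Smirnov case with a fully specified null, a rough sketch might look like the following. It assumes a one-sample test of N(0, 1) against a gamma distribution standardized to mean 0 and variance 1; that alternative, the helper `standardized_gamma`, and all other names are hypothetical choices for illustration. The critical value is itself obtained by simulation, using the fact that under a fully specified continuous null the statistic has the same distribution as for uniform data.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance between the ECDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_value(n, alpha, n_sim, rng):
    """Simulated upper-alpha quantile of D under the null, using uniform data."""
    stats = sorted(
        ks_statistic([rng.random() for _ in range(n)], lambda u: u)
        for _ in range(n_sim)
    )
    return stats[min(int((1 - alpha) * n_sim), n_sim - 1)]


def standardized_gamma(shape):
    """Hypothetical alternative: gamma(shape) rescaled to mean 0, variance 1."""
    return lambda rng: (rng.gammavariate(shape, 1.0) - shape) / math.sqrt(shape)


def ks_power(n, draw, null_cdf, alpha=0.05, n_sim=2000, seed=1):
    """Share of simulated samples from draw whose D exceeds the critical value."""
    rng = random.Random(seed)
    crit = ks_critical_value(n, alpha, n_sim, rng)
    hits = sum(
        ks_statistic([draw(rng) for _ in range(n)], null_cdf) > crit
        for _ in range(n_sim)
    )
    return hits / n_sim


if __name__ == "__main__":
    print(ks_power(50, standardized_gamma(4.0), NormalDist().cdf))
```

Passing `lambda rng: rng.gauss(0.0, 1.0)` as `draw` samples from the null itself, so the result should land close to `alpha`.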
In a similar vein, in this answer I compare power for a one-sample t-test and a Wilcoxon signed-rank test. In that case, normality was specified.
In both of those answers, a fixed sample size was chosen and some parameter under the collection of alternatives considered was varied. For a sample size calculation, you'd fix the alternative completely and vary the sample size until the desired power was identified.
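The sample size search described above can be sketched as a simple loop: fix the alternative, step the per-group sample size upward, and stop at the first size whose simulated power reaches the target. The alternative used in the usage line (exponential data whose scale doubles in the second group), the step size, and every function name here are assumptions for illustration, and the Mann-Whitney test again uses the normal approximation without a tie correction.

```python
import random
from statistics import NormalDist


def mwu_rejects(x, y, alpha):
    """Two-sided Mann-Whitney test, normal approximation (assumes no ties)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, True) for v in x] + [(v, False) for v in y])
    rank_sum_x = sum(r for r, (_, from_x) in enumerate(pooled, start=1) if from_x)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def power_at(n, draw_x, draw_y, alpha, n_sim, rng):
    """Simulated rejection rate with n observations per group."""
    hits = 0
    for _ in range(n_sim):
        x = [draw_x(rng) for _ in range(n)]
        y = [draw_y(rng) for _ in range(n)]
        hits += mwu_rejects(x, y, alpha)
    return hits / n_sim


def smallest_n(draw_x, draw_y, target=0.80, alpha=0.05,
               start=10, step=5, max_n=500, n_sim=1000, seed=0):
    """First per-group n on the grid whose simulated power reaches target."""
    rng = random.Random(seed)
    for n in range(start, max_n + 1, step):
        power = power_at(n, draw_x, draw_y, alpha, n_sim, rng)
        if power >= target:
            return n, power
    return None, None  # target not reached within max_n


if __name__ == "__main__":
    # Assumed alternative: X ~ Exp(mean 1), Y ~ Exp(mean 2).
    print(smallest_n(lambda r: r.expovariate(1.0),
                     lambda r: 2.0 * r.expovariate(1.0)))
```

Because each power estimate carries Monte Carlo noise, the first crossing can come slightly early or late; rerunning near the answer with a larger `n_sim` and a finer `step` tightens it.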
[If you don't wish to specify a distribution quite that specifically -- by restricting it to some broader class of distributions instead, say -- you would have to compute the lowest power across all distributions in the class. In many cases this might be difficult.]
-
Glen_b, a bit of an off-topic comment: I appreciate your comprehensive posts on power analysis. Is there any book specifically on power analysis that you would recommend to someone interested? Meanwhile, I will dig through old CV posts to see whether there is a reference-request thread on this topic. Thanks. – User1865345 Sep 09 '23 at 08:26
-
There's no book specifically on power that I know of that I'd suggest for learning from. Simulation across a sequence of specific alternatives is usually fairly straightforward; I give general guidance or some simple R code in answers in a few places (though there's a collection of additional neat things that can be done to make it faster, easier or better). There's nothing particular to R in the concepts involved. – Glen_b Sep 09 '23 at 17:26
-
Thanks, Glen_b, for the response! – User1865345 Sep 09 '23 at 17:38
-
If you have specific questions, of course, you can post them. – Glen_b Sep 09 '23 at 17:41
-
Sure, Glen_b. I was rather keen on having a source that lists the techniques, formalized in the style you advocate. In the long run, I would be happy to see a CW post here listing all the approaches mentioned in your posts and in some others (maybe providing links or brief statements), as these types of questions are common on CV. – User1865345 Sep 09 '23 at 17:48
-
I don't presently see a good way to produce a question that was not overly broad for the site, nor an answer that would be of a reasonable size. I'll have a think about what might be doable. – Glen_b Sep 09 '23 at 17:58
-
Appreciate your contribution and feedback as always, Glen_b. I won't waste your time by prolonging the conversation, but this was my intention: these approaches are very handy, intuitive and useful, and seeing them in a single place would be quite exciting and helpful. – User1865345 Sep 09 '23 at 18:01
To compute the power of the Wilcoxon-Mann-Whitney test or the Kruskal-Wallis test, use their generalization: the proportional odds semiparametric ordinal logistic model. For the two-sample problem (Wilcoxon with no covariate adjustment), Whitehead provided a formula for the variance of the log odds ratio in the ordinal model, and that variance forms the basis of power/sample size calculations. The beauty of this approach is that it works very well even with an extreme number of ties, floor effects, ceiling effects, bimodality, general non-normality, etc. Illustrative Wilcoxon power calculations for discrete and non-normal variates are given here.
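As a hedged sketch of how such a calculation might look, the code below uses the form in which Whitehead's sample size formula is commonly quoted: total N = 3(A+1)^2 (z_{1-alpha/2} + z_{power})^2 / (A theta^2 (1 - sum of pbar_i^3)), where theta is the log odds ratio, A is the allocation ratio, and pbar_i is the probability of ordinal category i averaged over the two groups. This is a transcription to check against the original paper or an established implementation (for example `posamsize` in the R package Hmisc) before relying on it; the function name and arguments are hypothetical.

```python
from math import log
from statistics import NormalDist


def whitehead_total_n(mean_probs, odds_ratio, alpha=0.05, power=0.80, ratio=1.0):
    """Approximate total sample size for a two-group proportional odds comparison.

    mean_probs: category probabilities averaged over the two groups (sum to 1).
    ratio: allocation ratio A between the groups (1.0 means equal group sizes).
    Formula as commonly quoted from Whitehead; verify before use.
    """
    if abs(sum(mean_probs) - 1.0) > 1e-8:
        raise ValueError("mean_probs must sum to 1")
    normal = NormalDist()
    z_sum = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)
    theta = log(odds_ratio)
    # 1 - sum(p^3) grows toward 1 as the response spreads over more categories.
    spread = 1.0 - sum(p ** 3 for p in mean_probs)
    return 3 * (ratio + 1) ** 2 * z_sum ** 2 / (ratio * theta ** 2 * spread)


if __name__ == "__main__":
    # Assumed example: four equally likely categories, odds ratio of 2.
    print(whitehead_total_n([0.25, 0.25, 0.25, 0.25], odds_ratio=2.0))
```

With only two categories the expression reduces to the usual log odds ratio sample size for a binary outcome, which is a convenient sanity check. A continuous response with no ties corresponds to the limit where the sum of cubed probabilities goes to zero.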