If there are two samples from a distribution whose form is not known, and which is certainly not normal, how can one ascertain the probability that their means/medians differ by more than random chance would explain? In other words, is there a t-test equivalent for samples coming from a distribution of unknown form?
There are non-parametric tests, but they have a very different null hypothesis. – Roman Luštrik Apr 01 '13 at 21:57
4 Answers
There are tests of location difference that don't assume normality but which make other assumptions.
What exactly they test depends on which additional assumptions you make.
For example, in place of a two-sample t-test, there are classical nonparametric equivalents like the Wilcoxon-Mann-Whitney, robustified versions of t-tests (e.g. ones based on trimmed means and Winsorized variances) and there are resampling procedures (randomization and bootstrapping tests, for example).
With the WMW, what exactly is being tested depends on your additional assumptions; if you assume a location-shift alternative (that is, you assume that the only way for the null to be wrong is for the location of one group to be shifted relative to the other), then given that assumption, the test is a test of difference in means and also of medians and also of lower quartiles and ... . However, it is sensitive to other kinds of difference, and the alternative can be framed as generally as $P(X>Y) \neq \frac{1}{2}$ (e.g. see Conover's Practical Nonparametric Statistics for that form).
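As a minimal sketch of the WMW in practice, using SciPy's `mannwhitneyu` (the simulated exponential samples and the shift of 1.0 are just illustrative choices, not anything from the question):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Two skewed (exponential) samples; the second is location-shifted by 1.
x = rng.exponential(scale=1.0, size=50)
y = rng.exponential(scale=1.0, size=50) + 1.0

# Under a pure location-shift alternative this tests a shift in
# means/medians; more generally it tests P(X > Y) != 1/2.
stat, p = mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4g}")
```

With a genuine shift this large relative to the spread, the p-value comes out very small.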
A robustified test appears to be testing something different from a mean shift, but again, if under the alternative you have an 'identical distributions apart from location shift' assumption (hence identical in all respects under the null), it is also a test for difference in means.
Randomization and bootstrap tests can be more explicitly constructed to use a mean, of course.
If you don't assume identical distributions under the null, things become more complex; the WMW can still carry meaning, for example, but understanding what you're doing becomes more complex.
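A randomization test of the kind mentioned above can be built explicitly around the mean difference. Here is a hand-rolled sketch (the lognormal data, shift of 1.5, and permutation count are illustrative assumptions):

```python
import numpy as np

def perm_test_mean_diff(x, y, n_perm=10_000, seed=0):
    """Two-sided randomization test for a difference in means.

    Pools the data and repeatedly reassigns group labels at random;
    the p-value is the fraction of permuted mean differences at least
    as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[: len(x)].mean() - pooled[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

rng = np.random.default_rng(1)
x = rng.lognormal(size=40)           # skewed, decidedly non-normal data
y = rng.lognormal(size=40) + 1.5     # shifted copy
p = perm_test_mean_diff(x, y)
print(f"permutation p = {p:.4g}")
```

The same skeleton works for medians or trimmed means: just change the statistic computed on each relabelling.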
Yes.
If the sample size is large enough you can rely on the central limit theorem, and each of the sample means will be approximately normally distributed. The central limit theorem shows that, whatever the distribution of the underlying population, the sample mean has close to a normal distribution if the sample size is large enough. This leads to effectively the same test as the t-test (you use a normal distribution instead of a t distribution for the test statistic, but once the sample size is over about 30 they are basically the same), and there are explanations in any basic stats text.
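A sketch of that large-sample test for a difference in means (sometimes called a two-sample z-test); the exponential data and sample sizes here are made-up illustrations:

```python
import math
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
# Large samples from a clearly non-normal distribution.
x = rng.exponential(scale=1.0, size=500)
y = rng.exponential(scale=1.0, size=500)

# By the CLT, the difference in sample means is approximately normal
# for large n, so a z statistic with the usual standard error works.
se = math.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
z = (x.mean() - y.mean()) / se
p = 2 * norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p:.4g}")
```

Since both samples here come from the same distribution, the p-value is just a draw from (approximately) a uniform distribution.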
In the past century or so it has become apparent that "large enough" in the previous paragraph is probably quite a bit larger than often implicitly assumed (i.e. 30 is enough for a t distribution to look like a normal distribution, but not necessarily for a sample mean from a radically non-normal distribution to get "close" to normal). The mean is a particularly hard thing to estimate for populations with real-world distributions: even a mixture of two normal distributions is enough to make a sample size much larger than 30 necessary for the central limit theorem to bring the sample mean "close" to normally distributed.
If your sample size is smaller than several hundred, you have a number of tools to use. Basing inference on the trimmed mean rather than the mean can be very helpful, as can bootstrapping. Combined, they give a good robust test.
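Combining those two ideas might look like the following sketch: a percentile bootstrap for the difference in 20%-trimmed means (the trim fraction, the heavy-tailed t-distributed data, and the shift of 1.0 are all illustrative assumptions):

```python
import numpy as np
from scipy.stats import trim_mean

def boot_trimmed_diff(x, y, trim=0.2, n_boot=5000, seed=0):
    """Percentile-bootstrap interval for a difference in trimmed means.

    Resamples each group with replacement; equality of trimmed means
    is rejected at the 5% level if 0 falls outside the 95% interval.
    """
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)
        yb = rng.choice(y, size=len(y), replace=True)
        diffs[i] = trim_mean(xb, trim) - trim_mean(yb, trim)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return lo, hi

rng = np.random.default_rng(3)
x = rng.standard_t(df=3, size=60)        # heavy-tailed data
y = rng.standard_t(df=3, size=60) + 1.0  # shifted by 1
lo, hi = boot_trimmed_diff(x, y)
print(f"95% bootstrap CI for trimmed-mean difference: ({lo:.2f}, {hi:.2f})")
```

With the shift built into the simulated data, the interval excludes zero, so the test rejects equality.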
If a test based on bootstrapped trimmed means is problematic for some reason, you are left with non-parametric tests.
A possible non-asymptotic solution is to use an empirical likelihood ratio test. However, the power of this test may not be appealing compared with appropriate (ad hoc) tests. See http://interstat.statjournals.net/YEAR/2011/articles/1107001.pdf
See a list of non-parametric methods here: http://en.wikipedia.org/wiki/Non-parametric_statistics