What are the reasons to use a specialized normality test (e.g., Shapiro-Wilk, Jarque-Bera) instead of a generic goodness-of-fit test (applicable to any distribution, including but not limited to the normal, with specified parameters, such as $\chi^2$ or Kolmogorov-Smirnov) for data we want to check for normality?
-
What distinction are you making by calling something a "goodness-of-fit test to a normal distribution" instead of a "normality test"? What are some examples of such tests you might have in mind? – whuber May 24 '22 at 19:55
-
@whuber, either a chi-squared or a Kolmogorov-Smirnov test, for example. Suppose you want to make sure your data aren't too far from normality in order to perform a $\chi^2$ test for variance (which requires normality). Is there an advantage of Sh-W over K-S? – Rafael May 24 '22 at 20:02
-
Those are all Normality tests. – whuber May 24 '22 at 20:04
-
@whuber then I must have a misconception. A goodness-of-fit test can be used to test fitness to a normal distribution, of course, but there are tests that are not goodness-of-fit that are intended to test for normality. Should I ask gof vs non-gof tests for normality instead? – Rafael May 24 '22 at 20:13
-
I honestly can't figure out what you mean by "goodness of fit test." All the ones you have named so far are manifestly tests of normality. – whuber May 24 '22 at 20:15
-
@whuber I believe you, but I'm puzzled. https://en.wikipedia.org/w/index.php?title=Goodness_of_fit_test. Some of these tests work for any distribution and require that the parameters are included as part of the null hypothesis, some only work for the normal distribution and don't require parameters (maybe I'm wrong about this?) – Rafael May 24 '22 at 20:32
-
I don't think you're wrong. The distinction you seem to be making is between tests of specific distributions and tests of distribution families, and that's well worth making. But that distinction is neither apparent in your question nor is it appropriate for the examples you give, since what is usually meant by "checking data for normality" and "goodness of fit ... to a normal distribution" are all ordinarily understood to be testing whether data come from a normal family. – whuber May 24 '22 at 21:13
-
@whuber thanks for the clarification. I've learned this informally, so to say. And it seemed to me that the term normality test is used by some to mean specifically the ones specialized in the normal family (see e.g., this table in Wolfram Mathematica documentation). – Rafael May 26 '22 at 17:16
-
I'm willing to believe that. But since it's apparent that both "normality test" and "goodness of fit" test are somewhat vague, general terms of art, anyone choosing to use either phrase in an unambiguous way will want to explain what they mean if they care about being understood correctly. I'm not saying you must do that: vagueness has its place, but it's best to be aware of when it's likely some of your audience will have alternative interpretations. – whuber May 26 '22 at 17:38
-
@whuber, thank you. I hope the question is clearer after my latest edits? – Rafael May 26 '22 at 17:44
2 Answers
First, it's worth noting that testing for normality is a basically useless activity (cf., Is normality testing 'essentially useless'?). No dataset in the real world is exactly normally distributed, so we already know the null hypothesis behind these tests is false. What's left is that the test can correctly reject the null, if the sample size is large enough relative to the way the data deviate from true normality, or can yield a type II error, if the sample is too small. However, what really matters isn't how many data you have, but the size and nature of the deviation from normality, and that is something these tests can't tell you.
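The sample-size point can be illustrated with a quick sketch in Python using `scipy.stats.shapiro` (the t-distributed data here are my own stand-in for "mildly non-normal" real data; the seed and sizes are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Mildly non-normal data: a t distribution with 10 df has slightly heavier
# tails than the normal, so the null "data are normal" is false by design.
small = rng.standard_t(df=10, size=30)
large = rng.standard_t(df=10, size=5000)

# The deviation from normality is identical in both samples; only the
# sample size differs. At n = 30 the test will usually fail to detect it
# (a type II error), while at n = 5000 it usually will.
print(stats.shapiro(small).pvalue)
print(stats.shapiro(large).pvalue)
```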
That having been said, the reason specialized tests like the Shapiro-Wilk are used instead of generic goodness-of-fit tests is that we primarily care about some specific types of deviations from normality. Data can deviate from normality in innumerable ways. For simplicity, you can imagine a distribution that has the same kurtosis (heavy-tailedness) as a normal but is skewed, and another that differs in kurtosis but is perfectly symmetrical. If you tested only one of those features, you would miss the other. Of course, a general test will in some sense cover everything, but not with equal power: it will be more sensitive to some deviations than to others, and which deviation is most detectable will differ by test. Thus, you might as well use the test that is maximally sensitive to the deviations you care about. Those are typically deviations in the tails, and the Shapiro-Wilk is weighted to detect them preferentially.
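To make the "different tests, different sensitivities" point concrete, here is an illustrative sketch in Python (scipy) running three tests on the same skewed sample; the exponential data and seed are my own choices, and note that plugging estimated parameters into Kolmogorov-Smirnov strictly invalidates its null distribution (the Lilliefors correction accounts for that):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(size=100)  # clearly skewed, hence non-normal

# Shapiro-Wilk: broadly sensitive, weighted toward the tails.
p_sw = stats.shapiro(x).pvalue

# Jarque-Bera: targets skewness and kurtosis specifically.
p_jb = stats.jarque_bera(x).pvalue

# Kolmogorov-Smirnov against a normal with plugged-in estimates; estimating
# the parameters from the same data makes this version conservative.
z = (x - x.mean()) / x.std(ddof=1)
p_ks = stats.kstest(z, 'norm').pvalue

# The three tests generally return different p-values for the same sample,
# because each weights the possible deviations from normality differently.
print(p_sw, p_jb, p_ks)
```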
-
+1 for the latter stuff, but I am always astonished that so many people want to make general statements about "uselessness" (or usefulness) of preprocessing approaches such as normality testing without making the proviso that it ultimately depends on the purpose. As long as for any given data set size I cannot predict whether normality will be rejected or not, such a test is informative, and then one has to discuss whether the information it gives is useful, which depends on what exactly you'd want to do with the result. – Christian Hennig May 24 '22 at 20:44
-
2I should add that I agree that no real data are really normal, but the question whether the data "look like generated from a normal" (even though we know it isn't) can still be of interest. All our models are thought constructs and tools, and not meant to be "true", but in certain situations we may want to drop them if the data clearly contradict them. – Christian Hennig May 24 '22 at 20:45
-
The way people who attempt to apply statistics to some application area most often tend to use normality tests (as a way to decide what test - indeed what specific hypothesis - to use on some data, by testing that same data), they're not just mostly useless, but pretty actively problematic. From what I have seen, this appears to account for an overwhelming majority of times that normality tests are used. – Glen_b May 24 '22 at 22:45
-
@Glen_b While I agree on it being "problematic", I'm not as generally negative as some others about it, see here: https://arxiv.org/abs/1908.02218 – Christian Hennig May 24 '22 at 23:22
-
@ChristianHennig, yes, I do hold that normality tests are useless. We know in advance that the null is false. The test gives you no information beyond whether it made a correct rejection or a type II error. OTOH, I'm a big fan of assessing normality (eg, looking at qq-plots, etc) to determine the nature & extent of any deviations from normality. That information may well be useful, depending on the purpose, as you say. – gung - Reinstate Monica May 25 '22 at 00:38
-
@gung-ReinstateMonica Are you saying all tests are useless because no model is ever true (and therefore no null hypothesis in particular)? Or is there something particularly useless about normality testing in your view? – Christian Hennig May 25 '22 at 10:59
-
@ChristianHennig, I don't believe all null hypotheses are true. I don't believe anything like that. In a true experiment, the null hypothesis for an intervention could well be true, & testing it is quite reasonable. With observational data, believing the null is silly, but a test could tell you whether you can be confident the marginal association is positive / negative. In general, though, I think hypothesis testing is overhyped & conducted mindlessly. In particular, I think it is dumb to conduct a hypothesis test when you know the answer with absolute certainty. – gung - Reinstate Monica May 25 '22 at 11:08
-
@gung-ReinstateMonica (a) I agree that null hypothesis testing is often misused and misinterpreted. (b) In my view no model can be "true" or should ever be "believed" (that's just the wrong category for models) and null hypothesis tests are not about "believing" one model or the other, they are about compatibility of data with model. As there is no way to get any better relation between model and data than compatibility, tests have their sense. (I'm not saying compatibility as measured by tests is always the right thing to ask for. But I find a general negative statement is not appropriate.) – Christian Hennig May 25 '22 at 11:21
-
@ChristianHennig, I'm pretty reluctant to continue debates in comments, so I'll stop here. Of all the silly hypothesis tests that are done, in my sincere opinion, tests of normality are the worst. I can't remember the last time I did one (I'm sure I did in stats 101 years ago) & I have great difficulty imagining a case where I would. It is entirely possible to assess the compatibility between data & the normal by computing descriptives (eg, skew & kurtosis), & looking at plots (eg, qqplots). Look at the n = 5000 qqplot in the linked thread & notice the Shapiro-Wilk p-value is 0.007. – gung - Reinstate Monica May 27 '22 at 18:38
-
I would consider those data compatible enough despite the p-value. Eg, if those were residuals from a linear model, I would not throw out the model. Try `set.seed(10); re = rexp(20); hist(re); shapiro.test(re)`. Those data are not compatible despite p = .34. I don't believe all nulls, & I do use tests for some things. I do use them to test models taken as a whole. If an omnibus ANOVA is not even significant, you should stop there & not try to interpret the tests of the constituent variables--that provides some protection against multiple comparisons, eg. But I don't test for normality. – gung - Reinstate Monica May 27 '22 at 19:01
There is some literature comparing the power of different normality tests, often involving both tests specific to normality and more general goodness-of-fit approaches that can be applied to arbitrary distributional shapes (so, as pointed out already, reserving the term "normality test" for the former is a misnomer, and many would also include the specific normality tests in the more general class of "goodness-of-fit tests").
The power generally depends on what kinds of alternative distributions normality is tested against; however, pretty much everything I have seen favours Shapiro-Wilk over Kolmogorov-Smirnov and chi-squared.
See for example
Thode HJ. Testing for normality. New York: Marcel Dekker; 2002.
B. W. Yap & C. H. Sim (2011) Comparisons of various types of normality tests, Journal of Statistical Computation and Simulation, 81:12, 2141-2155, DOI: 10.1080/00949655.2010.520163
Googling "compare normality test" or looking into the references of those cited above will bring up more.
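The kind of power comparison those references carry out can be sketched with a small Monte Carlo simulation in Python (scipy); the heavy-tailed t alternative, sample size, and replication count here are my own illustrative choices, not taken from the cited studies:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps, alpha = 50, 500, 0.05
rej_sw = rej_ks = 0

for _ in range(reps):
    x = rng.standard_t(df=5, size=n)  # heavy-tailed alternative to the normal
    if stats.shapiro(x).pvalue < alpha:
        rej_sw += 1
    # K-S with parameters estimated from the data; without the Lilliefors
    # correction this version is conservative, which handicaps K-S further.
    z = (x - x.mean()) / x.std(ddof=1)
    if stats.kstest(z, 'norm').pvalue < alpha:
        rej_ks += 1

# Empirical power = rejection rate under the (non-normal) alternative.
print("Shapiro-Wilk power:", rej_sw / reps)
print("Kolmogorov-Smirnov power:", rej_ks / reps)
```

In runs like this, Shapiro-Wilk typically rejects the false null far more often than the plug-in Kolmogorov-Smirnov, consistent with the comparisons in the literature above.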