
It's well established that both Anderson-Darling and Shapiro-Wilk have a much higher power to detect departures from normality than a KS-Test.
I have been told that Shapiro-Wilk is usually the best test to use if you want to test whether a distribution is normal, because it has one of the highest powers to detect lack of normality; but in my limited experience, Shapiro-Wilk seems to give me the same result as Anderson-Darling every time.
I thus have two questions:

  • When does the Shapiro-Wilk test out-perform Anderson-Darling?
  • Is there a uniformly most powerful lack of normality test, or, barring that possibility, a normality test that out-performs nearly all other normality tests, or is Shapiro-Wilk the best bet?
  • Here's a relevant-looking ref, though I haven't read past the abstract: Newson 1991 Statistics in Medicine "Estimating departure from normality" – onestop Mar 12 '11 at 12:32
  • Two more recent ones, both in the Journal of Statistical Computation and Simulation: Yazicia & Yolacan 2007 Romão, Delgado & Costa 2010 – onestop Mar 12 '11 at 13:05
  • Wouldn't the likelihood ratio test give you the most powerful test by the Neyman-Pearson Lemma? My impression has been that all of these "other" tests are merely easily computable approximations of the LRT (possibly with constraints on the alternative.) – charles.y.zheng Mar 12 '11 at 15:08
  • 3
    @Christopher, if you want to even have a notion of "most powerful" you need to specify a class of alternatives. Is it the class of all distributions $F$ on the real line? All of those with a density $f$ with respect to Lebesgue measure? All elliptical distributions? Etc. The Anderson--Darling (AD) test is, of course, more powerful in general than the KS test because AD can be viewed as weighting the tail observations much higher. So, if you get any unexpected deviations from normality in the tails, the AD is going to pick that up immediately. – cardinal Mar 12 '11 at 16:00
  • 1
    @charles.y.zheng, the Neyman--Pearson lemma is for comparing two point hypotheses. So, a LRT would only be (guaranteed to be) most powerful when using a fixed set of parameters for the normal distribution and testing against another fixed distribution as the alternative. I'm guessing that's not the situation of interest here. – cardinal Mar 12 '11 at 16:02
  • @charles.y.zheng, please define "empirical variance". The MLE of the variance of a normal is never equal to the (unbiased) sample variance. – cardinal Mar 12 '11 at 16:10
  • @cardinal: After some thinking I realized my mistake. The LRT should still be the most powerful but you have to constrain the alternative for it to even make sense. – charles.y.zheng Mar 12 '11 at 16:10
  • @cardinal: the MLE of the variance for the normal distribution is not an issue. It is the sum-of-squares deviation-from-the-mean of the data divided by sample size. – charles.y.zheng Mar 12 '11 at 16:13
  • 2
    @charles.y.zheng, to repeat, generalized LRT (formed from considering maximization over compositive null and alternative parameter spaces) are not in general most powerful tests. In some cases you can get uniformly most powerful tests, which are essentially what you seem to be getting at. But those are few and far between and almost exclusively restricted to exponential families. Also UMP tests almost never exist when there are nuisance parameters present. – cardinal Mar 12 '11 at 16:16
  • @charles: NP would work if I could specify the distribution exactly, ie: $ H_0: X_i \sim \mathcal{N}(0,1) $ and $ H_A: X_i \sim \mathcal{P}(3) $. I was hoping for something that tested $H_0: X_i \sim \mathcal{N}(\mu, \sigma^2) $ against $H_A: X_i \nsim \mathcal{N}(\mu, \sigma^2)$. – Christopher Aden Mar 12 '11 at 20:59
  • @Onestop: Romão, Delgado & Costa 2010 was exactly what I was looking for. I suppose I wasn't clear that I wasn't looking for a rigorous mathematical explanation of when SW was better than AD, but some conditions for when one out-performed the other. If you submit your response as an answer, I can accept it. – Christopher Aden Mar 14 '11 at 20:32
  • @Christopher I found that ref from a simple Google Scholar search and I've only read its abstract, which doesn't in itself answer your first bullet (which seems more interesting to me than your second bullet, to which I reckon the answers are "No", "No" and "it depends"). If that ref contains an answer to your 1st Q it would be great if you post it yourself, which is completely within the site's protocol, as is accepting your own answer. – onestop Mar 14 '11 at 21:10
  • I used my university's VPN to read the article. Towards the end of the article are a list of power calculations under various conditions for quite a few GoF tests when the underlying distribution is one of maybe 30 non-normal distributions. That answered the question pretty well, showing when SW does better than AD. – Christopher Aden Mar 14 '11 at 23:51
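The kind of power comparison discussed in these comments can be sketched with a small Monte Carlo simulation. The snippet below is a hypothetical illustration in Python with scipy (used here only as a stand-in for the R workflow the thread discusses): it estimates the size of the Shapiro-Wilk and Anderson-Darling tests under a true normal, and their power against a skewed (exponential) alternative. The sample size, replication count, and choice of alternative are arbitrary, not taken from the thread or the cited papers.

```python
# Hypothetical Monte Carlo sketch: size and power of the Shapiro-Wilk
# and Anderson-Darling normality tests at the 5% level.
import numpy as np
from scipy import stats

def rejection_rates(sampler, n=50, reps=500, alpha=0.05, seed=0):
    """Fraction of `reps` samples of size n on which each test rejects."""
    rng = np.random.default_rng(seed)
    sw_rejects = ad_rejects = 0
    for _ in range(reps):
        x = sampler(rng, n)
        if stats.shapiro(x).pvalue < alpha:
            sw_rejects += 1
        res = stats.anderson(x, dist="norm")
        # critical_values[2] is the 5% critical value for the normal case
        if res.statistic > res.critical_values[2]:
            ad_rejects += 1
    return sw_rejects / reps, ad_rejects / reps

# Size: the data really are normal, so rejection rates should be near alpha.
size_sw, size_ad = rejection_rates(lambda rng, n: rng.normal(size=n))

# Power: a skewed alternative, where both tests should reject most of the time.
pow_sw, pow_ad = rejection_rates(lambda rng, n: rng.exponential(size=n))

print(f"size:  SW={size_sw:.3f}  AD={size_ad:.3f}")
print(f"power: SW={pow_sw:.3f}  AD={pow_ad:.3f}")
```

Swapping other alternatives (a heavy-tailed t, a contaminated normal, and so on) in place of the exponential sampler reproduces, in miniature, the kind of power tables Romão, Delgado & Costa report.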

2 Answers


If the only criterion is most powerful, then nothing beats SnowsPenultimateNormalityTest, which is in the TeachingDemos package for R. However, that test has an unfair advantage in the power competition, and some may consider it less capable in other areas; for one, it belongs to the class of functions for which the documentation is probably (hopefully) more useful than the function itself.

What is more important is to consider what it means when these tests of normality reject the null, or fail to reject the null.

Greg Snow
  • I took a look at the documentation for your function, Greg. It certainly gave me a chuckle. Unfortunately this wasn't exactly what I was looking for--it lacks the sort of application that I'd like. – Christopher Aden Jun 22 '11 at 21:11

In general, I would advise using more than just one test. With statistical programs like R, it takes little more effort than typing one line per test.

The Shapiro-Wilk and Anderson-Darling tests both assess the fit of the whole distribution (Anderson-Darling via the empirical distribution function, Shapiro-Wilk via a regression on the order statistics), whereas moment-based tests such as Jarque-Bera compare the sample's kurtosis and skewness with those of a normal distribution. (The Cramér-von-Mises test, like Anderson-Darling, is based on the empirical distribution function.) In some cases, the results from these two families of tests can differ.

Depending on the testing situation, every test has its strengths and weaknesses. Hence it's advisable to try several tests on the same data set. If the data set is clearly normally distributed, the results shouldn't differ very much.
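To make the one-line-per-test workflow concrete, here is a hypothetical sketch in Python with scipy (a stand-in for the R session this answer has in mind), applying one test from each family mentioned above to the same sample:

```python
# Sketch: several normality tests applied to the same sample, one call each.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=10, scale=2, size=200)  # toy data, actually normal

sw = stats.shapiro(x)                # regression on order statistics
ad = stats.anderson(x, dist="norm")  # EDF based, extra weight in the tails
jb = stats.jarque_bera(x)            # skewness/kurtosis based

print(f"Shapiro-Wilk:     W  = {sw.statistic:.3f}, p = {sw.pvalue:.3f}")
print(f"Anderson-Darling: A2 = {ad.statistic:.3f} "
      f"(5% critical value = {ad.critical_values[2]:.3f})")
print(f"Jarque-Bera:      JB = {jb.statistic:.3f}, p = {jb.pvalue:.3f}")
```

Since this toy sample really is normal, the tests should broadly agree; on real data, disagreement between the distribution-fit tests and the moment-based tests is exactly the signal worth investigating.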

beyeran