
Hypothesis tests such as the Anderson-Darling or Shapiro-Wilk test check the normality of a distribution. However, when the sample size is very large, these tests have so much statistical power that they become practically useless: they detect even trivial departures from normality and will essentially always reject the null, even if the distribution is reasonably close to normal.
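
As an illustration, here is a minimal sketch (in Python with numpy and scipy, both assumed available; the sample size and the 1% contamination level are arbitrary choices for the demo) of how the Anderson-Darling test flags a sample that is, for most practical purposes, normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A barely contaminated sample: 99% N(0, 1) plus 1% N(0, 1.5).
# A histogram or Q-Q plot of this looks essentially normal.
n = 1_000_000
x = np.where(rng.random(n) < 0.99,
             rng.normal(0.0, 1.0, n),
             rng.normal(0.0, 1.5, n))

# The Anderson-Darling statistic grows with n for any fixed deviation,
# so at this sample size it typically exceeds even the 1% critical value.
result = stats.anderson(x, dist='norm')
print(result.statistic, result.critical_values)
```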

How should I test for normality when the sample size is very large, other than by visualizing histograms?

The motivation is that I want to automate normality checking for large data sets in a software platform, where everything must be automated rather than visualized and inspected manually by humans.

One idea that occurred to me is that, instead of using the Shapiro-Wilk test, I could calculate the skewness and kurtosis of the distribution, and if both are within $\pm 1.0$, assume that my large dataset is "reasonably" normally distributed.
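
A minimal sketch of that rule, assuming Python with numpy and scipy; the helper name `roughly_normal` and the $\pm 1.0$ cutoff come from the proposal above, not from any established test:

```python
import numpy as np
from scipy import stats

def roughly_normal(x, threshold=1.0):
    """Heuristic: treat x as 'reasonably normal' when both the sample
    skewness and the excess kurtosis (Fisher definition, 0 for a normal
    distribution) lie within +/- threshold."""
    return (abs(stats.skew(x)) <= threshold
            and abs(stats.kurtosis(x)) <= threshold)

rng = np.random.default_rng(1)
print(roughly_normal(rng.normal(size=1_000_000)))       # True: skew ~ 0, excess kurtosis ~ 0
print(roughly_normal(rng.exponential(size=1_000_000)))  # False: skew ~ 2, excess kurtosis ~ 6
```

Note that skewness and kurtosis are only two moments, so a symmetric but clearly non-normal distribution (e.g. a bimodal mixture) can still pass such a check.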

Is my approach correct, or are there other alternatives?

Eric Kim
  • Don't know about just looking at skewness and kurtosis. Seems the main issue is to be realistic about what closeness to normality is needed for the application at hand. – BruceET Jun 24 '19 at 06:41
  • What's your motivation to check normality in the first place? Why is normality important in your application? – COOLSerdash Jun 24 '19 at 07:43
  • Shapiro, not Saphiro. Hypothesis tests of assumptions answer the wrong question (e.g. see here); when it comes to assumptions of tests, I suggest avoiding them at any sample size. – Glen_b Jun 24 '19 at 08:44
  • @Glen_b I came up with this question after reading the link you posted. In the link, I read that hypothesis testing for normality with very large sample sizes is useless. My question is, how do I know that my distribution is normal "enough", and which technique I should use, other than visualizations? What techniques are realistic about measuring closeness to normality, as Bruce says? – Eric Kim Jun 24 '19 at 16:46
  • 1. Please fix the spelling in your question as pointed out earlier. 2. Some of the answers at that other post go rather further. Indeed, I'd say that large sample sizes just make the uselessness obvious, but more generally it's not only not useful, it's actually often counterproductive (often leading you into doing exactly the wrong thing and at the same time screwing up the properties of your significance levels and p-values). – Glen_b Jun 25 '19 at 01:35
  • I challenge the premise of the question -- given the problems with choosing analysis on the basis of what you find in the data, automated checking doesn't strike me as being as useful as building something that's more robust to violations of anything you can't reasonably assume. – Glen_b Jun 25 '19 at 01:39
  • You just, don't do it! – David Jun 25 '19 at 08:03