If you intend to run some analysis like an ANOVA or linear regression that assumes normality, how do you determine if a given method for checking normality is appropriate? What kinds of issues should it check for? What tolerance should the normality test have?
I often have cases with >10k values, and I know that Shapiro-Wilk is rarely useful for datasets that large. There are alternatives, but what criteria should be used to evaluate them? What I don't want to do is try a bunch and pick the one that best supports my hypothesis (that the data is normal).
From my understanding, the main reason for normality testing is that deviating too far from a normal distribution can push the actual false positive rate of some analyses above or below the nominal alpha. Is that correct?
If so, could a bootstrapped approach be a general (albeit slow) alternative to normality testing?
- Scale the residuals from each condition to mean=0, sd=1.
- Sample ~100 values from those scaled residuals, and run a t-test on the sampled values.
- Repeat the previous step ~10k times
- Calculate the proportion of repetitions with p-value < alpha.
- The difference between that proportion and alpha indicates how much the residuals' distribution distorts the false positive rate. So if alpha is 0.05 and the proportion of p-values < alpha is 0.0506, you conclude that the distribution is unlikely to impact the false positive rate.
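The steps above could be sketched roughly as follows. This is only an illustration under some assumptions that the description leaves open: the t-test is taken to be a one-sample test against 0 (the null is true by construction, since each condition is centered), sampling is done with replacement, and the function names `scale_residuals` and `bootstrap_false_positive_rate` are made up for this example.

```python
import numpy as np
from scipy import stats


def scale_residuals(groups):
    """Standardize each condition's residuals to mean 0, sd 1, then pool them."""
    scaled = []
    for g in groups:
        g = np.asarray(g, dtype=float)
        scaled.append((g - g.mean()) / g.std(ddof=1))
    return np.concatenate(scaled)


def bootstrap_false_positive_rate(scaled, n_sample=100, n_rep=10_000,
                                  alpha=0.05, seed=0):
    """Proportion of resampled t-tests with p < alpha when the null is true."""
    rng = np.random.default_rng(seed)
    # One row per repetition, each row a resample of n_sample scaled residuals.
    draws = rng.choice(scaled, size=(n_rep, n_sample), replace=True)
    # Assumption: one-sample t-test of each row against a true mean of 0.
    p_values = stats.ttest_1samp(draws, popmean=0.0, axis=1).pvalue
    return float(np.mean(p_values < alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical skewed residuals from three conditions.
    groups = [rng.exponential(size=4000) for _ in range(3)]
    rate = bootstrap_false_positive_rate(scale_residuals(groups))
    print(f"estimated false positive rate: {rate:.4f}")
```

With 10k repetitions the Monte Carlo standard error of the estimated proportion is about sqrt(0.05 × 0.95 / 10000) ≈ 0.002, so a result like 0.0506 is indistinguishable from 0.05. The estimate also only applies to the sample size used in the resampling (`n_sample`), so it probably makes sense to set that to something close to the real per-condition sample size.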
As much as I would like someone to just say "use [method X]", I'd rather get a general sense of the criteria so I can make the decision myself.
Edit: This post suggested by @nick-cox below generally answers the question
In the end, you often get a binary decision, like "Use an ANOVA or go non-parametric". So this is a case where I'd rather have a binary outcome.
– sharoz Mar 16 '21 at 14:12
"a test for normality is directed against a class of alternatives if it is sensitive to alternatives from that class, but not sensitive to alternatives from other classes. Typical examples are tests that are directed towards skew or kurtotic alternatives. The simplest examples use the sample skewness and kurtosis as test statistics. Directed tests of normality are arguably often preferable to omnibus tests (such as the Shapiro-Wilk and Jarque-Bera tests) since it is common that only some types of non-normality are of concern" – sharoz Mar 16 '21 at 14:28
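For reference, the directed tests mentioned in that quote could be run roughly like this. It is a small sketch that assumes SciPy's `skewtest` and `kurtosistest` are acceptable stand-ins for "tests directed towards skew or kurtotic alternatives"; the helper name `directed_normality_checks` is made up for this example.

```python
import numpy as np
from scipy import stats


def directed_normality_checks(residuals):
    """Report skewness and kurtosis separately, each with its own directed test."""
    x = np.asarray(residuals, dtype=float)
    skew_result = stats.skewtest(x)      # sensitive to asymmetry
    kurt_result = stats.kurtosistest(x)  # sensitive to heavy or light tails
    return {
        "skewness": float(stats.skew(x)),
        "skew_p": float(skew_result.pvalue),
        "excess_kurtosis": float(stats.kurtosis(x)),  # 0 for a normal distribution
        "kurtosis_p": float(kurt_result.pvalue),
    }
```

With >10k values the p-values will likely be tiny even for small deviations, so the estimated skewness and excess kurtosis themselves are probably the more informative numbers.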