I'm trying to check whether my data, normal_data, follow a normal distribution in R.

I have always used shapiro.test or ks.test, but now I have more than 5000 values to check, and shapiro.test accepts at most 5000. Is there another function or approach for checking the data?

  • @Dave The possible misunderstanding, now deleted, was subtler than that. It read "According to the central limit theorem, no matter what kind of distribution we have, the sampling distribution tends to be normal if the sample is large enough (n $\gt$ 30)." Missing from this was any sense of what statistic is involved. For instance, the sampling distribution of the maximum is not going to be Normal (and rarely even close to it). – whuber Oct 29 '21 at 19:27
  • If you really must test a sample x of size greater than 5000 for normality, then you can use shapiro.test(sample(x, 5000)). In practice, huge samples often have 'harmless' quirks that lead to non-informative rejection. Example: pv = replicate(10^4, shapiro.test(rt(5000, 70))$p.val); mean(pv <= .05) returns $0.146 > 0.05$. How often is the distinction between $\mathsf{T}(70)$ and standard normal of practical importance? – BruceET Oct 29 '21 at 21:52
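The subsampling idea and the simulation from the last comment could be written out roughly as follows. This is only a sketch: normal_data is simulated here as a stand-in for the real data (an assumption, as are the seed, the mean and sd, and the sample size of 20000), and the number of replicates is cut from 10^4 to 200 so that it runs quickly, which means the rejection rate will be a noisier estimate than the one quoted in the comment.

```r
# Assumption: the real normal_data is not shown in the question,
# so a simulated vector with more than 5000 values stands in for it.
set.seed(1)
normal_data <- rnorm(20000, mean = 10, sd = 2)

# shapiro.test() only accepts between 3 and 5000 values,
# so run it on a random subsample of 5000 observations.
sub <- sample(normal_data, 5000)
res <- shapiro.test(sub)
res$p.value

# Simulation from the comment: how often does Shapiro-Wilk reject
# at the 5% level when the data are t-distributed with 70 degrees
# of freedom (very close to normal) and n = 5000?
# 200 replicates are used here for speed; the comment used 10^4.
pv <- replicate(200, shapiro.test(rt(5000, df = 70))$p.value)
mean(pv <= 0.05)
```

A rejection rate above 0.05 in the last line would illustrate the comment's point: with samples this large the test picks up departures from normality that may not matter in practice.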