This question is related to a previous question.
I am interested in the process described below the links. Out of curiosity, I wonder whether it is a good approach, and the purpose of this question is to discuss it. None of the following links answers this question:
- Is normality testing 'essentially useless'?
- Testing large dataset for normality - how and is it reliable?
- Normality testing with very large sample size?
We have a dataset of $60'000$ observations and need to check whether they come from a normal distribution. As is well known, it is pointless to run the test on all $60'000$ observations: such a large sample gives the test enough power to detect even trivial deviations from normality, so the null hypothesis 'the data come from a normal distribution' will be rejected with near certainty.
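To illustrate this (my own example, not taken from the links above): with slightly heavy-tailed data, a normality test on the full sample rejects, while the same test on a small subset typically does not. I use scipy's D'Agostino-Pearson test (`stats.normaltest`) here purely for illustration; the specific test is my choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Slightly non-normal data: a t-distribution with 50 degrees of freedom
# is visually almost indistinguishable from a normal distribution.
data = rng.standard_t(df=50, size=60_000)

# On the full 60,000 observations the test has enough power to detect
# this tiny deviation, so it will typically reject.
print(stats.normaltest(data).pvalue)        # very small p-value

# The same test on a small random subset typically does not reject.
print(stats.normaltest(data[:500]).pvalue)  # usually well above 0.05
```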
What is usually done instead is to run the test on a smaller subset drawn at random from the full dataset. A single subset, however, may not be representative, so since I have $60'000$ observations at hand, I used another approach to obtain a more reliable result.
I randomly drew, with replacement, $50$ subsets from the full dataset, and on each subset I performed a normality test at the $5\%$ significance level. At the end, no more than $5\%$ of the tests must reject in order to accept the null hypothesis. (The rationale: under the null, each individual test rejects with probability $5\%$, so roughly $5\%$ rejections are expected by chance.)
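For concreteness, here is a minimal Python sketch of the procedure. The choice of Shapiro-Wilk as the test and $1'000$ as the subset size are my own assumptions; the procedure itself does not fix them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in for the real dataset of 60,000 observations (here: truly normal).
data = rng.normal(loc=0.0, scale=1.0, size=60_000)

n_subsets = 50       # number of subsets drawn with replacement
subset_size = 1_000  # size of each subset (my assumption, not fixed above)
alpha = 0.05         # significance level of each individual test

rejections = 0
for _ in range(n_subsets):
    subset = rng.choice(data, size=subset_size, replace=True)
    _, p_value = stats.shapiro(subset)  # Shapiro-Wilk normality test
    if p_value < alpha:
        rejections += 1

rejection_rate = rejections / n_subsets
print(f"{rejections}/{n_subsets} tests rejected ({rejection_rate:.0%})")

# Decision rule described above: accept overall normality only if
# no more than 5% of the individual tests rejected.
print("Accept normality" if rejection_rate <= alpha else "Reject normality")
```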
Criticism of this way of testing normality on a big dataset is welcome. I am wondering whether it is useless, or whether it actually gives a more accurate result.
I do not want to discuss why I am doing a normality test, or whether it is useful in my situation. The goal here is just to think about the process I have described.