I am trying to reproduce (using Python) the issue described in the answer here: Is normality testing 'essentially useless'?
That answer states that, as the sample size grows (n > 1000), the Shapiro-Wilk test becomes increasingly sensitive to even small departures from normality, such as a few outliers. In other words, on repeated simulations of almost normally distributed data, the test will more and more often return p < 0.05 and reject the null hypothesis that the data are normally distributed.
However, when I try this with the Python implementation, I almost never get p < 0.05, regardless of the sample size I use to generate the data. I checked the implementations of the Shapiro-Wilk test in Python (scipy.stats) and in R (the stats package), and both use the same algorithm from the paper ALGORITHM AS R94, APPL. STATIST. (1995) VOL. 44, NO. 4.
Why do I not obtain similar results with scipy? I attach my code below.
import numpy as np
import pandas as pd
from scipy.stats import shapiro

distributions = []
for _ in range(100):
    # Shapiro-Wilk p-values for near-normal samples of increasing size:
    # standard normal draws with five fixed values appended as mild contamination.
    tmp_dist = [shapiro(np.concatenate((np.random.normal(0, 1, 10), [1, 0, 2, 0, 1])))[1],
                shapiro(np.concatenate((np.random.normal(0, 1, 100), [1, 0, 2, 0, 1])))[1],
                shapiro(np.concatenate((np.random.normal(0, 1, 1000), [1, 0, 2, 0, 1])))[1],
                shapiro(np.concatenate((np.random.normal(0, 1, 5000), [1, 0, 2, 0, 1])))[1],
                shapiro(np.concatenate((np.random.normal(0, 1, 20000), [1, 0, 2, 0, 1])))[1]]
    distributions.append(tmp_dist)

df = pd.DataFrame(distributions, columns=['n10', 'n100', 'n1000', 'n5000', 'n20000'])
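To quantify "almost never", one way to summarise the result is to look at the fraction of the 100 runs with p < 0.05 in each column (this is just a summary step on the DataFrame above, not part of the simulation itself):

# Fraction of the 100 simulations where normality is rejected at alpha = 0.05,
# one value per sample size.
rejection_rate = (df < 0.05).mean()
print(rejection_rate)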