Context: I have spent the last few weeks thinking about how the central limit theorem is stated: if we have a sequence of i.i.d. random variables $X_1, X_2, \ldots, X_n$ then $\frac{\sum_{i=1}^nX_i - E[\sum_{i=1}^nX_i]}{\sqrt{n}} \xrightarrow[n \rightarrow +\infty]{d}N\Big(0 \hspace{2mm} , \hspace{2mm}\frac{\operatorname{Var}(\sum_{i=1}^nX_i)}{n}\Big)$.
What draws my attention is that this statement is not phrased as an if-and-only-if condition; therefore we might be able to come up with a set of non-i.i.d. random variables that nonetheless satisfy the convergence in distribution above.
Then I thought about the following setup. Let $\{X_i\}_{i=1}^{60}$ be a sequence of independent random variables with $X_i \sim \text{Uniform}(0, i)$, and define the random variable $Z = \frac{\sum_{i=1}^{60}X_i - 915}{\sqrt{60}}$. I wrote the following R code for drawing a sample of this random variable.
sample_z = function(n) {
  # Each row of the matrix is one draw of (X_1, ..., X_60), with X_i ~ Uniform(0, i).
  random_sample = matrix(nrow = n, ncol = 60)
  for (i in 1:60) {
    random_sample[, i] = runif(n, min = 0, max = i)
  }
  # Center by E[sum X_i] = 915 and scale by sqrt(60) to obtain n draws of Z.
  (rowSums(random_sample) - 915)/sqrt(60)
}
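For completeness, the centering constant above and the variance in the null hypothesis below come from the usual moments of the uniform distribution, $E[X_i] = i/2$ and $\operatorname{Var}(X_i) = i^2/12$, together with independence of the $X_i$:
$$E\Big[\sum_{i=1}^{60}X_i\Big] = \sum_{i=1}^{60}\frac{i}{2} = \frac{1}{2}\cdot\frac{60\cdot 61}{2} = 915, \qquad \operatorname{Var}(Z) = \frac{1}{60}\sum_{i=1}^{60}\frac{i^2}{12} = \frac{73810}{720} = \frac{7381}{72}.$$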
After generating a sample of this random variable we can, of course, run a Kolmogorov–Smirnov test against $H_0: \mathbb{P} = N\big(0 \hspace{0.2mm}, \hspace{0.5mm} \tfrac{7381}{72}\big)$, which amounts to saying that the convergence in distribution does hold even though the conditions required by the central limit theorem are not met. But there is something even better we can do. According to Corollary 1 of this document, as the sample size goes to infinity the probability of rejecting an erroneous null hypothesis converges to $1$. So what we will do is approximate the probability of rejecting $H_0$ when the sample of $Z$ is large.
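For reference, one direct way to run that test is base R's ks.test; this is only a sketch, and the seed is an arbitrary choice for reproducibility:

set.seed(1)  # arbitrary seed
z = sample_z(10000)
ks.test(z, "pnorm", mean = 0, sd = sqrt(7381/72))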
My code for computing the KS test statistic is the following:
ks_test_stat = function(n) {
  # Generating a sample of Z.
  x = sample_z(n)
  # Computing the ecdf and lagged ecdf of the data.
  data_ecdf = cumsum(table(x))/length(x)
  lagged_ecdf = c(0, data_ecdf[-length(data_ecdf)])
  # Computing the theoretical cdf values of the data.
  quantiles = as.numeric(names(data_ecdf))
  theoretical_cdf = pnorm(quantiles, mean = 0, sd = sqrt(7381/72))
  # Computing the vertical distances on both sides of each jump.
  dist_2_empirical = abs(theoretical_cdf - data_ecdf)
  dist_2_lagged = abs(theoretical_cdf - lagged_ecdf)
  # Choosing the maximum of those distances.
  max(pmax(dist_2_empirical, dist_2_lagged))
}
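As a sanity check on this hand-rolled statistic, one can compare it with the $D$ statistic reported by ks.test on the same data by resetting the seed before each call (the seed is arbitrary; this is only a sketch):

set.seed(123)  # arbitrary seed
d_hand = ks_test_stat(2000)
set.seed(123)  # same seed, so ks.test sees the exact same sample of Z
d_base = unname(ks.test(sample_z(2000), "pnorm", mean = 0, sd = sqrt(7381/72))$statistic)
all.equal(d_hand, d_base)  # should be TRUE up to floating-point error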
Then we approximate the rejection probability.
# Sorry for the vanilla implementation; couldn't think of a better way to do it.
observations_ks = vector(length = 1000)
for (i in 1:1000) {
  observations_ks[i] = ks_test_stat(10000)
}
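For what it is worth, the same simulation can be written more compactly with base R's replicate; this is only a stylistic alternative, the computation is identical:

observations_ks = replicate(1000, ks_test_stat(10000))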
We will be considering a 5% significance level.
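The cutoff used below, $0.0136$, is presumably the asymptotic 5% critical value of the two-sided KS statistic, $c(0.05)/\sqrt{n}$ with $c(0.05) = \sqrt{-\tfrac{1}{2}\ln(0.05/2)} \approx 1.358$ and $n = 10000$; a quick way to reproduce it:

# Asymptotic 5% critical value of the two-sided KS statistic for n = 10000.
sqrt(-log(0.05/2)/2) / sqrt(10000)  # approximately 0.0136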
# Approximate probability of rejecting H_0 at the 5% significance level.
sum(observations_ks >= 0.0136)/1000
See for yourself: running the chunk above yields a rejection probability close to $0.05$ rather than to $1$. There you have it; $H_0$ is not erroneous.
However (and this is where my actual question begins) this answer would not appease the Mathematician. From a previous question I asked I learned that if we denote by $G$ the cdf of the $N\big(0 \hspace{0.2mm}, \hspace{0.5mm} \tfrac{7381}{72}\big)$ distribution, and by $F_n$ the empirical cdf of a sample of $n$ independent copies of $Z$, and we manage to prove that $$\sup_z|F_n(z) - G(z)|\xrightarrow{\text{a.s.}/\text{p}}0,$$
then $H_0$ is formally established. My question is: how can this be proved?
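To be clear, I am after a formal argument, not a simulation; but purely as an empirical illustration of the quantity in question (not a proof), one can watch the statistic returned by ks_test_stat, which is exactly that supremum, shrink as the sample size grows:

# Observed sup_z |F_n(z) - G(z)| for increasing sample sizes (illustration only).
sapply(c(100, 1000, 10000, 100000), ks_test_stat)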