This is a concept that I have always struggled to understand. We can write the formula for a two-sample t-test (https://en.wikipedia.org/wiki/Student%27s_t-test), which compares the sample means from two populations, as follows:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} $$
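To make the formula concrete, here is a minimal sketch of the pooled statistic in plain Python. The function name `pooled_t_statistic` and the two small samples are made up for illustration; they are not from any library or real data set.

```python
import math
import statistics


def pooled_t_statistic(x1, x2):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = statistics.fmean(x1), statistics.fmean(x2)
    # statistics.variance uses the (n - 1) denominator, i.e. it returns s^2.
    var1, var2 = statistics.variance(x1), statistics.variance(x2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / (math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2))


# Hypothetical example data, chosen only to exercise the function.
sample_a = [5.1, 4.9, 5.6, 5.8, 6.0]
sample_b = [4.2, 4.8, 5.0, 4.4, 4.6]
t_value = pooled_t_statistic(sample_a, sample_b)
print(f"t = {t_value:.3f}")
```

Swapping the two samples flips the sign of the statistic, which is a quick sanity check on the implementation.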
In university, we are always told that for the results of this test to be valid, both samples MUST be Normally Distributed, yet we are never told exactly why. After much thought, I think that if both samples are large, this normality condition is not required.
Below is my reasoning for why normality is not required, in the specific case where $n_1 = n_2 = n$ and $s_1 = s_2 = s$.
Part 1: First, take the numerator of the above statistic and divide it by $ \sigma\sqrt{2/n} $, which is the standard deviation of $\bar{x}_1 - \bar{x}_2$ when the two samples are independent, each has size $n$, and both populations have standard deviation $\sigma$. By the Central Limit Theorem, and assuming equal population means, this scaled numerator is (asymptotically) standard normal regardless of the distributions of $x_1$ and $x_2$:
$$ \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{2/n}} \sim \mathcal{N}(0,1) $$
Part 2: Since we divided the numerator by $ \sigma\sqrt{2/n} $, we also have to divide the denominator by $ \sigma\sqrt{2/n} $.
With $n_1 = n_2 = n$ and $s_1 = s_2 = s$, some algebra reduces the denominator to:
$$ \sqrt{\frac{(n-1)s^2 + (n-1)s^2}{2n - 2}}\sqrt{\frac{1}{n}+\frac{1}{n}} = s\sqrt{\frac{2}{n}} $$
Now we divide the above term by $ \sigma\sqrt{2/n} $:
$$ \frac{s\sqrt{2/n}}{\sigma\sqrt{2/n}} = \frac{s}{\sigma} $$
Now, using Moment Generating Functions, we can show that the following is asymptotically true (https://online.stat.psu.edu/stat414/lesson/26/26.3) regardless of any underlying distribution:
$$(n-1)\frac{S^2}{\sigma^2} \sim \chi^2_{n-1}$$
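One way to probe this step numerically is to simulate $(n-1)S^2/\sigma^2$ and compare its mean and variance with those of a $\chi^2_{n-1}$ variable, which are $n-1$ and $2(n-1)$. The sketch below does this for standard normal data and for Exponential(1) data (both have $\sigma^2 = 1$). The helper name `scaled_variance_moments`, the sample size, the replication count, and the seed are all arbitrary choices for illustration.

```python
import random


def scaled_variance_moments(sampler, sigma2, n, reps, seed):
    """Simulate (n - 1) * S^2 / sigma^2 and return its sample mean and variance."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [sampler(rng) for _ in range(n)]
        mean = sum(xs) / n
        s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance S^2
        values.append((n - 1) * s2 / sigma2)
    m = sum(values) / reps
    v = sum((y - m) ** 2 for y in values) / (reps - 1)
    return m, v


n = 50
normal_mean, normal_var = scaled_variance_moments(
    lambda rng: rng.gauss(0.0, 1.0), 1.0, n, 4000, seed=1
)
expo_mean, expo_var = scaled_variance_moments(
    lambda rng: rng.expovariate(1.0), 1.0, n, 4000, seed=1
)

print(f"chi-square reference: mean={n - 1}, variance={2 * (n - 1)}")
print(f"normal data:          mean={normal_mean:.1f}, variance={normal_var:.1f}")
print(f"exponential data:     mean={expo_mean:.1f}, variance={expo_var:.1f}")
```

If the chi-square claim holds for a given distribution, its printed mean and variance should both be close to the reference values; rerunning with a larger `n` shows how the comparison behaves as the sample grows.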
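Part 3: Finally, we divide the result from Part 1 by the result from Part 2. A standard normal divided by $\sqrt{\chi^2_k/k}$ follows a t-distribution with $k$ degrees of freedom (see "A normal divided by the $\sqrt{\chi^2(s)/s}$ gives you a t-distribution -- proof"), so the ratio has a t-distribution: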
$$ \frac{\text{N}(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim \text{T-distribution}_{n-1} $$
Therefore, we have shown that, provided both sample sizes are large enough, the t-test DOES NOT require either sample to be Normally Distributed, whatever the underlying distributions are.
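The large-sample claim can also be checked empirically. The sketch below draws both samples from the same Exponential(1) distribution (so the null hypothesis is true), computes the pooled statistic, and counts how often it exceeds the two-sided 5% critical value. Everything here is an assumption for illustration: the function names, $n = 200$, 2000 replications, the seed, and the use of the standard normal quantile (about 1.96) as a stand-in for the $t_{398}$ quantile, which is only slightly larger.

```python
import math
import random
from statistics import NormalDist


def pooled_t(x1, x2):
    """Two-sample t statistic with pooled variance."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def rejection_rate(n, reps, alpha=0.05, seed=0):
    """Fraction of simulated null data sets in which |t| exceeds the critical value."""
    rng = random.Random(seed)
    # Normal quantile used as a large-sample approximation to the t quantile.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x1 = [rng.expovariate(1.0) for _ in range(n)]
        x2 = [rng.expovariate(1.0) for _ in range(n)]
        if abs(pooled_t(x1, x2)) > critical:
            rejections += 1
    return rejections / reps


rate = rejection_rate(n=200, reps=2000)
print(f"empirical rejection rate under the null: {rate:.3f} (nominal 0.05)")
```

A rejection rate near the nominal 5% is what one would expect if the test remains valid for this skewed distribution at this sample size; trying a much smaller `n` or a heavier-tailed sampler shows where the approximation starts to strain.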
For simplicity's sake, I did this for $n_1 = n_2$ and $s_1 = s_2$, but I think the result will also hold when this is not the case, i.e. $n_1 \neq n_2$ and $s_1 \neq s_2$.
Can someone please comment on my mathematical analysis? Have I done this correctly?