Use of the Student T distribution for the t-statistic

Question

The variable $T$ is given by the classic form of the one-sample t-test statistic for the population mean under a hypothesis such as $H_0: \mu = \mu_0$:

$$T = \frac{\bar{X_n}-\mu_0}{S/\sqrt{n}}$$

where $\bar{X_n} = \frac{1}{n}\sum_{i=1}^n X_i$ and $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X_n})^2}$, and where $X_i$ follows some distribution with population mean $\mu$ and variance $\sigma^2<\infty$. I am aware of the following two cases:

(1) If $X_i \sim^{i.i.d} N(\mu, \sigma^2)$ for $i=1,...,n$, then $T$ is (exactly) Student T distributed with $n-1$ degrees of freedom.

(2) If $X_i$ is drawn $i.i.d$ for all $i=1,...,n$ from a distribution with a finite second moment, then $T$ converges to a standard normal distribution as $n\rightarrow \infty$ (see a great derivation here).
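To make the definition of $T$ concrete, here is a minimal sketch that computes the statistic with Python's standard library and the two-sided p-value under the limiting standard normal of case (2). The function names `t_statistic` and `normal_two_sided_p` and the five-point sample are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def t_statistic(sample, mu0):
    """One-sample statistic T = (x_bar - mu0) / (S / sqrt(n))."""
    n = len(sample)
    x_bar = mean(sample)   # sample mean, (1/n) * sum of the X_i
    s = stdev(sample)      # sample standard deviation, n - 1 denominator
    return (x_bar - mu0) / (s / math.sqrt(n))


def normal_two_sided_p(t_value):
    """Two-sided p-value using the N(0, 1) limit from case (2)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_value)))


# Hypothetical data, used only to exercise the functions.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
t_value = t_statistic(sample, mu0=2.0)
p_value = normal_two_sided_p(t_value)
print(t_value, p_value)
```

Under case (1) the same `t_value` would instead be referred to a Student T distribution with $n-1$ degrees of freedom; the statistic is identical and only the reference distribution changes.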

Now, in practical, large-sample applications of the one-sample t-test, I understand that $T$ is only normally distributed in the limit. Nevertheless, it is usually stated that one should use a Student T distribution for $T$ when determining critical values, etc. (for instance, see here, here, and here). I understand that a Student T distribution also converges to a Normal distribution as $n\rightarrow \infty$, but this is not sufficient justification to use the Student T distribution for the random variable $T$ for finite (even if large) $n$.
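For a sense of how much the choice of reference distribution matters at a given $n$, the sketch below compares the two-sided 5% critical value of a Student T distribution with $n-1$ degrees of freedom against the standard normal one. It uses only the standard library; the helpers `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical names, and the quantile comes from Simpson's rule plus bisection rather than from a statistics package. It only quantifies the gap between the two critical values and says nothing about which is appropriate for non-normal data.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the Student T distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile by bisection on the CDF; the bracket [-50, 50] is an assumption."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)
for n in (6, 31, 101, 1001):
    t_crit = t_quantile(0.975, n - 1)
    print(f"n={n:5d}  t critical={t_crit:.4f}  z critical={z_crit:.4f}")
```

The Student T critical value is always the larger of the two, and the gap shrinks as $n$ grows, which is the sense in which the Student T choice is called "more conservative".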

All answers to this question that I have seen argue that using the Student T distribution is more conservative: either this is just stated with intuitive appeal to the fact that we have to estimate $\sigma$ using $S$ (which is not nearly enough in my mind to justify the use of the Student T distribution) or appeals to simulation (again, not enough since these simulations can only ever test a finite number of distributions for $X_i$). For instance, this is approximately the answer given for this extremely well written question on the exact same topic here.

To this end, I am looking for some sort of mathematical (not intuitive and not based on simulation) justification that $T$ not only can but should be approximated as a Student T distribution in the case where $X_i$ are not normally distributed, $\sigma$ is unknown, and $n$ is finite (but large). A reference to a textbook or journal article would also be sufficient.

I doubt any such reference exists unless it closely circumscribes the possible parent distributions. See https://stats.stackexchange.com/questions/411699 for an extremely well-known and practical counterexample. — whuber, Nov 28 '22 at 17:14
I guess I'm confused about where, mathematically, the use of the Student T distribution comes from. We have only that the asymptotic distribution is Normal, or that the exact distribution is Student T if the $X_i$ are iid Normal. So where does the Student T come into play from a mathematical perspective for finite (but large) $n$? I don't see the mathematical justification. — Mr Saltine, Nov 28 '22 at 19:20
You're looking in the wrong direction: the innovation of Student's t test lies in its application to small $n,$ not large $n.$ The mathematical justification arises from a Normality assumption about the parent distribution. The statistical justification requires pointing to practical situations where that Normality assumption is reasonable. That's what "Student" did in his original 1908 paper, where he examined subsamples of size 4 from a morphometric dataset. — whuber, Nov 28 '22 at 22:29
Thanks for the response @whuber. I think my confusion comes from the general advice (e.g., see the answer to this question here) that we always resort to using the Student T distribution if the variance is unknown. I understand that simple rules of thumb for applying the CLT (e.g., $n \geq 30$) can be problematic depending on the underlying distribution of the data. — Mr Saltine, Nov 29 '22 at 12:56
However, the proposal that one should then use a t-test instead of a z-test seems misguided. If $n=30$, let's say, is not enough to appeal to the CLT in a particular case, then how is the use of the Student T a solution to the problem if your data is not Normal? If $n=30$ is enough, then just use the Normal; if $n=30$ is not enough, what justification is there for using the Student T when the underlying data is not Normal? — Mr Saltine, Nov 29 '22 at 12:56
There's a good reason the answer you reference has no upvotes! The thread at https://stats.stackexchange.com/questions/69898 provides some counterpoint to it (and, I believe, support for your position). However, see the duplicate threads for some rationale for using the t test regardless, especially https://stats.stackexchange.com/a/85811/919. — whuber, Nov 29 '22 at 15:02
