The variable $T$ is given by the classic form of the one-sample t-test for the population mean under a hypothesis such as $H_0: \mu = \mu_0$:
$$T = \frac{\bar{X_n}-\mu_0}{S/\sqrt{n}}$$
where $\bar{X_n} = \frac{1}{n}\sum_{i=1}^n X_i$ and $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X_n})^2}$, and where $X_i$ follows some distribution with population mean $\mu$ and variance $\sigma^2<\infty$. I am aware of the following two cases:
(1) If $X_i \sim^{i.i.d} N(\mu, \sigma^2)$ for $i=1,\dots,n$, then $T$ is (exactly) Student T distributed with $n-1$ degrees of freedom.
(2) If $X_i$ is drawn $i.i.d.$ for all $i=1,\dots,n$ from a distribution with a finite second moment, then $T$ converges in distribution to a standard normal distribution as $n\rightarrow \infty$ (see a great derivation here).
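For concreteness, here is a minimal sketch of how $T$ is computed from a sample, using only the Python standard library. The function name `one_sample_t` and the data are hypothetical, chosen purely for illustration; `statistics.stdev` uses the $n-1$ denominator, matching the definition of $S$ above.

```python
import math
import statistics


def one_sample_t(xs, mu0):
    """Return T = (sample mean - mu0) / (S / sqrt(n)) for the sample xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations to estimate S")
    xbar = statistics.mean(xs)   # sample mean, (1/n) * sum of X_i
    s = statistics.stdev(xs)     # sample standard deviation, n-1 denominator
    return (xbar - mu0) / (s / math.sqrt(n))


# Hypothetical data, not taken from any real study.
sample = [2.0, 4.0, 6.0, 8.0]
print(one_sample_t(sample, mu0=4.0))  # roughly 0.7746
```

Nothing in this computation depends on the distribution of $X_i$; the distributional assumption only enters when choosing the reference distribution for $T$.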
Now, in practical, large-sample applications of the one-sample t-test, I understand that $T$ is only normally distributed in the limit. Nevertheless, it is usually stated that one should use a Student T distribution for $T$ when determining critical values, etc. (for instance, see here, here, and here). I understand that a Student T distribution also converges to a Normal distribution as $n\rightarrow \infty$, but this alone is not sufficient justification for using the Student T distribution for the random variable $T$ at finite (even if large) $n$.
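To make the size of the difference concrete, the following sketch compares two-sided 5% critical values from the Student T distribution with $n-1$ degrees of freedom against the standard normal value. It is a direct numerical calculation, not a simulation. The helper names (`t_pdf`, `t_cdf`, `t_quantile`) are my own, and the quantile is found by Simpson integration plus bisection only to stay within the standard library; in practice one would more likely call something like `scipy.stats.t.ppf`.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the Student T distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via composite Simpson's rule on [0, |x|]; steps must be even."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Upper quantile (0.5 <= p < 1) by bisection; assumes the answer is below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)  # standard normal critical value
for n in (10, 30, 100, 1000):
    t_crit = t_quantile(0.975, n - 1)
    print(f"n={n:5d}  t critical={t_crit:.4f}  normal critical={z:.4f}")
```

The Student T critical value always exceeds the normal one and the gap shrinks as $n$ grows, which is the sense in which the Student T choice is called conservative. This only illustrates the two candidate reference distributions; it says nothing about which one is closer to the actual distribution of $T$ for non-normal $X_i$.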
All answers to this question that I have seen argue that using the Student T distribution is more conservative. This is either simply asserted, with an intuitive appeal to the fact that we have to estimate $\sigma$ using $S$ (which is not nearly enough in my mind to justify the use of the Student T distribution), or supported by simulation (again, not enough, since simulations can only ever test a finite number of distributions for $X_i$). For instance, this is approximately the answer given to this extremely well-written question on the exact same topic here.
To this end, I am looking for some sort of mathematical (not intuitive and not based on simulation) justification that $T$ not only can but should be approximated as a Student T distribution in the case where $X_i$ are not normally distributed, $\sigma$ is unknown, and $n$ is finite (but large). A reference to a textbook or journal article would also be sufficient.