I assume this question has been beaten to death, so I am really just looking for a reference that goes through the details.
Assume throughout that all populations we deal with have finite means and variances (even higher moments, if that helps) and that the two samples are independent.
The two-sample t-test requires sampling from normal populations. There is the classic t-test, which assumes the variances of the two populations being sampled are equal, and the so-called Welch t-test, which allows them to be unequal (with its more complicated degrees of freedom).
In the case where we do not sample from normal populations but know that the variances are equal, we can still show (using the CLT and Slutsky's theorem) that the two-sample t-test (for equal variances) is valid, at least as $n_1,n_2\to \infty$. In short, $$\dfrac{(\overline{X}_1-\overline{X}_2)-(\mu_1-\mu_2)}{S_p\sqrt{1/n_1+1/n_2}} \xrightarrow{d} \mathcal{N}(0,1),$$ where the arrow denotes convergence in distribution; under normality this statistic is exactly $t_{n_1+n_2-2}$-distributed.
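Spelled out, the argument I have in mind is (writing $\sigma^2$ for the common variance and letting both $n_1,n_2\to\infty$): $$\dfrac{(\overline{X}_1-\overline{X}_2)-(\mu_1-\mu_2)}{\sigma\sqrt{1/n_1+1/n_2}} \xrightarrow{d} \mathcal{N}(0,1) \quad\text{(CLT, independent samples)}, \qquad \dfrac{S_p^2}{\sigma^2} \xrightarrow{p} 1 \quad\text{(law of large numbers)},$$ so multiplying the first quantity by $\sigma/S_p$ and applying Slutsky's theorem gives the displayed limit.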
In the case where we do not sample from normal populations but we do know that the variances are unequal, does the Welch t-test become valid for large sample sizes? In short, what is the distribution of $$ \dfrac{(\overline{X}_1-\overline{X}_2)-(\mu_1-\mu_2)}{\sqrt{s_1^2/n_1+s_2^2/n_2}}$$ for large $n_1,n_2$?
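For what it's worth, a quick simulation seems like a reasonable sanity check here. Below is a sketch I put together (using numpy/scipy; the exponential and gamma populations, sample sizes, and number of replications are arbitrary choices, just to get non-normal data with unequal variances):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def welch_stat(x1, x2, mu_diff=0.0):
    """Welch statistic: ((xbar1 - xbar2) - (mu1 - mu2)) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(x1), len(x2)
    num = (x1.mean() - x2.mean()) - mu_diff
    den = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
    return num / den

n1, n2, reps = 200, 500, 20_000

# Two non-normal populations with unequal variances (arbitrary choices):
#   X1 ~ Exponential(scale=1): mean 1, variance 1
#   X2 ~ Gamma(shape=2, scale=2): mean 4, variance 8
t_vals = np.empty(reps)
for i in range(reps):
    x1 = rng.exponential(scale=1.0, size=n1)
    x2 = rng.gamma(shape=2.0, scale=2.0, size=n2)
    t_vals[i] = welch_stat(x1, x2, mu_diff=1.0 - 4.0)  # true mu1 - mu2 = -3

# Compare the empirical distribution of the statistic to N(0,1).
print(stats.kstest(t_vals, "norm"))          # distance from the standard normal
print(np.quantile(t_vals, [0.025, 0.975]))   # near +/- 1.96 if the normal limit holds
```

If the normal limit holds, the empirical 2.5%/97.5% quantiles should land near $\pm 1.96$ as $n_1,n_2$ grow.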
My thoughts: The answer would be straightforward if $\dfrac{(\overline{X}_1-\overline{X}_2)-(\mu_1-\mu_2)}{\sqrt{s_1^2/n_1+s_2^2/n_2}}$ were exactly $t$-distributed. But (correct me if I am wrong) it is only approximately $t$-distributed under appropriate conditions, because the denominator is only approximately a scaled $\chi^2$ (the Welch–Satterthwaite approximation), rather than exactly so as in Student's theorem. So, in simpler terms, does the approximation by a $t$ distribution improve as $n_1,n_2$ increase? In that case the statistic would also converge in distribution to $\mathcal{N}(0,1)$, since a $t$ distribution approaches the standard normal as its degrees of freedom grow. Does this follow from a general version of Slutsky's theorem and the CLT? My attempted argument is sketched below.
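Here is the Slutsky-type argument I have tried to piece together for the Welch statistic (my own sketch, so please point out any gaps). The CLT for the two independent samples gives $$\dfrac{(\overline{X}_1-\overline{X}_2)-(\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}\xrightarrow{d}\mathcal{N}(0,1),$$ and for the denominator $$\dfrac{s_1^2/n_1+s_2^2/n_2}{\sigma_1^2/n_1+\sigma_2^2/n_2} = w_n\dfrac{s_1^2}{\sigma_1^2}+(1-w_n)\dfrac{s_2^2}{\sigma_2^2}\xrightarrow{p}1, \qquad w_n=\dfrac{\sigma_1^2/n_1}{\sigma_1^2/n_1+\sigma_2^2/n_2}\in[0,1],$$ since $s_i^2/\sigma_i^2\xrightarrow{p}1$ and $w_n$ is bounded. Slutsky's theorem would then give $$\dfrac{(\overline{X}_1-\overline{X}_2)-(\mu_1-\mu_2)}{\sqrt{s_1^2/n_1+s_2^2/n_2}}\xrightarrow{d}\mathcal{N}(0,1).$$ Is this reasoning correct, and is there a standard reference that states it carefully?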