I am working through why the underlying population needs to be normal when performing a t-test. This is nicely expounded by @Glen_b here. The gist of the explanation, I think, is that for the t statistic to follow a t distribution, the numerator, $\bar X-\mu$, must be normally distributed, while the denominator, $s\over \sqrt n$, must satisfy $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$ and be independent of the numerator.
My questions are:
- Can it be shown with a Monte Carlo simulation (e.g., using R) that a t statistic computed on sample means drawn from a non-normal distribution doesn't necessarily follow a t distribution?
- What would be the repercussions of this for calculating confidence intervals, along the lines of the discussion here? Jotting down a likely explanation: the issues that arise when applying a t-test to compare sample means (discussed in the first hyperlinked post) simply do not apply to sampling distributions, as a result of the CLT.
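To make the second question concrete, here is a minimal sketch of the kind of coverage check I have in mind (assuming, for illustration, samples of size 10 from a $\chi^2_1$ distribution, whose true mean is 1):

```r
set.seed(1)

# Does the usual t-based 95% CI cover the true mean (= 1) of a chi-squared(1)?
covers <- function(n) {
  x  <- rchisq(n, df = 1)
  ci <- mean(x) + qt(c(0.025, 0.975), df = n - 1) * sd(x) / sqrt(n)
  ci[1] <= 1 && 1 <= ci[2]
}

coverage <- mean(replicate(10000, covers(10)))
coverage  # empirical coverage; falls short of the nominal 0.95 here
```

If the t-based interval were exact for this population, `coverage` would sit near 0.95; a shortfall would quantify the "repercussions" asked about above.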
As an example of what I'm considering, a possible (probably flawed) approach to the first part of the question would be to draw samples from a $\chi^2_1$ distribution. Thanks to the help from the commenters, at this point I got this plot with the code here:
This is surprising because I expected to see more of a discrepancy between the simulated t statistics and the theoretical t distribution, given the underlying population (chi-squared). Although perhaps it should not be surprising at all if we compare it with samples from the almighty Normal, which fit just so:
So is the offset in the first plot "clearly" off? Is it fair to compare it to the normal? These are just side questions; the actual points I am asking about are clearly stated above.
EDIT: If there is no official answer, I want to at least reflect here the valuable tip offered by @Scortchi in the comments to illustrate how real the offset is:
The 0.5% quantile of the t statistics generated in the simulation is `quantile(ts, 0.005) = -9.682655`, whereas `qt(0.005, df=9) = -3.249836`.
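For reference, these tail quantiles can be reproduced with the simulation code suggested in the comments (samples of size 10 from a $\chi^2_1$, whose true mean is 1; the exact numbers will vary with the seed):

```r
set.seed(1)

# t statistic for one sample of size n from a chi-squared(1) population
t.stat.sim <- function(n) {
  x <- rchisq(n, df = 1)
  sqrt(n) * (mean(x) - 1) / sd(x)  # centre at the true mean, scale by SE
}

# Collect the t statistics for 1000 samples of ten
ts <- replicate(1000, t.stat.sim(10))

quantile(ts, 0.005)  # heavy left tail, well below the t quantile
qt(0.005, df = 9)    # -3.249836
```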


Comments:

- `ts` does not make sense to me. You should be computing the t-statistic for each of your 1000 samples by dividing that sample's mean (with the null hypothesis value subtracted) by that sample's standard error of the mean. – amoeba Nov 30 '15 at 17:34
- `t.stat.sim <- function(n){x <- rchisq(n, df=1); return(sqrt(n)*(mean(x) - 1) / sd(x))}`. Then collect the t-statistics for 1000 samples of ten: `ts <- replicate(1000, t.stat.sim(10))`. [Don't know if you saw that before deleting the answer.] – Scortchi - Reinstate Monica Nov 30 '15 at 17:39
- Compare the quantiles of your simulated statistics, `quantile(ts, c(0.005, 0.025, 0.05))`, with those of Student's t distribution, `qt(0.025, df=9)`. It's these you'd use in forming tests & confidence intervals. Often it's useful to plot cumulative distribution functions & zoom in on the tails. – Scortchi - Reinstate Monica Dec 01 '15 at 10:22
- Is `ts` as defined in your prior comment, or in the code that I share on a hyperlink, and that I used to generate the plots? And, on the second expression, I can't find `sample_ts` defined in either your code or mine. – Antoni Parellada Dec 01 '15 at 13:52
- `ts` as in your code (should be the same anyway) but without the truncation `ts <- ts[ts<10 & ts>-5]` (which I don't understand). – Scortchi - Reinstate Monica Dec 01 '15 at 13:57
- And `sample_ts`? – Antoni Parellada Dec 01 '15 at 13:59
- I had called `ts` that by mistake, then copied it instead of `qt(0.975, df=9)`. I edited the code in the comment above. – Scortchi - Reinstate Monica Dec 01 '15 at 14:09
- So, `qt(0.005, df = 9)` instead of `qt(0.975, df=9)`, correct? – Antoni Parellada Dec 01 '15 at 14:13