Under what conditions should I use an approximate Z-score vs a t-test?

Question

I am struggling to understand the limiting assumptions of simple hypothesis testing using Z and T statistics under different scenarios.

In a case where X is normally distributed, n > 30, and $\sigma^2$ is known, a Z-test is clearly appropriate. In the same scenario with n < 30, a Z-test is still appropriate: because X is normal, the sample mean is exactly normally distributed, so we don't need to rely on the CLT, and $\sigma^2$ is known.
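As a minimal sketch of the known-variance case, the Z statistic and its two-sided p-value need only the standard library. The sample, the hypothesized mean `mu0`, and the "known" `sigma` below are hypothetical values chosen for illustration, not data from the question. Nothing in the calculation depends on n > 30, because the sample mean of normal data is exactly normal.

```python
import math
from statistics import fmean

def z_test(sample, mu0, sigma):
    """One-sample Z-test with a KNOWN population standard deviation sigma.

    Returns (z, two-sided p-value). Exact when X is normal, for any n.
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    # For standard normal Z: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical data: n = 5, sigma assumed known to be 0.2.
z, p = z_test([5.1, 4.9, 5.3, 5.2, 5.0], mu0=5.0, sigma=0.2)
print(f"z = {z:.3f}, p = {p:.3f}")
```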

However, if X is normally distributed, n < 30, and $\sigma^2$ is not known, should I use a t-test, or an approximate Z-test substituting s for $\sigma$, since the data are still normal? Or is n < 30 by itself enough to warrant switching to a t-test, because s is not a good estimator of $\sigma$ for small samples, whether or not the data are normal?
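One way to see what goes wrong when s is plugged in for $\sigma$ at small n is a quick simulation. This sketch assumes standard-normal data, a true null hypothesis ($\mu = 0$), a nominal 5% level, and an arbitrary seed and replication count; it rejects whenever $|\bar x / (s/\sqrt{n})| > 1.96$, i.e. it treats the statistic as standard normal. The observed rejection rate should sit well above 5% for n = 5 and drift toward 5% as n grows, because the statistic actually follows a t distribution with n − 1 degrees of freedom, whose tails are heavier.

```python
import math
import random

def approx_z_rejection_rate(n, reps=10000, seed=1):
    """Type I error rate when s replaces sigma and the cutoff 1.96 is used.

    Data are drawn from N(0, 1), so H0: mu = 0 is true and every rejection
    is a false positive. The nominal rate is 0.05.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(x) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))  # sample sd, n - 1 divisor
        if abs(mean / (s / math.sqrt(n))) > 1.96:
            rejections += 1
    return rejections / reps

for n in (5, 10, 30, 100):
    print(n, approx_z_rejection_rate(n))
```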

Similarly, suppose X is not normally distributed, but n > 30 and $\sigma^2$ is not known. It seems we can still use an approximate Z-test, because the CLT implies that the distribution of the sample mean will be approximately normal. Is that right?

So is the only situation in which I would resort to a (one-sample) t-test one where $\sigma^2$ is unknown and n < 30?

When X is normally distributed and $\sigma^2$ is known, the "Z test" is appropriate for testing the mean, and n > 30 is not required. When X is normally distributed and $\sigma^2$ is unknown, use the t test for testing the mean. There is nothing magic about 30. With $\sigma$ unknown there is no way to substitute $\sigma$ for s. Use the t test, as the t distribution is the appropriate one. For large n the t distribution is close to the standard normal; you need to decide for yourself what n makes t close enough to the standard normal to use the normal table instead of the t table. — Michael R. Chernick, Oct 10 '19 at 00:54
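Deciding when t is "close enough" to the normal is easy to do directly now that t quantiles can be computed rather than looked up. The stdlib-only sketch below computes two-sided 5% critical values of $t_{df}$ and prints them next to the normal value of about 1.96. The numerical method (Simpson's rule on the t density, then bisection) and the function names are illustrative choices, not anything from the thread; in practice a library routine such as `scipy.stats.t.ppf(0.975, df)` does the same job.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_sided_tail(c, df, steps=2000):
    """P(|T| > c) for T ~ t(df), integrating the density over [0, c] with Simpson's rule."""
    h = c / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

def t_critical(df, alpha=0.05):
    """Two-sided critical value c with P(|T| > c) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_two_sided_tail(mid, df) > alpha:
            lo = mid  # tail still too large: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
for df in (4, 9, 29, 99, 999):
    print(f"df={df:4d}  t={t_critical(df):.4f}  z={z_crit:.4f}")
```

The t critical value is larger than the normal one at every df shown, and the gap shrinks as df grows; where it becomes negligible is the judgment call the comment describes.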
When X is not normally distributed, things depend on the population distribution. If the distribution is heavy-tailed like the Cauchy, the mean may not exist and the CLT doesn't apply, so testing a population mean isn't appropriate. Of course, in most situations where the first and second moments exist the CLT applies, and you have to judge whether the sample size is large enough for the sample mean in your case to be close enough to normally distributed. — Michael R. Chernick, Oct 10 '19 at 01:20
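To get a feel for "is n large enough?" with non-normal data, the same kind of simulation can be run on a skewed population. This sketch assumes an Exponential(1) population (true mean 1), a true null hypothesis, a nominal 5% level, and the approximate Z-test from the question (s in place of $\sigma$, cutoff 1.96); the population, seed, and replication count are illustrative assumptions. The rejection rate should be inflated at small n and settle toward 5% only as n grows, so how large n must be depends on how skewed the population is.

```python
import math
import random

def skewed_rejection_rate(n, reps=10000, seed=2):
    """Type I error of the approximate Z-test when X ~ Exponential(1), true mean 1.

    The nominal level is 0.05; departures show how far the standardized
    sample mean still is from N(0, 1) at this sample size.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        mean = sum(x) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))  # sample sd, n - 1 divisor
        if abs((mean - 1.0) / (s / math.sqrt(n))) > 1.96:
            rejections += 1
    return rejections / reps

for n in (10, 40, 200):
    print(n, skewed_rejection_rate(n))
```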

AdamO · Accepted Answer · 2019-10-09T22:21:05.693


When the variance of the response is not known and is therefore estimated from the sample, you should use a T-test, regardless of whether or not the underlying distribution is known to be normal. The arbitrary $n=30$ cut-off continues to be mentioned for legacy reasons. In the olden days, you calculated p-values by referring to tables, and one could only reasonably include so many quantiles for so many t-distributions of various degrees of freedom. Modern computation has made that point completely moot. For high degrees of freedom the normal approximation to the T distribution is arbitrarily close, but if it were to make any difference, we would of course appeal to the more conservative T distribution.
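The reason the t distribution, rather than the normal, is the correct reference once $\sigma$ is estimated can be written out in a few lines. This derivation assumes X is normal and the null hypothesis $\mu = \mu_0$ holds, with $\bar X$ the sample mean and $S^2$ the sample variance (divisor $n-1$):

$$
Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1), \qquad
V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad Z \text{ and } V \text{ independent},
$$

$$
T = \frac{Z}{\sqrt{V/(n-1)}} = \frac{\bar X - \mu_0}{S/\sqrt{n}} \sim t_{n-1}.
$$

The unknown $\sigma$ cancels, so $T$ is computable from the data. Because $V/(n-1) \to 1$ as $n \to \infty$, $t_{n-1}$ approaches $N(0,1)$, but its upper quantiles exceed the normal ones at every finite $n$, which is why the t reference is the conservative choice.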

answered Oct 09 '19 at 22:11 by AdamO, edited Oct 09 '19 at 22:21

Thank you for responding! So you're saying that in any case where $\sigma^2$ is unknown, I should just use the T distribution? What's confusing me is that the text for my stats course explicitly discusses calculating an approximate Z-score for cases where n is large and $\sigma^2$ is unknown. Is that just outdated? – Ecostudent Oct 09 '19 at 22:16
To dignify your textbook's advice to replace t by z for n > 30 by calling it "outdated" would imply there was a time when the advice made sense. I cannot think of a good reason to have done it at any point in the last century, beyond the possibility that someone at some point had an unsuitably truncated t-table. You should also avoid the temptation to think of 30 as "large". There's nothing special about 30, other than that a lot of books mention it for no well-justified reason (besides the fact that they see other books mention it, rather akin to "if all your friends jumped off a cliff..."). – Glen_b Oct 10 '19 at 01:15
