Sample means in general don't have t-distributions.
For example, if I am drawing iid samples from an exponential distribution, the sample mean has a gamma distribution (which is skewed: lighter-tailed than the normal on the left and heavier-tailed than the normal on the right). On the other hand, if I am drawing iid samples from a uniform distribution, the sample mean has a scaled Irwin-Hall distribution (a Bates distribution), which is symmetric but lighter-tailed than the normal; so a t-distribution (which is heavier-tailed than the normal) would always be a worse approximation to the distribution of $\bar{X}$ than the normal is in that case.
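If you want to check the exponential case numerically, here is a minimal simulation sketch (my own illustration, not part of the argument; the rate, sample size and seed are arbitrary choices):

```python
# Check: the mean of n iid Exponential(rate = lam) draws has a gamma distribution.
# The sum of n such draws is Gamma(shape = n, rate = lam), so the mean is
# Gamma(shape = n, rate = n*lam), i.e. scale = 1/(n*lam).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, lam, reps = 5, 2.0, 200_000

means = rng.exponential(scale=1 / lam, size=(reps, n)).mean(axis=1)
gamma_law = stats.gamma(a=n, scale=1 / (n * lam))   # claimed distribution of the mean

print("KS distance to that gamma:", stats.kstest(means, gamma_law.cdf).statistic)  # tiny
print("skewness of the sample means:", stats.skew(means))                          # clearly positive
```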
The usual t-distribution arises when you divide a normally distributed numerator (with zero mean) by an independent estimate of the standard deviation of that numerator (provided that the variance estimate has a scaled chi-squared distribution). Under iid sampling from a normal distribution, the usual t-statistics have these properties and so have t-distributions.
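To see that construction concretely, here is a small simulation sketch (my own illustration; the degrees of freedom, sample size and seed are arbitrary choices):

```python
# Two ways to get a t-distributed quantity:
#  (1) directly: a standard normal divided by sqrt(V/df), with V ~ chi-squared(df) independent;
#  (2) from data: (xbar - mu) / (s / sqrt(n)) computed on iid normal samples, df = n - 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps = 200_000

# (1) normal numerator over an independent chi-squared-based scale estimate
df = 4
z = rng.standard_normal(reps)
v = rng.chisquare(df, size=reps)
t_direct = z / np.sqrt(v / df)

# (2) the usual one-sample t-statistic under iid normal sampling
n, mu, sigma = 5, 10.0, 3.0
x = rng.normal(mu, sigma, size=(reps, n))
t_stat = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

print(stats.kstest(t_direct, stats.t(df).cdf).statistic)   # close to 0: matches t(4)
print(stats.kstest(t_stat, stats.t(n - 1).cdf).statistic)  # close to 0: also matches t(4)
```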
Why would you divide a normally distributed numerator by an estimate of its standard deviation? Every normal distribution is different: they all have different variances, so you don't have a way to directly tell whether a sample mean is consistent with some population mean (for example). Is $\bar{x}-\mu$ unusually far from 0 or not? Well, that depends on the population standard deviation.
You can standardize things like $\bar{x}-\mu$ by dividing by the standard deviation of $\bar{x}-\mu$ (if you know it, but you generally don't). However, if you use an estimate of the standard deviation, the variability in that denominator makes the ratio ($\bar{x}-\mu$ divided by its estimated standard deviation) heavier-tailed than the normal. See the intuition offered at "Why does the t-distribution become more normal as sample size increases?" for how this happens.
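A quick numerical illustration of those heavier tails (my own check; the cutoff of 2 and the degrees of freedom shown are arbitrary):

```python
# Two-sided tail probability beyond 2 standard errors: t versus normal.
from scipy import stats

for df in (3, 10, 30, 100):
    print(f"t with df={df:>3}: P(|T| > 2) = {2 * stats.t.sf(2.0, df):.4f}")
print(f"standard normal: P(|Z| > 2) = {2 * stats.norm.sf(2.0):.4f}")
# The t tail probability exceeds the normal one, and shrinks toward it as df grows.
```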
That sort of sounds like we didn't get very far, but actually we have: the standardized distribution depends only on the degrees of freedom, which derive from the sample size. This means we can make tables for it, for example.
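For instance, a small piece of the familiar t-table can be reproduced directly (a sketch; the particular quantiles and degrees of freedom shown are arbitrary):

```python
# Upper-tail critical values of the t-distribution: they depend only on df.
from scipy import stats

quantiles = (0.95, 0.975, 0.995)
print("df    " + "   ".join(f"{q:>6}" for q in quantiles))
for df in (1, 2, 5, 10, 30, 1000):
    row = "   ".join(f"{stats.t.ppf(q, df):6.3f}" for q in quantiles)
    print(f"{df:<5} " + row)
```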
Consequently we can now perform inference (typically about the population mean) in cases where we don't know the standard deviation, as long as we're sampling (at least to a sufficiently good approximation) from a normal distribution.
If you're not sampling from a normal distribution, there's no general rule I'm aware of that would make a t-statistic have a particular distribution (except asymptotically, but then you'd be arguing for convergence to normality, not to a t-distribution).
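As a rough illustration of that caveat (my own simulation sketch; exponential data and arbitrary sample sizes and seed), the actual two-sided error rate of a nominal 5% t-based test is off at small $n$ and drifts toward the nominal level as $n$ grows:

```python
# t-statistics computed on exponential (non-normal) data: rejection rate of a
# nominal 5% two-sided test using t critical values, for increasing n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, reps = 1.0, 40_000          # Exp(rate 1) has mean 1

for n in (5, 20, 80, 320):
    x = rng.exponential(scale=mu, size=(reps, n))
    t_stat = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    reject = np.mean(np.abs(t_stat) > stats.t.ppf(0.975, n - 1))
    print(f"n = {n:>3}: rejection rate = {reject:.3f}")   # drifts toward 0.05 as n grows
```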
Consider a CI for the population mean, where we have a sample assumed to be independently drawn from a normal population (but one where we don't know the population mean $\mu$ or variance $\sigma^2$). How are we to give an interval for the mean?
A common way to find a confidence interval proceeds from finding a pivotal quantity. This is a function of the data and the unknown (here the parameter $\mu$) whose distribution is known and doesn't depend on any unknown parameters.
If we knew $\sigma$, we could write $Z=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$, which would then be pivotal (since $Z$ has a standard normal distribution). We could write an interval for $Z$ with the desired coverage and then back out an interval for $\mu$ (since everything else is known and we can just rearrange to rewrite the inequalities in the probability statement to leave $\mu$ isolated).
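Spelled out (writing $z_{\alpha/2}$ for the standard normal critical value, so that the interval has coverage $1-\alpha$), the rearrangement is just:

$$P\!\left(-z_{\alpha/2} \le \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1-\alpha,$$

and isolating $\mu$ in the inequalities gives

$$P\!\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha.$$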
But when we don't know $\sigma$, we can estimate it by $s$, writing $Q=\frac{\bar{x}-\mu}{s/\sqrt{n}}$. It's still pivotal (the distribution of $Q$ doesn't depend on $\mu$, nor on the unknown $\sigma$), but we now have a different distribution at each sample size (specifically, a t-distribution with $n-1$ degrees of freedom). We can still write an interval for $Q$ with the desired coverage properties and so back out an interval for $\mu$.
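The resulting interval is $\bar{x} \pm t_{n-1,\,\alpha/2}\, s/\sqrt{n}$, where $t_{n-1,\,\alpha/2}$ is the upper-$\alpha/2$ critical value of the t-distribution with $n-1$ degrees of freedom. Here is a minimal sketch of the computation (my own illustration on simulated normal data; the sample, seed and confidence level are arbitrary):

```python
# t-based confidence interval for the mean of a normal sample with unknown sigma.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, scale=2.0, size=12)     # simulated normal sample

n = len(x)
xbar, s = x.mean(), x.std(ddof=1)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

lo, hi = xbar - t_crit * s / np.sqrt(n), xbar + t_crit * s / np.sqrt(n)
print(f"95% CI for mu: ({lo:.3f}, {hi:.3f})")

# the same interval via scipy's helper
print(stats.t.interval(0.95, df=n - 1, loc=xbar, scale=s / np.sqrt(n)))
```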