
I'm reading up on asymptotics and hypothesis testing and was thinking about how they link together with regression coefficients.

I have read that the CLT shows that the standardised sample mean converges to the standard normal, and this allows us to carry out hypothesis tests using the t-test.

I was wondering: does this hold for estimates that aren't the sample mean, such as regression coefficients from OLS, MLE, etc.? I'm aware that these estimators are consistent, but consistency only confirms that they converge to the true value; it doesn't by itself say anything about the distribution of the estimates. So, if the CLT does cover regression coefficients, why is this the case?

Geoff
  • Googling "asymptotic normality of OLS" and "asymptotic normality of MLE" you can find many websites and videos that cover these proofs. – suckrates Sep 14 '23 at 18:37
  • Relevant: https://stats.stackexchange.com/questions/152541/central-limit-theorem-on-a-linear-combination, https://stats.stackexchange.com/questions/579262/asymptotic-normality-of-ols-estimators-in-practice – kjetil b halvorsen Sep 14 '23 at 23:27

3 Answers


OLS and MLE estimates are "sample means" of some random variable. For example, for OLS we have

$$\sqrt{n}(\hat \beta_{OLS}-\beta) = \left(\frac 1n X'X\right)^{-1}\cdot \sqrt{n}\left(\frac 1n \sum_{i=1}^{n} \mathbf x_i \varepsilon_i\right).$$

Here, the "sample mean" is of the (vector) random variable $\mathbf x_i \varepsilon_i$.

For MLE, the "sample mean" is of the gradient of the log-likelihood of the sample, which is also a vector random variable.

So we already have this fundamental connection to the CLT, and on it we build the (sometimes somewhat more elaborate) proofs that most of these estimators, centered and scaled appropriately, are asymptotically normal.
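A quick Monte Carlo sketch of the OLS decomposition above (my own illustration, with assumed choices: uniform regressors and deliberately non-normal, centered exponential errors): even though the errors are skewed, $\sqrt{n}(\hat\beta_1 - \beta_1)$ behaves like a normal variable, because it is driven by the sample mean of $\mathbf x_i \varepsilon_i$.

```python
import numpy as np

# Hypothetical setup: y = b0 + b1*x + eps with skewed (centered exponential)
# errors. Across many replications we look at sqrt(n)*(b1_hat - b1).
rng = np.random.default_rng(0)
n, reps = 500, 2000
b0, b1 = 1.0, 2.0

est = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0.0, 1.0, n)
    eps = rng.exponential(1.0, n) - 1.0      # mean 0, variance 1, skewed
    y = b0 + b1 * x + eps
    X = np.column_stack([np.ones(n), x])     # design matrix with intercept
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    est[r] = beta_hat[1]

z = np.sqrt(n) * (est - b1)
# CLT prediction: mean roughly 0, sd roughly sigma/sd(x) = sqrt(12) ~ 3.46
print(z.mean(), z.std())
```

The empirical standard deviation matches the asymptotic value $\sigma/\operatorname{sd}(x)$ even though no error term here is normal.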


The estimated regression coefficients are functions of the data (both the response and explanatory variables). The regression model specifies the conditional distribution of the response given the explanatory variables, so this implies a particular conditional distribution for the estimated regression coefficients given the explanatory variables. That is, in general, if the regression model fully specifies the distribution of $\mathbf{y} | \mathbf{x}$, then this implies a distribution for $\hat{\boldsymbol{\beta}}(\mathbf{y}, \mathbf{x}) | \mathbf{x}$.

When using Gaussian linear regression (the standard model form) we have $\mathbf{y} | \mathbf{x} \sim \text{N}(\mathbf{x} \boldsymbol{\beta}, \sigma^2 \mathbf{I})$. If we estimate using the OLS estimator for the regression coefficients then the estimator is a linear function of the error vector. Consequently, it turns out that the resulting vector of estimated regression coefficients is also normally distributed, with distribution:

$$\hat{\boldsymbol{\beta}}(\mathbf{y}, \mathbf{x}) \sim \text{N} (\boldsymbol{\beta}, (\mathbf{x}^\text{T} \mathbf{x})^{-1} \sigma^2 ).$$

Now, it also turns out that the normality of the vector of estimated regression coefficients is quite robust to the model assumptions. Even if the error terms in the model are not normally distributed, if you have a large amount of data then there is a variation of the central limit theorem that ensures that the estimated regression coefficients are close to normal. (Actually, consistency of the estimator requires conditions on the explanatory variables; see this related answer.)
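To make the robustness claim concrete, here is a small simulation (a sketch I have added, with assumed parameters, not part of the answer): with a fixed design and strongly skewed errors of variance 1, the slope estimate standardised by the exact formula above has coverage close to that of $\text{N}(0,1)$.

```python
import numpy as np

# Assumed example: fixed Gaussian design, centered-exponential errors.
rng = np.random.default_rng(1)
n, reps = 200, 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
sd_b1 = np.sqrt(XtX_inv[1, 1])       # sigma^2 = 1 for the errors below
b = np.array([0.5, -1.0])

zs = np.empty(reps)
for r in range(reps):
    eps = rng.exponential(1.0, n) - 1.0   # skewed, mean 0, variance 1
    y = X @ b + eps
    b_hat = XtX_inv @ (X.T @ y)           # OLS via the normal equations
    zs[r] = (b_hat[1] - b[1]) / sd_b1

# Fraction inside the central 95% normal interval: close to 0.95
cov = np.mean(np.abs(zs) < 1.96)
print(cov)
```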

Ben
  • +1. Regarding the asymptotic part: High-leverage observations have a negative impact on the asymptotics. Thus, in a non-normal situation, suddenly the distribution of the covariates becomes (more) relevant. – Michael M Sep 15 '23 at 07:38
  • I think it's the "it turns out that..." part of the answer that would be most useful to OP (or at least to my reading of the question). Why is it that the coefficient estimates are normally distributed? OP seems to already understand that coefficients are normally distributed, but wants to know how it could be that the CLT applies to them when they are seemingly not sample means, and to their knowledge the CLT is only about sample means. – Noah Sep 24 '23 at 01:23

The Central Limit Theorem gives, as is correctly stated in the question, the asymptotic normal distribution of the suitably standardised sample mean. This statement can be used for more than just the sample mean: it can also be used to develop mathematical theorems and proofs regarding other statistics. For example, the sample variance looks somewhat similar to the sample mean in the sense that it is also a sum of $n$ random variables, divided by something that is very close to $n$. It isn't something to which the Central Limit Theorem applies straight away, but it can be rewritten as a function of means of certain statistics that behave like sample means in the Central Limit Theorem, showing that ultimately it is something asymptotically normally distributed (as shown by the CLT) plus something that vanishes for $n\to\infty$.

For details see Central Limit Theorem for the variance (Math Stackexchange).
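As a numerical check of the sample-variance case (an illustration I have added, assuming uniform data, for which $\sigma^2 = 1/12$ and $\mu_4 = E[(X-\tfrac12)^4] = 1/80$): standardising $S^2$ by its asymptotic standard deviation $\sqrt{(\mu_4 - \sigma^4)/n}$ gives something close to standard normal.

```python
import numpy as np

# Assumed example: X ~ Uniform(0,1), sample variance S^2 with ddof=1.
rng = np.random.default_rng(2)
n, reps = 1000, 4000
true_var = 1 / 12
asym_sd = np.sqrt(1 / 80 - true_var ** 2)    # sqrt(mu4 - sigma^4)

s2 = np.array([rng.uniform(0, 1, n).var(ddof=1) for _ in range(reps)])
z = np.sqrt(n) * (s2 - true_var) / asym_sd
print(z.mean(), z.std())   # approximately 0 and 1
```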

Such an approach applies to a far more general class of statistics. This includes the linear regression coefficients (which can also be written as suitably standardised sums) and also quite general maximum likelihood estimators. A further principle for applying the CLT to statistics other than the sample mean, which are not of the form "$c(n)$ times a sum of something", is that a statistic can be approximated in some way by a mean to which the CLT applies, for example by means of a Taylor expansion, and then one can prove that the remainder term vanishes in probability for $n\to\infty$. The issue is always that one needs to prove that what we are interested in can be reduced to a mean of something to which the CLT applies, plus other stuff that vanishes for infinite $n$, possibly times something that converges to 1 or to a constant that can be "standardised away".
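The Taylor-expansion idea can be sketched numerically (my own example, with assumed choices: the statistic $g(\bar X) = e^{\bar X}$ applied to exponential(1) data). This is the classic delta method: the CLT handles $\bar X$, and a first-order expansion transfers the normality to $g(\bar X)$, with asymptotic standard deviation $|g'(\mu)|\,\sigma/\sqrt{n}$.

```python
import numpy as np

# Assumed example: X ~ Exponential(1), so mu = sigma = 1 and g'(mu) = e.
rng = np.random.default_rng(3)
n, reps = 2000, 3000
mu, sigma = 1.0, 1.0

stats = np.array([np.exp(rng.exponential(1.0, n).mean())
                  for _ in range(reps)])
# Standardise g(Xbar) by the delta-method scale |g'(mu)| * sigma = e
z = np.sqrt(n) * (stats - np.exp(mu)) / (np.exp(mu) * sigma)
print(z.mean(), z.std())   # approximately 0 and 1
```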

So in fact the CLT has been used to prove asymptotic normality of many statistics other than the sample mean itself. (It will not always work, but it often does; sometimes sophisticated mathematics is needed to find out.)