
This is from the lecture slides of MIT 18.650.

[Screenshot of the lecture slide]

Here $\sigma^2$ is the variance of the error term $\epsilon$ in the true model $Y = X\beta + \epsilon$, and $\hat{\beta}$ is the model's estimate of $\beta$.

What does the last point in the slide mean? It says $\hat{\beta} \perp \hat{\sigma}^2$. Isn't $\hat{\sigma}^2$ a scalar random variable and $\hat{\beta}$ a $p$-dimensional random vector? What does orthogonality mean here?

2 Answers


It means independence. We have $$\hat\beta=(X'X)^{-1}X'y=(X'X)^{-1}X'(X\beta+u)=\beta+(X'X)^{-1}X'u$$ and $$ \hat\sigma^2=\frac{1}{n-p}y'My $$ with $M=I-X(X'X)^{-1}X'$, and thus, in view of $My=M(X\beta+u)=Mu$ and the symmetry and idempotence of $M$, $$ \hat\sigma^2=\frac{1}{n-p}u'M'Mu=\frac{1}{n-p}u'Mu. $$

Consider the covariance of the "outer parts" of these expressions, $X'u$ and $Mu$ (this is enough, as $\hat\beta$ and $\hat\sigma^2$ are functions of these): $$ E(X'uu'M)=\sigma^2X'M=\sigma^2X'(I-X(X'X)^{-1}X')=\sigma^2(X'-X')=0. $$ Here, I exploit the assumption of spherical errors, $E(uu')=\sigma^2I$. It is not stated explicitly in the screenshot, but the variance of the LSE given there relies on it, so I assume it is made somewhere in the linked notes.

Since the setting assumes multivariate normality of the error terms $u$ (again, this is not explicit in the screenshot, but the fact that the LSE is normally distributed is a consequence of that assumption, which will be stated somewhere in the slides cited), zero covariance implies independence.

This exposition assumes fixed (nonrandom) regressors, but you could also do it all conditional on $X$.
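To see the independence concretely, here is a minimal Monte Carlo sketch in NumPy (not from the slides or the notes; the design, coefficients, and sample sizes below are arbitrary illustrations). With $X$ held fixed and normal errors, the sample correlation between each component of $\hat\beta$ and $\hat\sigma^2$ comes out close to zero across replications:

```python
# Minimal Monte Carlo sketch (illustrative values, not from the slides):
# with a fixed design X and normal errors, beta_hat and sigma2_hat should
# be independent, so their sample correlations are near zero.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))          # fixed design, full column rank (assumed)
beta = np.array([1.0, -2.0, 0.5])    # arbitrary "true" coefficients
sigma = 2.0

n_rep = 20_000
beta_hats = np.empty((n_rep, p))
sigma2_hats = np.empty(n_rep)
for r in range(n_rep):
    u = rng.normal(scale=sigma, size=n)
    y = X @ beta + u
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_hat
    beta_hats[r] = b_hat
    sigma2_hats[r] = resid @ resid / (n - p)

# Correlation between each component of beta_hat and sigma2_hat: all near 0.
for j in range(p):
    print(j, np.corrcoef(beta_hats[:, j], sigma2_hats)[0, 1])
```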

  • What exactly is multivariate normal? Certainly not $(\hat\sigma^2, \hat\beta)$! Moreover, on the face of it $\hat\sigma^2$ is not a function of $Mu$ alone, so a little clarification of the logic would help. (I believe you are relying on the idempotence of $M.$) – whuber Feb 23 '24 at 18:27
  • As the LSE is normal, $u$ is multivariate normal (not sure that is necessary in the mathematical sense, but I would bet anything that this is stated somewhere in the lecture slides cited). I added some further detail. – Christoph Hanck Feb 24 '24 at 08:21
  • Thank you :) +1 – figs_and_nuts Feb 25 '24 at 11:27

In what follows I'll adopt the notation from the slides linked in the question.
Note that the slides omit the symbol indicating matrix transposition.


$\hat{\boldsymbol \beta} \perp \!\!\! \perp \hat \sigma^2$ means that $\hat{\boldsymbol \beta}$ and $\hat \sigma^2$ are independent random variables.

On slide 17 it is stated that

  • the design matrix $\mathbf X$ is deterministic and has full column rank,
  • the error vector $\boldsymbol \varepsilon$ follows a $\mathop{\mathcal N_n}\left(0, \sigma^2 I_n\right)$ distribution.

Since $\mathbf Y = \mathbf X \boldsymbol \beta + \boldsymbol \varepsilon$ (slide 15), we have $\mathbf Y \sim \mathop{\mathcal N_n}\left(\mathbf X \boldsymbol \beta, \sigma^2I_n\right)$.

With $\hat{\boldsymbol \varepsilon} \mathrel{:=} \mathbf Y - \mathbf X \hat{\boldsymbol \beta}$, we can write $$ \begin{pmatrix} \hat{\boldsymbol \beta}\\ \hat{\boldsymbol \varepsilon} \end{pmatrix} = \begin{pmatrix} \left(\mathbf X^\mathsf{T} \mathbf X\right)^{-1} \mathbf X^\mathsf{T} \\ I_n - \mathbf X\left(\mathbf X^\mathsf{T} \mathbf X\right)^{-1} \mathbf X^\mathsf{T} \end{pmatrix} \mathbf Y, $$

and observe

$$ \begin{align} \mathop{\text{Cov}}\left(\hat{\boldsymbol \beta}, \hat{\boldsymbol \varepsilon}\right) &= \left(\mathbf X^\mathsf{T} \mathbf X\right)^{-1} \mathbf X^\mathsf{T}\mathop{\text{Cov}}\left(\mathbf Y,\mathbf Y\right) \left(I_n - \mathbf X\left(\mathbf X^\mathsf{T} \mathbf X\right)^{-1} \mathbf X^\mathsf{T}\right)^\mathsf{T} \\ &= \left(\mathbf X^\mathsf{T} \mathbf X\right)^{-1} \mathbf X^\mathsf{T} \sigma^2 I_n \left(I_n - \mathbf X\left(\mathbf X^\mathsf{T} \mathbf X\right)^{-1} \mathbf X^\mathsf{T}\right)\\ &= \mathbf 0. \end{align} $$

Thus, $\hat{\boldsymbol \beta}$ and $\hat{\boldsymbol \varepsilon}$ are jointly normal and uncorrelated, hence independent.
Since $\hat \sigma^2 = \frac{1}{n-p} \hat{\boldsymbol \varepsilon}^\mathsf{T} \hat{\boldsymbol \varepsilon}$ can be viewed as a (Borel measurable) function of $\hat{\boldsymbol \varepsilon}$, we can conclude that $\hat{\boldsymbol \beta}$ and $\hat \sigma^2$ are independent.
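As a quick numerical sanity check of the zero-covariance step above (a sketch with an arbitrary full-rank design and an arbitrary $\sigma^2$, not part of the slides), the matrix $\left(\mathbf X^\mathsf{T} \mathbf X\right)^{-1} \mathbf X^\mathsf{T}\,\sigma^2 I_n \left(I_n - \mathbf X\left(\mathbf X^\mathsf{T} \mathbf X\right)^{-1} \mathbf X^\mathsf{T}\right)^\mathsf{T}$ evaluates to zero up to rounding error:

```python
# Numerical check of Cov(beta_hat, eps_hat) = 0 (illustrative values):
# (X'X)^{-1} X' * sigma^2 I * (I - X (X'X)^{-1} X')' should be the zero matrix.
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 4
X = rng.normal(size=(n, p))           # arbitrary full-rank design (assumed)
sigma2 = 1.7                          # any positive value works

A = np.linalg.solve(X.T @ X, X.T)     # (X'X)^{-1} X'
M = np.eye(n) - X @ A                 # residual-maker (annihilator) matrix
cov = A @ (sigma2 * np.eye(n)) @ M.T  # Cov(beta_hat, eps_hat)

print(np.max(np.abs(cov)))            # ~1e-15, i.e. zero up to rounding
```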

– statmerkur