
The image is a screenshot from a YouTube lecture on linear regression. I can mostly follow what the lecturer says, but I wonder how I actually calculate the $\sigma^2$ in the red box of the image. $\underline{X}$ denotes a matrix. The linear regression is homoscedastic with no serial correlation.

[screenshot of the lecture slide]

user122358
  • Could you please elaborate on what you mean by "get the $\sigma^2$"? Mathematically, it's there because it was present in the original expression for $\operatorname{Var}(y)$, so there doesn't seem to be anything to explain. – whuber Mar 11 '17 at 19:41
  • You cannot actually calculate $\sigma^2$, because it's a model parameter--you don't know it. You can estimate it in terms of the residuals, but that's not what's going on in the figure you reproduced. – whuber Mar 11 '17 at 19:53

1 Answer


Let lowercase bold letters denote vectors. The linear model is:

$$ y_i = \mathbf{x}_i' \mathbf{b} + \epsilon_i $$

In matrix notation for all observations $i=1, \ldots, n$:

$$\mathbf{y} = X \mathbf{b} + \boldsymbol{\epsilon} $$

The OLS estimator for $\mathbf{b}$ is:

$$\hat{\mathbf{b}} = (X'X)^{-1}X'\mathbf{y} $$

Substituting:

$$\hat{\mathbf{b}} = (X'X)^{-1}X' \left( X \mathbf{b} + \boldsymbol{\epsilon}\right) $$

What's the variance of our estimator $\hat{\mathbf{b}}$? Take the variance (conditional on $X$):

\begin{align*} \operatorname{Var}\left( \hat{\mathbf{b}} \mid X\right) &= \operatorname{Var}\left( (X'X)^{-1}X' \left( X \mathbf{b} + \boldsymbol{\epsilon}\right) \mid X\right) \\ &= \operatorname{Var}\left( (X'X)^{-1}X' \boldsymbol{\epsilon} \mid X\right) \\ &= (X'X)^{-1}X' \operatorname{Var}\left( \boldsymbol{\epsilon} \mid X\right) X(X'X)^{-1} \end{align*}

We assumed the $\epsilon_i$ were IID. Hence for all $i$ we have $\operatorname{Var}(\epsilon_i) = \sigma^2$, and for the whole vector $\boldsymbol{\epsilon}$ we have $\operatorname{Var}(\boldsymbol{\epsilon}) = \sigma^2 I$, where $I$ is the identity matrix.

Continuing: \begin{align*} \operatorname{Var}\left( \hat{\mathbf{b}} \mid X\right) &= (X'X)^{-1}X' \sigma^2 IX(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1} \end{align*}

$\sigma^2$ is a scalar and can be moved freely because multiplication by a scalar commutes with matrix multiplication, and $(X'X)^{-1}X'X = I$ by the definition of an inverse matrix.
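As a sanity check, here is a minimal simulation sketch in numpy (the design, coefficients, and error variance below are made up for illustration): holding $X$ fixed and redrawing the errors many times, the empirical covariance of $\hat{\mathbf{b}}$ should be close to $\sigma^2 (X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design matrix (constant + one regressor), held fixed across simulations
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
b = np.array([1.0, 2.0])        # true coefficients (illustrative values)
sigma2 = 4.0                    # true error variance (illustrative value)

# Theoretical covariance of the OLS estimator: sigma^2 (X'X)^{-1}
theory = sigma2 * np.linalg.inv(X.T @ X)

# Redraw the errors many times and compare the empirical covariance of b_hat
reps = 20000
b_hats = np.empty((reps, 2))
for r in range(reps):
    y = X @ b + rng.normal(scale=np.sqrt(sigma2), size=n)
    b_hats[r] = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate for this draw

print(theory)
print(np.cov(b_hats, rowvar=False))   # should be close to `theory`
```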

You can estimate $\sigma^2$ using the residuals from your OLS regression. Let $\hat{e}_i$ be the residual (as opposed to error-term $\epsilon_i$) for observation $i$. That is:

$$ \hat{e}_i = y_i - \mathbf{x}_i' \hat{\mathbf{b}} $$

Note that this is based on the estimate $\hat{\mathbf{b}}$ rather than the true values $\mathbf{b}$. Then the usual estimator for $\sigma^2$ is:

$$\hat{\sigma}^2 = \frac{1}{n-k} \sum_i \hat{e}_i^2$$

where $k$ is the number of regressors (including the constant).
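To make the last step concrete, here is a minimal numpy sketch of the whole calculation (the function name `ols_sigma2` and the simulated data are just for illustration): it computes $\hat{\mathbf{b}}$, the residuals, $\hat{\sigma}^2 = \frac{1}{n-k}\sum_i \hat{e}_i^2$, and the estimated covariance matrix $\hat{\sigma}^2 (X'X)^{-1}$.

```python
import numpy as np

def ols_sigma2(X, y):
    """OLS fit plus the usual unbiased estimate of the error variance."""
    n, k = X.shape                                   # k regressors, including the constant column
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)        # (X'X)^{-1} X'y
    resid = y - X @ b_hat                            # residuals e_hat
    sigma2_hat = resid @ resid / (n - k)             # sum of squared residuals / (n - k)
    cov_b_hat = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated Var(b_hat | X)
    return b_hat, sigma2_hat, cov_b_hat

# Example with simulated data (true sigma^2 = 4 here, so sigma2_hat should be near 4)
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=2.0, size=n)

b_hat, sigma2_hat, cov_b_hat = ols_sigma2(X, y)
print(sigma2_hat)                     # estimate of sigma^2
print(np.sqrt(np.diag(cov_b_hat)))    # standard errors of b_hat
```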

Matthew Gunn
  • Thank you for your explanation and I think I understand now. – user122358 Mar 11 '17 at 20:09
  • How would the last part (of estimating sigma) work for ridge regression, assuming k > n? – runr Jun 12 '20 at 03:26
  • Thanks a lot for your answer! Could you please elaborate on how you made this step? $\operatorname{Var}\left( (X'X)^{-1}X' \left( X \mathbf{b} + \boldsymbol{\epsilon}\right) \mid X\right) = \operatorname{Var}\left( (X'X)^{-1}X' \boldsymbol{\epsilon} \mid X\right)$ – Антон Бугаев Dec 28 '21 at 06:35
  • @Антон Бугаев $X \mathbf{b}$ isn't a random vector: it's a vector of scalars, so its variance is 0. It's the same conceptual idea as $\operatorname{Var}(2 + Z) = \operatorname{Var}(Z)$ (where $Z$ is a random variable). – Matthew Gunn Dec 29 '21 at 03:08