The image is copied and pasted from a YouTube lecture on linear regression. I can mostly follow what the lecturer says, but I wonder how I actually calculate the $\sigma^2$ in the red box of the image. $\underline{X}$ means a matrix. The linear regression is homoscedastic with no serial correlation.
Could you please elaborate on what you mean by "get the $\sigma^2$"? Mathematically, it's there because it was present in the original expression for $\operatorname{Var}(y)$, so there doesn't seem to be anything to explain. – whuber Mar 11 '17 at 19:41
You cannot actually calculate $\sigma^2$, because it's a model parameter--you don't know it. You can estimate it in terms of the residuals, but that's not what's going on in the figure you reproduced. – whuber Mar 11 '17 at 19:53
1 Answer
Let lowercase bold letters denote vectors. The linear model is:
$$ y_i = \mathbf{x}_i' \mathbf{b} + \epsilon_i $$
In matrix notation for all observations $i=1, \ldots, n$:
$$\mathbf{y} = X \mathbf{b} + \boldsymbol{\epsilon} $$
The OLS estimator for $\mathbf{b}$ is:
$$\hat{\mathbf{b}} = (X'X)^{-1}X'\mathbf{y} $$
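As a quick numerical illustration, here is a minimal NumPy sketch on simulated data (the data, parameter values, and variable names are purely illustrative) showing how $\hat{\mathbf{b}}$ could be computed:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                                      # observations, regressors (incl. constant)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
b_true = np.array([1.0, 2.0, -0.5])                # true coefficients (unknown in practice)
y = X @ b_true + rng.normal(scale=1.5, size=n)     # errors with sigma = 1.5

# b_hat = (X'X)^{-1} X'y; solving the normal equations avoids forming the inverse explicitly
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(b_hat)
```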
Substituting the model $\mathbf{y} = X \mathbf{b} + \boldsymbol{\epsilon}$ into the estimator:
$$\hat{\mathbf{b}} = (X'X)^{-1}X' \left( X \mathbf{b} + \boldsymbol{\epsilon}\right) $$
What's the variance of our estimator $\hat{\mathbf{b}}$? Take the variance (conditional on $X$):
\begin{align*} \operatorname{Var}\left( \hat{\mathbf{b}} \mid X\right) &= \operatorname{Var}\left( (X'X)^{-1}X' \left( X \mathbf{b} + \boldsymbol{\epsilon}\right) \mid X\right) \\ &= \operatorname{Var}\left( (X'X)^{-1}X' \boldsymbol{\epsilon} \mid X\right) \\ &= (X'X)^{-1}X' \operatorname{Var}\left( \boldsymbol{\epsilon} \mid X\right) X(X'X)^{-1} \end{align*}
We assumed that the $\epsilon_i$ were IID. Hence for all $i$ we have $\operatorname{Var}(\epsilon_i) = \sigma^2$, and for the whole vector $\boldsymbol{\epsilon}$ we have $\operatorname{Var}(\boldsymbol{\epsilon}) = \sigma^2 I$ where $I$ is the identity matrix.
Continuing: \begin{align*} \operatorname{Var}\left( \hat{\mathbf{b}} \mid X\right) &= (X'X)^{-1}X' \sigma^2 IX(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1} \end{align*}
$\sigma^2$ is a scalar and can be moved wherever by the commutative property of multiplication. $(X'X)^{-1}X'X = I$ by definition of an inverse matrix.
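To make this concrete, here is a small Monte Carlo sketch (simulated data, illustrative parameter values) comparing the empirical covariance of $\hat{\mathbf{b}}$ across replications, with $X$ held fixed, against the theoretical $\sigma^2 (X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 200, 3, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # fixed design matrix
b = np.array([1.0, 2.0, -0.5])

draws = np.empty((20_000, k))
for r in range(draws.shape[0]):
    y = X @ b + rng.normal(scale=sigma, size=n)                  # new errors each replication
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)                 # OLS estimate for this sample

print(np.cov(draws, rowvar=False))         # empirical Var(b_hat | X)
print(sigma**2 * np.linalg.inv(X.T @ X))   # theoretical sigma^2 (X'X)^{-1}
```

The two printed matrices should agree closely, up to simulation noise.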
You can estimate $\sigma^2$ using the residuals from your OLS regression. Let $\hat{e}_i$ be the residual (as opposed to the error term $\epsilon_i$) for observation $i$. That is:
$$ \hat{e}_i = y_i - \mathbf{x}_i' \hat{\mathbf{b}} $$ Note that this is based upon estimate $\hat{\mathbf{b}}$ instead of true values $\mathbf{b}$. Then the usual estimator for $\sigma^2$ is: $$\hat{\sigma}^2 = \frac{1}{n-k} \sum_i \hat{e}_i^2$$ where $k$ is your number of regressors (including a constant).
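Here is a short sketch of that estimator on simulated data (again with illustrative names and parameter values), plugging $\hat{\sigma}^2$ back into $(X'X)^{-1}$ to get the estimated covariance matrix of $\hat{\mathbf{b}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma = 100, 3, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=sigma, size=n)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_hat                              # e_hat_i = y_i - x_i' b_hat
sigma2_hat = resid @ resid / (n - k)               # (1 / (n - k)) * sum of squared residuals
cov_b_hat = sigma2_hat * np.linalg.inv(X.T @ X)    # estimated Var(b_hat | X)
print(sigma2_hat)
print(cov_b_hat)
```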
How would the last part (of estimating sigma) work for ridge regression, assuming k > n? – runr Jun 12 '20 at 03:26
Thanks a lot for your answer! Could you please elaborate on how you made this step? $\operatorname{Var}\left((X'X)^{-1}X'(X\mathbf{b}+\boldsymbol{\epsilon}) \mid X\right) = \operatorname{Var}\left((X'X)^{-1}X'\boldsymbol{\epsilon} \mid X\right)$ – Антон Бугаев Dec 28 '21 at 06:35
@Антон Бугаев $X \mathbf{b}$ isn't a random vector: it's a vector of scalars, so its variance is 0. It's the same conceptual idea as $\operatorname{Var}(2 + Z) = \operatorname{Var}(Z)$ (where $Z$ is a random variable). – Matthew Gunn Dec 29 '21 at 03:08
