Excerpt from "Elements of Statistical Learning", p.47
Assume that the conditional expectation of $Y$ is linear in $X_1, \ldots, X_p$. Also assume that the deviations of $Y$ around its expectation are additive and Gaussian. Hence $$Y = E(Y \mid X_1, \ldots, X_p) + \varepsilon = \beta_0 + \sum_{j = 1}^p X_j \beta_j + \varepsilon,$$ where the error is Gaussian, $\varepsilon \sim N(0,\sigma^2)$.
It is then easy to show that $\hat \beta \sim N(\beta, (X^t X)^{-1} \sigma^2)$ (1) and that $(N - p - 1)\hat \sigma^2 \sim \sigma^2 \chi^2_{N-p-1}$ (2).
Earlier they also assume that the observations $y_i$ are uncorrelated and have constant variance $\sigma^2$, and the $x_i$ are fixed.
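In matrix notation, which I use below and which I believe matches the book's, this model reads $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I_N)$, where $X$ is the $N \times (p+1)$ matrix whose first column is all ones and $y$ is the vector of the $N$ responses.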
Question
This part of ESL covers basics that I'm trying to refresh, since it has been a long time since I studied this material.
For (1) I can show that $\mathrm{Cov}(\hat \beta) = (X^t X)^{-1} \sigma^2$ and of course $E(\hat \beta) = \beta$, but don't I need to know that $Y$ is normally distributed in order to conclude that $\hat \beta \sim N(E(\hat \beta), \mathrm{Cov}(\hat \beta))$?
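Concretely, if I write $$\hat \beta = (X^t X)^{-1} X^t y = \beta + (X^t X)^{-1} X^t \varepsilon,$$ then $\hat \beta$ is an affine function of the error vector $\varepsilon$. Is this where the normality assumption enters, i.e. an affine transformation of a Gaussian vector is again Gaussian (here I am assuming the $\varepsilon_i$ are jointly Gaussian and independent, not just marginally normal)?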
For (2) I know that the hat matrix $H = X(X^t X)^{-1}X^t$ is idempotent, has rank $p + 1$, and that the residuals satisfy $X^t \hat \varepsilon = 0$. How can I finish (2)?
As for the second question, I assume it has something to do with $X(X^tX)^{-1}X^t$ being a projection matrix, hence splitting $\mathbb{R}^N$ into the image of the projection, of dimension $p + 1$, and its orthogonal complement, of dimension $N - (p + 1)$; however, I can't seem to nail down the details.
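Here is the kind of argument I have in mind (just a sketch, assuming the Gaussian error model from the excerpt and writing $H = X(X^tX)^{-1}X^t$): the residuals are $$\hat \varepsilon = y - X \hat \beta = (I - H)y = (I - H)\varepsilon,$$ so $$(N - p - 1)\hat \sigma^2 = \hat \varepsilon^t \hat \varepsilon = \varepsilon^t (I - H) \varepsilon,$$ and since $I - H$ is symmetric, idempotent, and of rank $N - (p + 1)$, this quadratic form in the Gaussian vector $\varepsilon$ should be $\sigma^2 \chi^2_{N - p - 1}$ distributed. Is that the intended route, and how is the last step justified (by diagonalizing $I - H$, perhaps)?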
– Lejoon Jan 08 '23 at 20:37