
Suppose I do linear regression for data $y \in \mathbb{R}^n$ and design matrix $X \in \mathbb{R}^{n \times m}$, with $n \gg m$. I seek $$ \hat{\beta} = \operatorname*{argmin}_{\beta \in \mathbb{R}^m} \| X\beta - y \|_2. $$

What are the ways to quantify uncertainty in $\hat{\beta}$? I have considered the bootstrap, and possibly a Bayesian estimator whose prior gives a closed-form expression for the variance of $\beta$. What other approaches are there?

References are appreciated but a full derivation (with intuition) would be ideal.

Full derivation: Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom

Yair Daon
  • @moreblue I would really appreciate it if you could elaborate... – Yair Daon Apr 27 '19 at 23:44
  • Say the model is $y = X\beta + \epsilon$, and you assume $var(\epsilon) =\sigma^2 I_n$. Then the estimated variance of $\beta^\ast$ is $\hat{\sigma}^2(X^TX)^{-1}$, where $\hat{\sigma}^2$ is the RSS divided by its degrees of freedom (which is $n$ minus the number of coefficients). (Edited, because you do not need any distributional assumptions on $\epsilon$.) – moreblue Apr 27 '19 at 23:47
  • where $\mathbb{E}(\epsilon)=0$... – moreblue Apr 27 '19 at 23:55
  • There are standard ways of finding a confidence region for $\widehat\beta$ (which you called $\beta^\star$) based on the facts that $(1)$ $\quad \widehat\beta \sim N_m(\beta, \sigma^2(X^\top X)^{-1})$ and $(2)$ $\quad \| \widehat{\varepsilon} \|^2 / \sigma^2 \sim \chi^2_{n-m}$ and $(3)$ $\quad \widehat\beta$ and $\widehat{\varepsilon}$ are independent of each other. – Michael Hardy Apr 28 '19 at 02:34
  • What happens if $X=X'+\epsilon'$ with $Var(\epsilon')=\sigma'^2$? – Boris Valderrama Mar 13 '20 at 01:49
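
The classical estimate described in the comments, $\hat{\sigma}^2 (X^\top X)^{-1}$ with $\hat{\sigma}^2 = \mathrm{RSS}/(n-m)$, can be sketched in NumPy roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `ols_with_uncertainty`, the simulated data, and the noise level are all made up for the example. The intervals use a normal quantile as a stand-in for the $t_{n-m}$ quantile, which is only reasonable under the question's assumption $n \gg m$, and they presume the homoscedastic model $\operatorname{var}(\epsilon) = \sigma^2 I_n$.

```python
import numpy as np
from statistics import NormalDist


def ols_with_uncertainty(X, y, level=0.95):
    """Hypothetical helper: OLS fit plus the classical covariance estimate."""
    n, m = X.shape
    # Least-squares solution of min ||X beta - y||_2
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    # RSS divided by its degrees of freedom, n - m
    sigma2_hat = resid @ resid / (n - m)
    # Estimated covariance of beta_hat: sigma2_hat * (X^T X)^{-1}
    cov = sigma2_hat * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    # Assumption: normal quantile approximates the t_{n-m} quantile when n >> m
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta_hat, cov, se, (beta_hat - z * se, beta_hat + z * se)


# Simulated data, purely for illustration
rng = np.random.default_rng(0)
n, m = 2000, 3
X = rng.normal(size=(n, m))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat, cov, se, (lo, hi) = ols_with_uncertainty(X, y)
print("estimate:       ", beta_hat)
print("standard errors:", se)
print("approx. 95% CI: ", list(zip(lo, hi)))
```

For small $n - m$ the normal quantile should be replaced by the exact $t_{n-m}$ quantile, and forming $(X^\top X)^{-1}$ explicitly is only sensible when $X$ is well conditioned; a QR-based computation would be the more careful choice otherwise.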