
Let's say we have two estimators for $\beta$.

$\beta$ denotes the full set of coefficients, one for each covariate in the data frame.

$\beta$ can be split into $\beta_p$ and $\beta_r$, where $p$ indexes a retained subset of the covariates and $r$ indexes the removed covariates.

So we estimate two models: the reduced model $y = X_p \beta_p + \epsilon$ and the full model $y = X\beta + \epsilon$.

We denote by $\hat{\beta}_p$ our coefficient estimate from the reduced model, and by $\hat{\beta}_p^{\star}$ our estimate of those same coefficients when the full model is estimated.

Therefore:
$Var(\hat{\beta}_p) = \sigma^2(X_p'X_p)^{-1}$
$Var(\hat{\beta}) = \sigma^2(X'X)^{-1}$

My question is: why is it the case that

$Var(\hat{\beta}_p^{\star}) - Var(\hat{\beta}_p)$ is positive semi-definite? Why is the variance of the reduced estimator lower?
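To make the claim concrete, here is a small numpy check I put together (the correlated design matrix and $\sigma^2 = 1$ are entirely made up); it takes the $p$-block of $\sigma^2(X'X)^{-1}$ as $Var(\hat{\beta}_p^{\star})$ and compares it with $\sigma^2(X_p'X_p)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary correlated design: 3 "kept" covariates (p) and 2 "removed" ones (r).
n, k_p, k_r = 200, 3, 2
Z = rng.normal(size=(n, k_p + k_r))
mix = np.eye(k_p + k_r) + 0.5          # induces correlation between X_p and X_r
X = Z @ mix
X_p, sigma2 = X[:, :k_p], 1.0

# Var(beta_hat_p) from the reduced model: sigma^2 (X_p' X_p)^{-1}
var_reduced = sigma2 * np.linalg.inv(X_p.T @ X_p)

# Var(beta_hat_p_star): the p-block of sigma^2 (X' X)^{-1} from the full model
var_full = sigma2 * np.linalg.inv(X.T @ X)
var_full_p_block = var_full[:k_p, :k_p]

# If the claim holds, the difference is positive semi-definite: all eigenvalues >= 0
diff = var_full_p_block - var_reduced
print(np.linalg.eigvalsh(diff))        # nonnegative (up to rounding)
```

In every run I tried the eigenvalues come out nonnegative, but I do not see how to argue this in general.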

It seems as though, since the formulas have the same form but the full model has fewer residual degrees of freedom, that loss of degrees of freedom should be what increases the variance.

But the variances should be estimated using $s^2$, with $E[s^2] = \sigma^2$, which we can compute from $E[\hat{\epsilon}'\hat{\epsilon}]$. I'm not clear how that connects back to the difference above being positive semi-definite, or to the reduced model having lower variance.
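To be explicit about what I mean by $s^2$: in each model I would use the usual unbiased estimator (at least when that model is correctly specified),

$s^2 = \dfrac{\hat{\epsilon}'\hat{\epsilon}}{n - k}$, with $E[s^2] = \sigma^2$,

where $\hat{\epsilon}$ are that model's residuals and $k$ is the number of columns in its design matrix ($\dim(\beta_p)$ for the reduced model, $\dim(\beta)$ for the full one).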

Another thought is more of a variance-components approach: assume the variance of the full model's fit, $Var(\hat{y})$, is composed of the variance contributed by the removed regressors ($r$) and by the subset-model regressors ($p$) above. The subset model, which does not carry the variance contributed by the removed regressors, then has lower variance (but potentially some bias).
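To illustrate the trade-off I have in mind, here is a quick Monte Carlo sketch (all coefficient values, sample sizes, and correlations are arbitrary): the reduced model omits a regressor that is correlated with a kept one, so its estimates come out biased but less variable:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data-generating process: y = X_p b_p + X_r b_r + eps
n, reps = 100, 5000
b_p, b_r, sigma = np.array([1.0, -2.0]), np.array([0.5]), 1.0
X = rng.normal(size=(n, 3))
X[:, 2] += 0.7 * X[:, 0]               # correlate the removed regressor with a kept one
X_p, X_full = X[:, :2], X

est_reduced, est_full = [], []
for _ in range(reps):
    y = X_p @ b_p + X[:, 2:] @ b_r + sigma * rng.normal(size=n)
    est_reduced.append(np.linalg.lstsq(X_p, y, rcond=None)[0])
    est_full.append(np.linalg.lstsq(X_full, y, rcond=None)[0][:2])

est_reduced, est_full = np.array(est_reduced), np.array(est_full)
print("bias (reduced):", est_reduced.mean(axis=0) - b_p)   # nonzero: omitted-variable bias
print("bias (full):   ", est_full.mean(axis=0) - b_p)      # roughly zero
print("var  (reduced):", est_reduced.var(axis=0))          # smaller
print("var  (full):   ", est_full.var(axis=0))             # larger
```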

Please help me understand the bias-variance trade-off here and how it connects back to estimating $s^2$ for each model.

  • Strictly speaking, what you denote as variances are in fact conditional variances (i.e. $\mathbb{V}\text{ar}\left(\hat{\beta} \mid X \right) = \sigma^2 \left(X^TX\right)^{-1}$ under homoskedasticity). In any case, for $y = X_1 \beta_1 + X_2 \beta_2 + \epsilon$, your claim boils down to showing that $(X^T_2 M_{X_1} X_2)^{-1} - (X^T_2 X_2)^{-1}$ is positive definite where $M_{X_1} $ is the projection into the orthogonal complement of the column space of $X_1$ – Yashaswi Mohanty Dec 04 '23 at 13:50
  • Also I don’t think your claim is true: https://stats.stackexchange.com/questions/596797/when-does-the-standard-errors-of-ols-estimates-decreases-when-we-have-more-expla – Yashaswi Mohanty Dec 04 '23 at 14:41

0 Answers