Let's say we have two estimators for $\beta$.
$\beta$ denotes the full set of coefficients, one for each covariate in the data.
$\beta$ can be split into $\beta_p$ and $\beta_r$, where $p$ indexes a retained subset of the covariates and $r$ indexes the removed covariates.
So we estimate two models: the reduced model $y = X_p \beta_p + \epsilon$ and the full model $y = X\beta + \epsilon$, where $X = [X_p \; X_r]$.
We denote by $\hat{\beta}_p$ our coefficient estimate from the reduced model, and by $\hat{\beta}_p^{\star}$ the estimate of those same coefficients when we fit the full model.
Therefore:
$Var(\hat{\beta}_p) = \sigma^2(X_p'X_p)^{-1}$
$Var(\hat{\beta}) = \sigma^2(X'X)^{-1}$
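And if I understand the partitioned-inverse result correctly (please correct me if not), the variance of $\hat{\beta}_p^{\star}$ is the corresponding block of $\sigma^2(X'X)^{-1}$:
$Var(\hat{\beta}_p^{\star}) = \sigma^2(X_p'M_rX_p)^{-1}$, where $M_r = I - X_r(X_r'X_r)^{-1}X_r'$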
My question is, why is it the case that
$Var(\hat{\beta}_p^{\star}) - Var(\hat{\beta}_p)$ is positive semi-definite? Why is the variance of the reduced estimator lower?
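A quick toy simulation (entirely made-up data, just checking the claim numerically rather than proving it) seems to bear this out:

```python
# Toy check: compare Var(beta_hat_p) from the reduced model with the
# corresponding block of Var(beta_hat) from the full model.
import numpy as np

rng = np.random.default_rng(0)
n, k_p, k_r = 200, 3, 2                               # sample size, retained / removed covariates

X_p = rng.normal(size=(n, k_p))                       # retained covariates
X_r = 0.5 * X_p[:, :k_r] + rng.normal(size=(n, k_r))  # removed covariates, correlated with X_p
X = np.hstack([X_p, X_r])                             # full design matrix

sigma2 = 1.0                                          # treat the error variance as known

# Reduced model: Var(beta_hat_p) = sigma^2 (X_p'X_p)^{-1}
var_reduced = sigma2 * np.linalg.inv(X_p.T @ X_p)

# Full model: Var(beta_hat_p_star) is the upper-left k_p x k_p block of sigma^2 (X'X)^{-1}
var_full_block = sigma2 * np.linalg.inv(X.T @ X)[:k_p, :k_p]

# Positive semi-definiteness of the difference <=> all eigenvalues >= 0 (up to rounding)
diff = var_full_block - var_reduced
print(np.linalg.eigvalsh(diff))
```

In my runs the eigenvalues come out non-negative, which confirms the claim numerically but doesn't explain it.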
My intuition is that, since the two formulas have the same form and the full model leaves fewer residual degrees of freedom, it is the loss of degrees of freedom that would increase the variance.
But the variances should be estimated using $s^2$, where $E[s^2] = \sigma^2$ follows from $E[\hat{\epsilon}'\hat{\epsilon}] = (n - k)\sigma^2$ for the residuals $\hat{\epsilon}$ of a model with $k$ covariates. I'm not clear how that connects back to the difference above being positive semi-definite, or to the reduced model having lower variance.
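To spell out what I mean by estimating $s^2$ for each model (writing $k_p$ and $k_r$ for the number of retained and removed covariates, my notation):
$s_p^2 = \frac{\hat{\epsilon}_p'\hat{\epsilon}_p}{n - k_p}, \qquad s^2 = \frac{\hat{\epsilon}'\hat{\epsilon}}{n - k_p - k_r}$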
Another thought is more of a variance-components view: assume the variance of the full model's fitted values, $Var(\hat{y})$, is made up of the variance contributed by the retained regressors ($p$) plus the variance contributed by the removed regressors ($r$). The subset model drops the contribution of the removed regressors, so it has lower variance (but potentially some bias).
Please help me understand the bias-variance trade-off here and how it connects back to estimating $s^2$ for each model.