3

My question is related to [1], [2] and [3].

Assume we estimate a multiple regression,

$$ y = a + b_1x_1 + b_2x_2 + u $$

and are mainly interested in the value of $\hat{b}_1$ (let's denote this specific estimate $\hat{b}_{1; \text{model 1}}$).

If we run a different model by including an additional independent variable $x_3$

$$ y = a + b_1x_1 + b_2x_2 + b_3x_3 + u $$

we will, in general, observe a different estimate $\hat{b}_1$ (denoted $\hat{b}_{1; \text{model 2}}$), since the answer in [1] states that

A parameter estimate in a regression model will change if a variable is added to the model that is:

  1. correlated with that parameter's corresponding variable (which was already in the model), and
  2. correlated with the response variable

Question:

Does there exist a closed formula for the change in the estimated coefficient $\hat{b}_1$ when including additional independent variables?


Edit: Assume we include just one additional independent variable $x_3$ whose observations are all known. Of course, one could simply run both regressions in that case, but is there a way to calculate the change in the estimated $\hat{b}_1$ directly?

  • 1
    The answer is yes, but it depends on the correlations of the new variable with all the previous regressors and the response. This is explained in many ways in many threads on multiple regression here, such as https://stats.stackexchange.com/questions/17336, https://stats.stackexchange.com/questions/21022, and https://stats.stackexchange.com/a/166718/919. In the latter I provide the formulas and the procedure. – whuber Aug 09 '22 at 12:05

3 Answers

5

By the Frisch–Waugh(–Lovell) theorem and the well-known formula $\left(X^{\top}X \right)^{-1}X^{\top}y$ for the OLS estimator we have

$$ \begin{align} \hat{b}_{1; \text{model 1}}&=\left( \left(M_{2}x_1 \right)^{\top}M_{2}x_1 \right)^{-1} \left(M_{2}x_1 \right)^{\top}M_{2}y =\left( x_1^{\top}M_{2}x_1 \right)^{-1} x_1^{\top}M_{2}y,\\ \hat{b}_{1; \text{model 2}}&=\left( \left(M_{2,3}x_1 \right)^{\top}M_{2,3}x_1 \right)^{-1} \left(M_{2,3}x_1 \right)^{\top}M_{2,3}y =\left( x_1^{\top}M_{2,3}x_1 \right)^{-1} x_1^{\top}M_{2,3}y, \end{align} $$

with the symmetric and idempotent matrices

$$ \begin{align} M_{2}&=I-\left( \mathbf{1}\,x_2 \right) \left( \left( \mathbf{1}\,x_2 \right)^{\top} \left( \mathbf{1}\,x_2 \right) \right)^{-1} \left( \mathbf{1}\,x_2 \right)^{\top},\\ M_{2,3}&=I-\left( \mathbf{1}\,x_2\; x_3 \right) \left( \left( \mathbf{1}\,x_2\; x_3 \right)^{\top}\left( \mathbf{1}\,x_2\; x_3 \right) \right)^{-1} \left( \mathbf{1}\,x_2\; x_3 \right)^{\top}. \end{align} $$

Hence, the change in the OLS estimate of $b_1$ is given by

$$ \hat{b}_{1; \text{model 2}}-\hat{b}_{1; \text{model 1}}=\left( \left( x_1^{\top}M_{2,3}x_1 \right)^{-1} x_1^{\top}M_{2,3}-\left( x_1^{\top}M_{2}x_1 \right)^{-1} x_1^{\top}M_{2} \right)y, $$

or simply by

$$ \left( x_1^{\top}M_{2,3}x_1 \right)^{-1} x_1^{\top}M_{2,3}y - \hat{b}_{1; \text{model 1}} $$

if you already know the value of $\hat{b}_{1; \text{model 1}}$.
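For concreteness, here is a minimal numerical sketch of this result (NumPy, with simulated data chosen purely for illustration; the variable names and the helper `annihilator` are mine and not part of the derivation above). It builds $M_2$ and $M_{2,3}$ explicitly, computes both estimates of $b_1$ from the partialled-out regressions, and checks the difference against the two full regressions.

```python
import numpy as np

# Simulated data, purely for illustration
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = 0.3 * x1 + 0.3 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

ones = np.ones(n)

def annihilator(*cols):
    """M = I - Z (Z'Z)^{-1} Z' for Z built from the given columns."""
    Z = np.column_stack(cols)
    return np.eye(len(Z)) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

M2 = annihilator(ones, x2)          # residual maker for (1, x2)
M23 = annihilator(ones, x2, x3)     # residual maker for (1, x2, x3)

# (x1' M x1)^{-1} x1' M y reduces to a scalar division for a single column x1
b1_model1 = (x1 @ M2 @ y) / (x1 @ M2 @ x1)
b1_model2 = (x1 @ M23 @ y) / (x1 @ M23 @ x1)
print(b1_model2 - b1_model1)        # change in the estimate of b_1

# Check against the two full regressions
X1 = np.column_stack([ones, x1, x2])
X2 = np.column_stack([ones, x1, x2, x3])
full1 = np.linalg.lstsq(X1, y, rcond=None)[0][1]
full2 = np.linalg.lstsq(X2, y, rcond=None)[0][1]
print(full2 - full1)                # agrees up to floating-point error
```

In practice, for large $n$ one would partial out via QR or `lstsq` rather than forming the $n \times n$ annihilator matrices explicitly; the matrices are used here only to mirror the notation above.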

statmerkur
  • 5,950
0

No, there is no closed form for the change in the estimated coefficient $\hat{b}_1$ when including additional independent variables. Why? When adding other independent variables, say $x_4, x_5, \dots$, there will be effects on the other existing variables, say $x_2$. We cannot predict how, or by how much, the existing estimates will change. Here, one may face a multicollinearity problem, which can be detected and addressed using several methods such as the variance inflation factor, principal component regression, and so on.

RRMT
  • 362
  • Does this answer hold even if we know all values of the (single) additional variable $x_3$ (of course, one could then run both regressions and directly observe the change in the estimated $b_1$, but does there exist a formula for that specific case)? – skoestlmeier Aug 09 '22 at 08:41
  • 2
    Although you cannot predict the changes, you can calculate them. That gives a formula, albeit a somewhat complicated one. It's still computationally more efficient than redoing the regression de novo and it permits analysis of omitted variable biases. – whuber Aug 09 '22 at 12:55
0

Both $\hat{b}_{1, \text{model } 1}$ and $\hat{b}_{1, \text{model } 2}$ have closed forms (take the appropriate element of $(X^{\top} X)^{-1}X^{\top} y$), so the difference between them is also available in closed form.

I do not think we can, in general, offer a simpler or more elegant solution than computing the two estimates and taking their difference, because a matrix inversion is involved either way. Nonetheless, it is a closed-form solution.
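As a brief illustration of that approach, here is a short sketch (NumPy, with toy data invented purely for this example; the helper `b1_hat` is mine): it extracts the coefficient on $x_1$ as the appropriate element of $(X^{\top}X)^{-1}X^{\top}y$ for each design matrix and takes the difference.

```python
import numpy as np

# Toy data, purely for illustration
rng = np.random.default_rng(1)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 2 * x1 - x2 + 0.5 * x3 + rng.normal(size=n)

def b1_hat(y, *regressors):
    # Coefficient on the first regressor: the entry after the intercept
    # in (X'X)^{-1} X'y for the given design matrix.
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

change = b1_hat(y, x1, x2, x3) - b1_hat(y, x1, x2)
print(change)
```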

jcken
  • 2,907
  • 1
    This pessimistic outlook is incorrect: see the answer by @statmerkur here or the comments to the question. – whuber Aug 09 '22 at 12:54