
I'm not sure if this technique has a name.

I've recently learned that some people perform a two-step regression where, in the second step, they regress the residuals from the first step on some new variables. So, starting with a standard OLS regression:

$y = \beta_0 + \beta_1 x_1 + ... + \beta_m x_m + e$

We could then use the residuals $e$ as the dependent variable, regressing them on one or more new independent variables:

$e = \alpha_0 + \alpha_1 z_1 + ... + \alpha_n z_n + u$

I believe the goal here is to somehow control for the $x$'s when regressing on the $z$'s, but I don't see what this buys you over a single regression of $y$ on $x$ and $z$ simultaneously. A small simulation of both procedures is sketched below.
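Here is a minimal NumPy sketch of what I mean, with an invented data-generating process in which $x$ and $z$ are correlated (the coefficients and the confounding structure are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Invented data-generating process: x and z are correlated,
# and y depends on both (true coefficients 2.0 and 3.0).
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

ones = np.ones(n)

# Step 1: regress y on x alone and keep the residuals e.
X1 = np.column_stack([ones, x])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
e = y - X1 @ beta

# Step 2: regress the residuals e on z.
X2 = np.column_stack([ones, z])
alpha, *_ = np.linalg.lstsq(X2, e, rcond=None)

# Single joint regression of y on x and z, for comparison.
X3 = np.column_stack([ones, x, z])
gamma, *_ = np.linalg.lstsq(X3, y, rcond=None)

print("two-step coefficient on z:", alpha[1])  # ~2.4 here
print("joint coefficient on z:  ", gamma[2])   # ~3.0, the true value
```

In this simulation the two-step estimate of the $z$ coefficient comes out well below the true value of 3, while the joint regression recovers it; the two only agree when $x$ and $z$ are uncorrelated.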

When is this two-step procedure preferable to a single regression equation?

  • It is variously called "controlling," "matching," "leaving out," and various other things. See https://stats.stackexchange.com/a/46508/919 for one account. AFAIK, the benefits are primarily conceptual because good numerical procedures rely on various matrix decompositions (e.g. Cholesky) rather than this sequential approach. – whuber Dec 17 '19 at 15:34
  • AFAIK "residual regression" leads to biased estimates: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1046/j.1365-2656.2002.00618.x – Carsten Dec 17 '19 at 16:33
  • This search turns up several relevant papers. I'd read some of them, some of the works they cite, and the works that cite them: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C7&q=regression+using+residuals+as+dependent+variable&btnG= – TC1 Dec 06 '22 at 15:40

0 Answers