
Assume that we have the following DGP: $$ y=\beta_{1}+\beta_{2}X+\epsilon $$ where $X\in\{0,1\}$ is an indicator variable. The OLS estimator of $\beta_{2}$ in this case is easy to compute: it is the difference in the conditional means of $y$ (in a sample, the difference in group averages), $$ \hat{\beta}_{OLS}=E\left[y|X=1\right]-E\left[y|X=0\right] $$ Now assume that we also have a continuous variable $Z$, and write the DGP as $$ y=\beta_{1}+\beta_{3}X+\beta_{4}Z+\epsilon $$
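
As a quick check of the first formula, here is a minimal sketch that simulates the single-regressor DGP with illustrative values $\beta_1 = 1$, $\beta_2 = 2$ (assumptions, not from the question) and compares the OLS slope on the 0/1 regressor with the difference in sample group means. The helper names `ols_slope_binary` and `diff_in_means` are hypothetical.

```python
import numpy as np

def ols_slope_binary(x: np.ndarray, y: np.ndarray) -> float:
    """Slope from regressing y on an intercept and the 0/1 regressor x."""
    design = np.column_stack([np.ones_like(x, dtype=float), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])

def diff_in_means(x: np.ndarray, y: np.ndarray) -> float:
    """Sample analogue of E[y | X=1] - E[y | X=0]."""
    return float(y[x == 1].mean() - y[x == 0].mean())

rng = np.random.default_rng(0)
n = 1_000
x = rng.integers(0, 2, size=n)           # indicator X in {0, 1}
# illustrative DGP values (assumption): beta1 = 1.0, beta2 = 2.0
y = 1.0 + 2.0 * x + rng.normal(size=n)

print(ols_slope_binary(x, y), diff_in_means(x, y))
```

The two printed numbers agree up to floating-point error, because with an intercept and a single binary regressor the OLS fitted values are exactly the two group means.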

In this case, can we write the OLS estimator of $\beta_{3}$ as $$ \hat{\beta}_{OLS}=E_{Z}\left[E\left[y|X=1,Z\right]-E\left[y|X=0,Z\right]\right] $$

that is, take the difference in the conditional means of $y$ between $X=1$ and $X=0$ at each value of $Z$, and then average over the distribution of $Z$?
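
One way to see what OLS actually averages is a sketch in which $Z$ is discrete and entered as a full set of dummies (an assumption; this is more flexible than the linear-in-$Z$ DGP above). By the Frisch-Waugh-Lovell theorem, the coefficient on $X$ then equals a weighted average of the within-$Z$ differences in means, with weight proportional to $n_z p_z(1-p_z)$ for the cell $Z=z$, where $n_z$ is the cell size and $p_z$ the share of $X=1$ in that cell. Cells with no variation in $X$ get weight zero, and these weights generally differ from the $P(Z=z)$ weights of a plain average over $Z$. The number of levels, the probabilities and the effect sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
levels = 4
z = rng.integers(0, levels, size=n)            # discrete Z (assumption)
p_x_given_z = np.array([0.1, 0.5, 0.8, 1.0])   # P(X=1 | Z); X never varies when Z = 3
x = (rng.random(n) < p_x_given_z[z]).astype(float)
effect_by_z = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative heterogeneous effect of X
y = 0.5 * z + effect_by_z[z] * x + rng.normal(size=n)

# OLS of y on X plus one dummy per level of Z (the dummies absorb the intercept)
dummies = (z[:, None] == np.arange(levels)).astype(float)
beta_x = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)[0][0]

# Rebuild beta_x from the within-Z differences in means
weighted_sum, weight_total = 0.0, 0.0
plain_sum, plain_count = 0.0, 0
for level in range(levels):
    cell = z == level
    xz, yz = x[cell], y[cell]
    p = xz.mean()
    weight = cell.sum() * p * (1.0 - p)        # n_z * p_z * (1 - p_z); zero if X is constant
    weight_total += weight
    if 0.0 < p < 1.0:
        diff = yz[xz == 1].mean() - yz[xz == 0].mean()
        weighted_sum += weight * diff
        plain_sum += cell.sum() * diff         # weights by cell size, skipping unmatched cells
        plain_count += cell.sum()
beta_weighted = weighted_sum / weight_total
beta_plain = plain_sum / plain_count

print(beta_x, beta_weighted, beta_plain)
```

Here `beta_x` and `beta_weighted` coincide up to floating-point error, while `beta_plain` (a cell-size-weighted average of the within-$Z$ differences) is in general a different number when the effect of $X$ varies with $Z$.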

  • If $Z$ is a continuous random variable, you may not have any observations with the same $Z$ and different $X$, so it is not clear what your final expression would then mean. – Henry Aug 21 '23 at 19:50
  • Regress $(X,y)$ against $Z$ and then apply your first formula to the residuals. See https://stats.stackexchange.com/a/46508/919 for further explanation and software. – whuber Aug 21 '23 at 20:03
  • @Henry Of course, in which case their contribution to the estimate would be precisely 0. The only contributors to the slope coefficient would be those points where $X$ varies conditional on $Z$. – Kwame Brown Aug 24 '23 at 15:12
  • @Kwame Usually, as with your DGP where the slope is a simple constant, we make assumptions about how $Z$ influences $y$ (potentially in interaction with $X$) and then use these assumptions to interpolate/extrapolate the missing matches. – Lukas Lohse Sep 13 '23 at 14:25
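
The residual-regression suggestion in the comments is the Frisch-Waugh-Lovell theorem. The sketch below uses simulated data (the coefficient values and the way $X$ depends on $Z$ are illustrative assumptions) to check that the coefficient on $X$ in the full regression of $y$ on $1, X, Z$ equals the slope from regressing the residualized $y$ on the residualized $X$. The residualized $X$ is no longer 0/1, so the "first formula" has to be read as the general slope $\sum \tilde{x}\tilde{y} / \sum \tilde{x}^2$, of which the difference in means is the binary special case. The helper `residualize` is hypothetical.

```python
import numpy as np

def residualize(v: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Residuals from regressing v on an intercept and z."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

rng = np.random.default_rng(2)
n = 1_000
z = rng.normal(size=n)                                # continuous Z
x = (z + rng.normal(size=n) > 0).astype(float)        # X in {0, 1}, correlated with Z
# illustrative values (assumption): beta1 = 1, beta3 = 2, beta4 = 0.5
y = 1.0 + 2.0 * x + 0.5 * z + rng.normal(size=n)

# Full regression y ~ 1 + X + Z: coefficient on X
full = np.linalg.lstsq(np.column_stack([np.ones(n), x, z]), y, rcond=None)[0]
beta3_full = full[1]

# Frisch-Waugh-Lovell: slope of residualized y on residualized X (no intercept needed)
x_res = residualize(x, z)
y_res = residualize(y, z)
beta3_fwl = (x_res @ y_res) / (x_res @ x_res)

print(beta3_full, beta3_fwl)
```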
