In a nutshell, I want the regression coefficients of a model to match several differences in conditional means.
You can download the data from this repo.
I have a data set that has a dependent variable (Y) and three binary columns (T, X1 and X2).
| | Y | CONST | T | X1 | X1T | X2 | X2T |
|---|---|---|---|---|---|---|---|
| 0 | 2.31252 | 1 | 1 | 0 | 0 | 1 | 1 |
| 1 | -0.836074 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | -0.797183 | 1 | 0 | 0 | 0 | 1 | 0 |
I want to calculate the difference in the mean of Y for observations with T == 1 and those with T == 0 for each of the four possible combinations of X1 and X2:
- Mean difference given X1 == 0 and X2 == 0
- Mean difference given X1 == 0 and X2 == 1
- Mean difference given X1 == 1 and X2 == 0
- Mean difference given X1 == 1 and X2 == 1
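For reference, the manual calculation of these four differences can be sketched as follows. This is only an illustration under stated assumptions: it uses pandas and randomly generated stand-in data (the actual data is in the linked repo and is not reproduced here), and the names `df`, `means` and `mean_diff` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption; not the data from the repo).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "T": rng.integers(0, 2, n),
    "X1": rng.integers(0, 2, n),
    "X2": rng.integers(0, 2, n),
})
df["Y"] = rng.normal(size=n)

# Mean of Y in each (X1, X2, T) cell, with T spread across the columns.
# Note: use df["T"], not df.T, because df.T is the transpose in pandas.
means = df.groupby(["X1", "X2", "T"])["Y"].mean().unstack("T")

# Difference in means (T == 1 minus T == 0) for each (X1, X2) combination.
mean_diff = means[1] - means[0]
print(mean_diff)
```

The resulting `mean_diff` is indexed by the (X1, X2) pair, so the four rows correspond to the four cases listed above, in the same order.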
I did this manually, but I cannot get the following model to match my results:

$$Y = \beta_0 + \beta_1 T + \beta_2 X_1 + \beta_3 X_1 T + \beta_4 X_2 + \beta_5 X_2 T + U$$
As per this post:
- $\hat{\beta_1}$ should match case 1
- $\hat{\beta_1} + \hat{\beta_5}$ should match case 2
- $\hat{\beta_1} + \hat{\beta_3}$ should match case 3
- $\hat{\beta_1} + \hat{\beta_3} + \hat{\beta_5}$ should match case 4
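The coefficient combinations above can be computed along these lines. Again, this is a hedged sketch rather than the notebook's actual code: it fits the model by ordinary least squares with `numpy.linalg.lstsq` on randomly generated stand-in data, and `cols`, `b` and `implied` are hypothetical names. The column names `CONST`, `X1T` and `X2T` follow the table shown earlier.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption; not the data from the repo).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "T": rng.integers(0, 2, n),
    "X1": rng.integers(0, 2, n),
    "X2": rng.integers(0, 2, n),
})
df["Y"] = rng.normal(size=n)

# Build the regressors of the model: constant plus the two interactions with T.
df["CONST"] = 1
df["X1T"] = df["X1"] * df["T"]
df["X2T"] = df["X2"] * df["T"]

cols = ["CONST", "T", "X1", "X1T", "X2", "X2T"]
X = df[cols].to_numpy(dtype=float)
y = df["Y"].to_numpy()

# Ordinary least squares fit; beta is ordered like cols.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b = dict(zip(cols, beta))

# Treatment effect implied by the model for each (X1, X2) combination.
implied = {
    (0, 0): b["T"],
    (0, 1): b["T"] + b["X2T"],
    (1, 0): b["T"] + b["X1T"],
    (1, 1): b["T"] + b["X1T"] + b["X2T"],
}
for case, value in implied.items():
    print(case, value)
```

The values in `implied` are what the model reports for each case; they can then be set side by side with the differences in conditional means computed directly from the data.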
As can be seen in this Jupyter notebook, the two methods give different results.
Why do the linear regression results not match the differences in conditional means?
