
Suppose that we are interested in the following model: $$y_i=\beta_1+\beta_2x_{i2}+\beta_3x_{i3}+u_i$$

Suppose also that there is a dummy variable $d_i$.

I am wondering whether the following estimators are equivalent:

[OLS using only the observations with $d_i=1$] versus [OLS of $d_iy_i$ on $d_i,\;d_ix_{i2},\;d_ix_{i3}$]

That is, using the subset versus using the dummy-interacted variables.

When I run both cases, the former yields a consistent estimator but the latter does not.

(even though the values of the dummy are randomly assigned)

The example R code is as follows:

x2 <- rnorm(100000, 2, 1)
x3 <- rnorm(100000, 1.5, 1)
x4 <- rbinom(100000, 1, 0.5)   # randomly assigned dummy
y <- 1 + 2*x2 + 2*x3 + rnorm(100000)
dt <- data.frame(y = y, x2 = x2, x3 = x3, x4 = x4)

# (1) OLS using only the observations with x4 == 1
est <- lm(y ~ x2 + x3, data = dt, subset = (x4 == 1))
summary(est)
nobs(est)

# (2) OLS on the dummy-interacted variables
dt4 <- data.frame(y = y*x4, x2 = x2*x4, x3 = x3*x4, x4 = x4)
est4 <- lm(y ~ x2 + x3, data = dt4)
summary(est4)
nobs(est4)

Is there a way to do the same estimation without throwing away some of the data?

Why do they have different results?

Why does the latter yield worse results despite the large number of observations?

M.C. Park
    Please provide a reproducible example so that we can see what's going on. Please do that via the code {} tool on the toolbar. – EdM May 13 '22 at 13:49
  • Sorry about the omitted part. I added the code! – M.C. Park May 13 '22 at 13:57
  • Why would you expect the two approaches to return similar results? If you look at what dt4 actually contains, you will find a number of rows filled with zeros. That is what is causing the difference. – Richard Hardy May 13 '22 at 14:00
  • @RichardHardy The reason is that in a textbook, I found a sentence that seems to imply the equivalence of the two estimations. But, I also think that there is no specific reason for that. – M.C. Park May 13 '22 at 15:32
  • I see. The textbook may contain a poorly formulated statement, or you might have misread it. – Richard Hardy May 13 '22 at 15:40
  • There are many similar questions here; see the links at https://stats.stackexchange.com/questions/574854/separating-datasets-vs-one-dataset-with-extra-categorical-feature#comment1061323_574854 – kjetil b halvorsen May 13 '22 at 15:47

2 Answers


The way that you structured dt4, you effectively included only the interaction (product) terms between the binary x4 and the original x2 and x3 predictors, while omitting the "main effects." That's generally poor practice, except in very limited circumstances. See this page for extensive discussion. If you structure that regression properly, e.g.:

est2 <- lm(y~(x2+x3)*x4, data= dt)

then all will make sense.
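
For reference, the fit on the x4 == 1 subset can then be read off from est2 without discarding any rows: each coefficient for that group is the sum of the corresponding main effect and interaction term. A minimal check, assuming dt is built as in the question (with x4 included):

b <- coef(est2)
# coefficients for the x4 == 1 group: main effect + interaction term
b["(Intercept)"] + b["x4"]   # intercept
b["x2"] + b["x2:x4"]         # slope on x2
b["x3"] + b["x3:x4"]         # slope on x3
# these equal coef(lm(y ~ x2 + x3, data = dt, subset = x4 == 1))

The point estimates match the subset regression exactly; the standard errors differ slightly because the interaction model pools the residual variance across both groups.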

EdM

The answer: the two regressions should be the same because $$\left[\sum_{i:d_i=1} x_ix_i'\right]^{-1}\left[\sum_{i:d_i=1}x_iy_i\right],$$ the OLS estimator using only the subsample with $d_i=1$, is equivalent to $$\left[\sum_{i=1}^N d_ix_ix_i'\right]^{-1}\left[\sum_{i=1}^N d_ix_iy_i\right],$$ the OLS estimator on the dummy-interacted variables (using $d_i^2=d_i$). The reason for the different results in the R code above is that lm does not interact the dummy with the intercept term: it always adds its own column of ones rather than using $d_i$ as the interacted intercept regressor.

See the following code; the two computations should give the same result.

one <- rep(1, 1000)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
d <- sample(c(1,0), 1000,replace = T)
u <- rnorm(1000)
y <- one+2*x1+x2+u

Using the subsample:

dt <- data.frame(y = y, x1 = x1, x2 = x2, d = d)
est1 <- lm(y ~ x1 + x2, data = dt, subset = (d == 1))

Using the interacted variables:

X <- d*cbind(one,x1,x2)

Results:

est1$coefficients
solve(t(X)%*%X)%*%(t(X)%*%y)

> est1$coefficients
(Intercept)          x1          x2 
  1.0077684   2.0358979   0.9949592 
> solve(t(X)%*%X)%*%(t(X)%*%y)
         [,1]
one 0.9547948
x1  2.0125214
x2  0.9677016
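
The same fix can also be made inside lm by dropping its automatic intercept and letting d serve as the interacted intercept column. A minimal sketch using the variables defined above (est_int is just an illustrative name):

# -1 removes lm's automatic intercept; d plays the role of the interacted
# intercept, and I(d*x1), I(d*x2) are the interacted slopes
est_int <- lm(y ~ -1 + d + I(d*x1) + I(d*x2), data = dt)
coef(est_int)   # should reproduce est1$coefficients and the matrix result above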

M.C. Park