Model without interaction terms: how wrong is it?

Question

Suppose we have the following DGP:

$ y = 10 x_1 + 20 x_2 + 30 x_1 x_2 $

Suppose we sample from the population, and we estimate the following models:

$ y = \alpha_1 x_1 + \beta_1 x_2 $

$ y = \alpha_2 x_1 + \beta_2 x_2 + \gamma_2 x_1 x_2$

We choose OLS as our estimator, obtaining the estimates $\hat{\alpha}_1, \hat{\beta}_1, \hat{\alpha}_2, \hat{\beta}_2, \hat{\gamma}_2 $.

We would expect $ \hat{\alpha}_2 \approx 10 $, $ \hat{\beta}_2 \approx 20 $, $ \hat{\gamma}_2 \approx 30 $, since the second model is correctly specified.

But what about the first model? What about $ \hat{\alpha}_1 $ and $ \hat{\beta}_1 $ ?

Is there a relationship between $ \hat{\alpha}_1 $, $ \hat{\beta}_1 $ and the true coefficients?

I ask because research papers, including one I'm reviewing, frequently estimate a model without interaction terms, then estimate a second model that adds them, and present the two side by side for comparison.

It often happens that the "main" terms lose significance, for example, while the interaction terms are significant.

But if the model without interaction terms is misspecified, I don't know what its estimates mean.

I tried to simulate my example with the following code:

N <- 10^6  # population size; not defined in the original snippet, value assumed
df <- data.frame(id=1:N)
df$x1 <- rnorm(N, 30, 30/3)
df$x2 <- rnorm(N, 15, 15/3)
df$e <- rnorm(N, 0, 0.001)
df$y <- 10*df$x1 + 20*df$x2 + 30*df$x1*df$x2 + df$e
Ns <- 10**3
dfs <- df[sample(N, Ns),]
m1 <- lm(y ~ x1 + x2, data=dfs)
summary(m1)
m2 <- lm(y ~ x1*x2, data=dfs)
summary(m2)

While I obtain 10, 20 and 30 as the estimates for the second model, in the first model I obtain $458.332$ as the coefficient for $x_1$, and $925.557$ as the coefficient for $x_2$, which are completely off, and both of them are significant.

There isn't anything special about an interaction here: it's the same question if you just posit there's a third regressor $x_3$ playing the role of $x_1x_2.$ In this generality we have many good posts about the issues you raise: see this site search. The short answer, which you might guess from this generalization, is that differences between the two sets of coefficients depend on the (second order) relationships between $x_3$ and all the original variables, $x_1,$ $x_2,$ and $y.$ — whuber, Oct 17 '23 at 19:03

Christoph Hanck · Answer 1 · 2023-10-18T11:21:03.747

As @whuber has pointed out, this is an omitted variable story. Linear projection results tell us that, since we include an intercept in our regression, the OLS slope coefficients will, letting $x=(x_1\quad x_2)'$, tend to

$$ \text{plim}\,\begin{pmatrix}\hat\alpha_1\\\hat\beta_1\end{pmatrix}=Var(x)^{-1}Cov(x,y) $$

Since you generate the regressors to be independent, we have, letting $V_j=Var(x_j)$,

$$ Var(x)^{-1}=\frac{1}{V_2V_1}\begin{pmatrix}V_2&0\\0&V_1\end{pmatrix} $$

and (deriving the plim for $\hat\beta_1$ is analogous)

$$ \begin{eqnarray*} Cov(x_1,y)&=&Cov(x_1,\alpha_2x_1+\beta_2x_2+\gamma_2x_1x_2+e)\\ &=&\alpha_2V_1+\gamma_2Cov(x_1,x_1x_2) \end{eqnarray*} $$

where the second line uses $Cov(x_1,x_2)=0$ and $Cov(x_1,e)=0$.

From the thread "cov(x,x*y) Covariance of two normally distributed variables", using independence,

$$ \begin{eqnarray*} Cov(x_1,x_1x_2)&=&E(x_1^2)E(x_2)-E(x_1)E(x_1x_2)\\&=&(V_1+E(x_1)^2)E(x_2)-E(x_1)^2E(x_2)\\&=&V_1E(x_2) \end{eqnarray*}$$

Putting things together gives

$$ \begin{eqnarray*} \text{plim}\,\hat\alpha_1&=&\frac{1}{V_1}[\alpha_2V_1+\gamma_2V_1E(x_2)]\\&=& \alpha_2+\gamma_2E(x_2) \end{eqnarray*} $$

Rewriting your code a little and given your numbers, we find this to be $10+30\cdot 15=460$, which the estimates from your simulation will hover around:

N <- 1000000
E1 <- 30
E2 <- 15
V1 <- (30/3)^2
V2 <- (15/3)^2
alpha2 <- 10
beta2 <- 20
gamma2 <- 30
df <- data.frame(id=1:N)
df$x1 <- rnorm(N, E1, sqrt(V1))
df$x2 <- rnorm(N, E2, sqrt(V2))
df$e <- rnorm(N, 0, 0.001)
df$y <- alpha2*df$x1 + beta2*df$x2 + gamma2*df$x1*df$x2 + df$e
Ns <- 10**3
dfs <- df[sample(N, Ns),]
m1 <- lm(y ~ x1 + x2, data=dfs)
summary(m1) # coefficient on x1 close to 460
m2 <- lm(y ~ x1*x2, data=dfs)
summary(m2)
alpha2 + gamma2*E2 # 460
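
By the same argument, the coefficient on $x_2$ in the model without the interaction should tend to $\beta_2+\gamma_2E(x_1)=20+30\cdot 30=920$, which is in line with the $925.557$ reported in the question. As a rough check, the sketch below computes the sample analogue of $Var(x)^{-1}Cov(x,y)$ and compares it with the `lm` slopes and with the two limits. The seed, the sample size `n` and the names `x`, `proj`, `short` and `theory` are choices made for this illustration, not part of the original code.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 1e5                         # assumed sample size for this check
E1 <- 30; E2 <- 15               # means of x1 and x2, as in the question
alpha2 <- 10; beta2 <- 20; gamma2 <- 30

x1 <- rnorm(n, E1, 30/3)
x2 <- rnorm(n, E2, 15/3)
y  <- alpha2*x1 + beta2*x2 + gamma2*x1*x2 + rnorm(n, 0, 0.001)

# Sample analogue of Var(x)^{-1} Cov(x, y)
x    <- cbind(x1, x2)
proj <- drop(solve(var(x), cov(x, y)))

# Slopes of the regression that omits the interaction
short <- coef(lm(y ~ x1 + x2))[c("x1", "x2")]

# Limits implied by the derivation: alpha2 + gamma2*E(x2), beta2 + gamma2*E(x1)
theory <- c(x1 = alpha2 + gamma2*E2, x2 = beta2 + gamma2*E1)

round(rbind(proj, short, theory), 2)
```

The `proj` and `short` rows should agree up to floating-point error, since OLS slopes with an intercept are exactly the sample version of the projection formula. Both should sit close to the `theory` row, with the gap shrinking as `n` grows.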

+1 The answer seems to suggest that much less damage is done by not estimating the interaction if the $x_i$ are standardised. However this only holds if the true interaction is also governed by standardised (or at least centered) "versions" of $x_1, x_2$. Chances are that if we use models such as in the first equation of the question, it makes more sense to think of the interaction in terms of standardised or centered variables, also for interpretation of what the interaction and regression coefficients mean. — Christian Hennig, Oct 18 '23 at 09:21
Thanks, that is indeed an interesting consequence of the result! — Christoph Hanck, Oct 18 '23 at 11:04
A small part of the issue is that even though fitting a submodel is a completely valid way to test for importance of the omitted model components, it is not necessary in the case of linear models, since all possible test statistics can be computed using contrasts within the full model. Coefficients of submodels are not really of interest unless one wishes to use the submodel by itself, i.e., one wishes to condition on the omitted terms being exactly zero. And since interaction terms are very hard to estimate precisely, especially when the interacting factors are collinear, consider Bayes — Frank Harrell, Oct 18 '23 at 12:15
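
To illustrate the comment on centering above: if the true interaction acts on the centered variables, then $E(x_1)=E(x_2)=0$ in the formula $\text{plim}\,\hat\alpha_1=\alpha_2+\gamma_2E(x_2)$, so the model without the interaction should recover $\alpha_2$ and $\beta_2$. The following is a minimal sketch under that assumption. The centered DGP, the seed, the sample size `n` and the names `x1c`, `x2c`, `yc` and `short_c` are hypothetical and only serve this illustration.

```r
set.seed(2)                      # arbitrary seed, only for reproducibility
n <- 1e5                         # assumed sample size
alpha2 <- 10; beta2 <- 20; gamma2 <- 30

x1 <- rnorm(n, 30, 30/3)
x2 <- rnorm(n, 15, 15/3)

# Center the regressors at their population means (known here by construction)
x1c <- x1 - 30
x2c <- x2 - 15

# Hypothetical DGP in which the interaction acts on the centered variables
yc <- alpha2*x1c + beta2*x2c + gamma2*x1c*x2c + rnorm(n, 0, 0.001)

# Regression that omits the interaction
short_c <- coef(lm(yc ~ x1c + x2c))[c("x1c", "x2c")]
round(short_c, 2)                # should be near alpha2 and beta2
```

This only says the slopes are close to the main-effect coefficients of the centered DGP. It does not make the omitted interaction harmless for prediction or inference.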
