
Quite a basic multiple linear regression question.

I know that the intercept should describe the predicted value when all predictors equal 0. Then why does adding a pairwise interaction term ($X_1 X_2$) change the intercept? Wouldn't the product term be zero when the predictors are zero?

Thanks for your help.

  • It's because changing the function that you use to make the fit will also change the fit at the point where all predictors equal 0. Related: https://stats.stackexchange.com/questions/439905/ – Sextus Empiricus Jan 19 '23 at 08:48

1 Answer


Consider a simple case with only a single numerical predictor and two groups. You are right that the intercept gives the fitted value when all predictors are zero, that is, when the numerical predictor has a value of zero and the group is the reference group (assuming standard dummy coding for factor variables).

If you plot your fits, the intercept then corresponds to the point where the fitted line for the reference group, plotted against the predictor, intersects the vertical axis. However, here is the key point: when you add an interaction to your model, you change both the slopes and the intercepts of the two fitted lines. Thus the $y$-axis intercepts of the two (now non-parallel) lines will change, and the "model" intercept, i.e., the constant term, will change too, since it is just one of those two $y$-axis intercepts.
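To spell this out with standard dummy coding (writing $\mathbb{1}_B$ for the indicator of the second group), the main-effects model fits
$$\hat y = b_0 + b_1 x + b_2\,\mathbb{1}_B,$$
i.e., two parallel lines with intercepts $b_0$ (reference group) and $b_0 + b_2$. The interaction model fits
$$\hat y = b_0' + b_1' x + b_2'\,\mathbb{1}_B + b_3'\,x\,\mathbb{1}_B,$$
two lines with intercepts $b_0'$ and $b_0' + b_2'$ and slopes $b_1'$ and $b_1' + b_3'$. All coefficients are re-estimated by least squares when the interaction is added, so in general $b_0' \neq b_0$.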

[Figure: two panels, "Main effects only" (left) and "Model with interaction" (right), showing the data and fitted dashed lines for groups A (black) and B (red)]

In the plot above, the intercept for the model plotted on the left is 0.3370 and for the one on the right, it is -0.0019. In both cases, it is the $y$ coordinate where the black dashed line intersects the $y$ axis.

R code:

set.seed(1)
nn <- 100
group <- sample(c("A","B"),size=nn,replace=TRUE)
predictor <- runif(nn)
dataset <- data.frame(group,predictor,
    outcome=predictor-1.4*predictor*(group=="B")+0.5*(group=="B")+rnorm(nn,0,0.2))
xx <- seq(0,1,0.01) # for plotting

opar <- par(mfrow=c(1,2),mai=c(1,1,.7,.1))

# Left panel: main effects only (parallel fitted lines)
with(dataset,plot(predictor,outcome,col=(group=="B")+1,pch=19,las=1,main="Main effects only"))
legend("topleft",pch=19,col=1:2,legend=c("A","B"))
(model_1 <- lm(outcome~predictor+group,dataset))
lines(xx,predict(model_1,newdata=data.frame(group="A",predictor=xx)),col=1,lwd=2,lty=2)
lines(xx,predict(model_1,newdata=data.frame(group="B",predictor=xx)),col=2,lwd=2,lty=2)

# Right panel: model with interaction (slopes and intercepts both change)
with(dataset,plot(predictor,outcome,col=(group=="B")+1,pch=19,las=1,main="Model with interaction"))
legend("topleft",pch=19,col=1:2,legend=c("A","B"))
(model_2 <- lm(outcome~group*predictor,dataset))
lines(xx,predict(model_2,newdata=data.frame(group="A",predictor=xx)),col=1,lwd=2,lty=2)
lines(xx,predict(model_2,newdata=data.frame(group="B",predictor=xx)),col=2,lwd=2,lty=2)
par(opar)
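Running the code above, the two intercepts quoted under the plot can be read off directly (up to rounding):

coef(model_1)["(Intercept)"]  #  0.3370, main effects only (left panel)
coef(model_2)["(Intercept)"]  # -0.0019, with interaction (right panel)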

Stephan Kolassa
  • Thanks for your answer. How would you explain this in the case of two numerical predictors? – Ido Ben Artzi Jan 19 '23 at 09:52
  • In exactly the same way. Mathematically, when you fit a larger model, all parameter estimates change to ensure the smallest in-sample sum of squared errors. When you compare a fit $y\sim b_0+b_xx+b_zz$ to $y\sim b_0'+b_x'x+b_z'z+b_{xz}xz$, these are two different models, and you will usually have $b_0\neq b_0'$. – Stephan Kolassa Jan 19 '23 at 10:32
  • What confuses me is that I assumed "if X1 and X2 are zero, then the interaction should also be zero at that point", but I guess that's the wrong way to think about it? – Ido Ben Artzi Jan 19 '23 at 11:49
  • It's not the wrong way. It's just a bit myopic, in that the estimate of the intercept depends not only on what the fit would be for all predictors equal to zero, but also on the entire distribution of the predictors and the outcome (and the model). I think that once you take this additional dependency into account, it becomes less surprising that the intercept changes. – Stephan Kolassa Jan 19 '23 at 14:50
  • Another way to think about it: the intercept, which is $\hat Y(0,0)$, is only guaranteed to estimate the true mean of $Y$ at $(0,0)$ if the model is correctly specified. But if there is an interaction, the model with just main effects is not correctly specified, so there's no reason for its intercept to correctly estimate $Y(0,0)$. The intercept in the interaction model is correct, and the intercept in the main-effects model is wrong, as an estimator of $Y(0,0)$. – Thomas Lumley Jan 19 '23 at 20:20
  • Thanks, that helps enormously. So when I have a model with two centered predictors and no interaction, the intercept will be the mean of $y$, but that changes after adding the interaction. Why is that? – Ido Ben Artzi Jan 20 '23 at 13:14
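A minimal R sketch of the point raised in this last comment, with hypothetical data (not part of the original thread): the product of two centered predictors is generally not itself centered, since its mean is essentially their covariance, so adding the interaction term moves the intercept away from $\bar y$.

set.seed(2)                           # hypothetical example data
n  <- 100
x1 <- runif(n); x1 <- x1 - mean(x1)   # centered predictor 1
x2 <- runif(n); x2 <- x2 - mean(x2)   # centered predictor 2
y  <- 1 + x1 + x2 + 2*x1*x2 + rnorm(n, 0, 0.2)
coef(lm(y ~ x1 + x2))["(Intercept)"]  # equals mean(y): both regressors have mean 0
mean(y)
coef(lm(y ~ x1 * x2))["(Intercept)"]  # differs from mean(y)...
mean(x1*x2)                           # ...because the product regressor has nonzero mean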