
I was asked this during an interview, and I'm curious if my thinking is correct.

Fit a simple linear regression separately on each of two features, $x_1$ and $x_2$. You get two coefficients $\beta_1$ and $\beta_2$, both greater than $1$. Now fit a linear regression on both features at the same time. Can either coefficient be negative?

My intuition is that yes, the coefficient sign can flip, if $x_1$ and $x_2$ are collinear. OLS parameter estimates are unstable here since the normal equation requires inverting the Gram matrix $\mathbf{X}^{\top} \mathbf{X}$, which has the same rank as $\mathbf{X}$. (1) Am I correct and (2) if so, is my analysis thorough? Not sure if there's anything else I should consider here or a better way to explain why the coefficients can flip signs.


1 Answer


Yes, the coefficients can flip sign if the features are correlated. Arguing this mathematically is possible, but we can simply demonstrate that it happens with a simulation.
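One way to make the argument mathematical: with standardized variables, the two-predictor OLS coefficients have a closed form in terms of the pairwise correlations (writing $r_{y1} = \operatorname{cor}(y, x_1)$, $r_{y2} = \operatorname{cor}(y, x_2)$, and $r_{12} = \operatorname{cor}(x_1, x_2)$):

$$\hat\beta_2 = \frac{r_{y2} - r_{12}\,r_{y1}}{1 - r_{12}^2}.$$

So $\hat\beta_2 < 0$ exactly when $r_{12}\,r_{y1} > r_{y2}$, which is easy to arrange when $r_{12}$ is large, even though the marginal coefficient on $x_2$ alone has the sign of $r_{y2} > 0$.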


```r
set.seed(0)

# Generate correlated covariates
X = MASS::mvrnorm(100, c(0, 0), matrix(c(1, 0.99, 0.99, 1), nrow = 2))

# Use them to generate observations. Only the first column affects y
y = X %*% c(2, 0) + rnorm(100, 0, 0.4)

# Estimate 3 models: two with one variable each, one with both
m1 = lm(y ~ X[, 1])
coef(m1)
#> (Intercept)      X[, 1]
#>  0.02606534  2.03186570

m2 = lm(y ~ X[, 2])
coef(m2)
#> (Intercept)      X[, 2]
#>  0.04038971  1.96816682

m = lm(y ~ X)
coef(m)
#> (Intercept)          X1          X2
#>     0.02581     2.07047    -0.03831
```
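To connect the simulation back to the correlations, you can compute the standardized two-predictor slope for $x_2$ by hand (a quick check using the variables from the simulation above; the names `r_y1`, `r_y2`, `r_12`, and `b2_std` are ones I'm introducing here):

```r
# Pairwise correlations among y, x1, x2
r_y1 = cor(y, X[, 1])
r_y2 = cor(y, X[, 2])
r_12 = cor(X[, 1], X[, 2])

# Standardized two-predictor coefficient for x2:
# negative whenever r_12 * r_y1 > r_y2
b2_std = (r_y2 - r_12 * r_y1) / (1 - r_12^2)
b2_std
```

With $r_{12} \approx 0.99$ and the two marginal correlations nearly equal, the numerator hovers around zero and can easily go negative, which is exactly the instability you described.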