
My intuition was that if an explanatory variable is independent of the response, then in a multiple regression it should have a $\beta$ of zero.

Consider, however, the following very simple example: the distribution of $\left(Y,X_1,X_2\right)$ is multivariate normal, with mean vector $\mathbf{0}$ and covariance matrix $$\begin{pmatrix}10&2&0\\2&5&1\\0&1&1/2\end{pmatrix}.$$ Here the regression coefficients are $$\boldsymbol{\beta}=\begin{pmatrix}2&0\end{pmatrix}\begin{pmatrix}5&1\\1&1/2\end{pmatrix}^{-1}=\begin{pmatrix}\frac{2}{3}&-\frac{4}{3}\end{pmatrix},$$ i.e., $X_2$ has a non-zero $\beta$ despite being uncorrelated with $Y$ (which, under joint normality, means independent of $Y$).
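As a quick sanity check, the coefficient vector can be evaluated numerically in R (this just restates the formula above, with no new assumptions):

    # beta = Cov(Y, X) %*% solve(Var(X)) for the covariance matrix above
    Sigma.YX <- matrix(c(2, 0), nrow = 1)          # Cov(Y, (X1, X2))
    Sigma.XX <- matrix(c(5, 1, 1, 1/2), nrow = 2)  # Var((X1, X2))
    Sigma.YX %*% solve(Sigma.XX)                   # 0.6667 -1.3333, i.e., (2/3, -4/3)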

How can I imagine this?

I understand that this is a multivariate situation, so pairwise correlations are not conclusive (as the multivariate structure matters), but I thought that for a multivariate normal distribution, if I see a zero anywhere in the covariance matrix (and all variables are included in the regression), it means that the corresponding $\beta$ must be zero.

Corollary question: if my intuition is not correct, is the following statement true instead: "In a multivariate normal model, a $\beta$ is zero iff the variable is uncorrelated with the response and also uncorrelated with all the remaining explanatory variables"?

That would be interesting, because it would mean that of the two conditions under which omitted variable bias does not occur (the omitted variable has a zero $\beta$, or it is uncorrelated with all the other explanatory variables), the first actually implies the second (in a multivariate normal model, of course).
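For concreteness, here is a small simulation sketch contrasting the simple and the multiple regression for this example (the seed and sample size are arbitrary choices):

    library(MASS)  # for mvrnorm
    set.seed(1)
    Sigma <- matrix(c(10, 2, 0,
                       2, 5, 1,
                       0, 1, 1/2), ncol = 3)
    d <- data.frame(mvrnorm(1e5, rep(0, 3), Sigma))
    colnames(d) <- c("y", "x1", "x2")
    coef(lm(y ~ x2, data = d))       # slope of x2 alone: approx 0 (total effect)
    coef(lm(y ~ x1 + x2, data = d))  # approx 2/3 and -4/3 (direct effects)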

  • A correct interpretation of a zero correlation coefficient is that the explanatory variable is uncorrelated with the residuals of the regression of the response on all the other explanatory variables. See https://stats.stackexchange.com/questions/46185 inter alia. – whuber Jun 10 '19 at 14:39
  • Hm, that doesn't seem to be the case! Or I misunderstood you; here is what I have tried:

    library(MASS)
    # simulate a large sample from the multivariate normal above
    SimData <- data.frame(mvrnorm(100000, rep(0, 3),
                                  matrix(c(10, 2, 0, 2, 5, 1, 0, 1, 1/2), ncol = 3)))
    colnames(SimData) <- c("y", "x1", "x2")
    lm(y ~ x1 + x2, data = SimData)           # coefficients approx 2/3 and -4/3
    res <- resid(lm(y ~ x1, data = SimData))  # residuals of y regressed on x1
    cov(res, SimData$x2)                      # clearly non-zero (approx -0.4)

    – Tamas Ferenci Jun 12 '19 at 20:37
  • You are correct: I misstated the idea. Let me restate it in the context of your (helpful) R illustration. First, 0 is the result of `with(SimData, cov(y, x2))` (up to sampling error, as always). Second, -4/3 is the result of (1) taking the effect of x1 out of all the variables: `res.2 <- resid(lm(x2 ~ x1, data = SimData))`, and (2) regressing the y-residuals against the x2-residuals: `lm(res ~ res.2 - 1)`. This helps show that although x2 and y are uncorrelated, x2 and y are associated after removing the effects of x1 (see the sketch after these comments). – whuber Jun 12 '19 at 22:05
  • Thank you! I think I can now at least phrase what confuses me: that removing an effect introduces an effect. $X_2$ is independent of $Y$; it has no effect on $Y$ (as evidenced by the covariance matrix, or by a regression with $X_2$ alone as predictor). When you put $X_1$ into the regression as well, you remove its effect from $X_2$'s effect. And now comes the (false) intuition: if $X_2$'s effect is already nil, then removing anything from it can only result in a still-nil effect. I'm still struggling to understand that, either conceptually or graphically (which should also be possible here...). – Tamas Ferenci Jun 13 '19 at 06:43
  • Some efforts have been made in other threads to present both conceptual and graphical explanations: see https://stats.stackexchange.com/questions/17336 and https://stats.stackexchange.com/questions/46185 inter alia. – whuber Jun 13 '19 at 12:32
  • @whuber I now see that the heart of my misunderstanding was that I assumed that the covariance matrix contains the direct effects, while in reality it contains the total effects. It is entirely possible that the direct relationship between $X_2$ and $Y$ ($-4/3$) is exactly the opposite of the indirect relationship mediated through $X_1$ ($2 \cdot 2/3$), resulting in a zero total relationship (seen in the covariance matrix), but a non-zero effect when controlling for $X_1$. – Tamas Ferenci Feb 21 '20 at 09:40
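To connect whuber's restatement with the total-versus-direct decomposition in the last comment, here is a minimal sketch (assuming the SimData object simulated in the comments above is still available):

    # Frisch-Waugh-Lovell view: partial x1 out of both y and x2;
    # the residual-on-residual slope is exactly beta_2 from the full regression
    res.y  <- resid(lm(y  ~ x1, data = SimData))
    res.x2 <- resid(lm(x2 ~ x1, data = SimData))
    coef(lm(res.y ~ res.x2 - 1))                  # approx -4/3

    # Decomposition of the total effect of x2:
    # Cov(Y,X2)/Var(X2) = beta_2 + beta_1 * Cov(X1,X2)/Var(X2)
    #                   = -4/3 + (2/3) * 2 = 0
    cov(SimData$y, SimData$x2) / var(SimData$x2)  # approx 0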

0 Answers