Variable with negative coefficient in Ridge Regression and positive correlation

Question

In my research, I have four independent variables (X1, X2, X3 and X4) and one response variable (Y). Checking the VIFs of the explanatory variables, I found high multicollinearity, so I decided to use ridge regression to make predictions, without, of course, relying on the variables' p-values. When I plotted the ridge trace and the coefficients from the fitted ridge regression, I noticed that X2 and X4 had negative coefficients.
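For reference, the VIF of predictor $X_j$ is $1/(1-R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other predictors. Below is a minimal base-R sketch. The data frame `dat` and its simulated columns are hypothetical stand-ins for the real data, with X3 deliberately built to be collinear with X1 and X2.

```r
set.seed(42)
n  <- 200
# Hypothetical predictors: X3 is built from X1 and X2, so it is highly collinear
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- X1 + X2 + rnorm(n, sd = 0.3)
X4 <- rnorm(n)
dat <- data.frame(X1, X2, X3, X4)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing X_j on all other predictors
vif_manual <- sapply(names(dat), function(v) {
  others <- setdiff(names(dat), v)
  r2 <- summary(lm(reformulate(others, response = v), data = dat))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix
vif_shortcut <- diag(solve(cor(dat)))

print(round(vif_manual, 2))
```

The two computations agree exactly. The second form shows that VIFs depend only on the correlations among the predictors, not on Y.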

I wasn't surprised that this happened with X4, but I didn't expect it for X2. I was even more puzzled when I ran a simple linear regression of Y on X2 alone and found a positive relationship.

So I removed X3 from the study, since it had the highest VIF (about 13), and everything made more sense: X2 now had a positive coefficient.

Should I then drop X3 permanently? If I keep this variable in the model, how would I explain that X2 has a negative coefficient and yet is positively correlated with Y?

Simpson's paradox? https://en.wikipedia.org/wiki/Simpson%27s_paradox — psboonstra, Aug 13 '21 at 01:29
@psboonstra, but would Simpson's paradox apply to numerical variables? — , Aug 13 '21 at 01:47
Yes, I believe it applies; it's just a bit harder to visualize. But see my answer, which I believe captures Simpson's paradox for your case. — psboonstra, Aug 13 '21 at 02:11

psboonstra · Answer 1 · 2021-08-24T01:51:18.253

We can ignore the ridge regression part, and we can also suppose you have just two variables: $X_1$, standing in for the predictor whose sign flips (X2 in your question), and $X_3$, the variable you removed. Suppose that $E[Y|X_1,X_3] = \beta_0+\beta_1X_1+\beta_3X_3$. Then, by the law of iterated expectations, $E[Y|X_1] = E_{X_3|X_1}\left[E[Y|X_1,X_3]\right] = \beta_0 + \beta_1X_1 + \beta_3 E[X_3|X_1]$. Now further suppose that $E[X_3|X_1] = \gamma_0+\gamma_1X_1$. Then $$\begin{aligned}E[Y|X_1] &= \beta_0 + \beta_1X_1 + \beta_3 (\gamma_0+\gamma_1X_1)\\ &= \beta_0 + \beta_3 \gamma_0+(\beta_1+\beta_3\gamma_1)X_1\end{aligned}$$

So, if $\beta_1+\beta_3\gamma_1 > 0$ and $\beta_1<0$, then you would expect the sort of findings you are encountering: a negative coefficient in the joint model but a positive slope in the simple regression. Since $\beta_1<0$, this requires $\beta_3\gamma_1 > |\beta_1|$, so $\beta_3$ and $\gamma_1$ must have the same sign. Moreover, the regression coefficients are functions of correlations: if the assumed model is correct, then $\gamma_1= \mathrm{cor}(X_1,X_3)\sqrt{\dfrac{\mathrm{var}(X_3)}{\mathrm{var}(X_1)}}$. So the sign of $\gamma_1$ is determined by the sign of the correlation, and its magnitude is driven by the size of the correlation and the ratio of the variance of $X_3$ to that of $X_1$.
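The sign flip can be checked numerically. The sketch below is a hypothetical simulation. The values $\beta_1=-1$, $\beta_3=2$ and $\mathrm{cor}(X_1,X_3)=0.9$ are arbitrary choices, not taken from the question, so that $\beta_1<0$ but $\beta_1+\beta_3\gamma_1 = -1 + 2(0.9) = 0.8 > 0$. In the sample, the simple-regression slope equals $\hat\beta_1+\hat\beta_3\hat\gamma_1$ exactly; this is the least-squares omitted-variable formula. Likewise, the slope of $X_3$ on $X_1$ equals the correlation times the ratio of standard deviations.

```r
set.seed(1)
n  <- 2000
# Hypothetical data-generating process: cor(X1, X3) = 0.9
X1 <- rnorm(n)
X3 <- 0.9 * X1 + sqrt(1 - 0.9^2) * rnorm(n)
# True conditional model: beta1 = -1 (negative), beta3 = 2
Y  <- -1 * X1 + 2 * X3 + rnorm(n)

full  <- lm(Y ~ X1 + X3)   # estimates beta1 and beta3
short <- lm(Y ~ X1)        # marginal slope of Y on X1
aux   <- lm(X3 ~ X1)       # estimates gamma1

b1 <- coef(full)[["X1"]]
b3 <- coef(full)[["X3"]]
g1 <- coef(aux)[["X1"]]
slope_short <- coef(short)[["X1"]]

# gamma1 expressed through the correlation and the variances
g1_cor <- cor(X1, X3) * sqrt(var(X3) / var(X1))

cat("beta1 (joint model):    ", round(b1, 3), "\n")
cat("slope of Y on X1 alone: ", round(slope_short, 3), "\n")
cat("beta1 + beta3 * gamma1: ", round(b1 + b3 * g1, 3), "\n")
cat("gamma1 via cor and var: ", round(g1_cor, 3), "\n")
```

With real data, the same three quantities `cor(X1, X3)`, `var(X1)` and `var(X3)` tell you the sign and size of $\gamma_1$.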

I believe I understood your explanation, thank you very much! In that case, how could I investigate this possible empirical variance in X3 in R? If it's really happening, would it make sense to fit the ridge regression without this variable? — , Aug 13 '21 at 04:39
I made some changes to my answer based on your comment. What I mean is: look at the correlation between $X_1$ and $X_3$ and the variances of $X_1$ and $X_3$, i.e. cor(X1,X3), var(X1), and var(X3). Re: your question about ridge regression, it's not a bad thing to do here. Are you using AIC or something else to choose your tuning parameter? — psboonstra, Aug 24 '21 at 01:55
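On the ridge side, the penalty does not change which question the coefficients answer. With a small penalty, the ridge coefficient of $X_1$ is still essentially the (negative) conditional effect. As the penalty grows without bound, the standardized ridge coefficients become proportional to the predictors' marginal covariances with Y, so they take the sign of the marginal correlation. The sketch below illustrates this with a hypothetical two-predictor simulation (arbitrary coefficients, not from the question). It uses the closed-form ridge estimator in base R rather than any particular package. In practice one would normally use `MASS::lm.ridge` or `glmnet` and choose the penalty by GCV or cross-validation.

```r
set.seed(1)
n  <- 2000
# Hypothetical setup: beta1 = -1, beta3 = 2, cor(X1, X3) = 0.9
X1 <- rnorm(n)
X3 <- 0.9 * X1 + sqrt(1 - 0.9^2) * rnorm(n)
Y  <- -1 * X1 + 2 * X3 + rnorm(n)
X  <- cbind(X1, X3)

# Closed-form ridge on standardized predictors:
# beta(lambda) = (Z'Z + lambda * I)^{-1} Z'y, then mapped back to the original scale
ridge_path <- function(X, y, lambdas) {
  Z   <- scale(X)                  # center each column and scale it to sd 1
  yc  <- y - mean(y)               # center y, so the intercept is not penalized
  ZtZ <- crossprod(Z)
  Zty <- crossprod(Z, yc)
  p   <- ncol(Z)
  B <- vapply(lambdas,
              function(l) as.vector(solve(ZtZ + l * diag(p), Zty)),
              numeric(p))          # p x length(lambdas)
  B <- sweep(t(B), 2, attr(Z, "scaled:scale"), "/")
  colnames(B) <- colnames(X)
  B                                # one row per lambda
}

lambdas <- c(0, 10^seq(-2, 6, length.out = 50))
path <- ridge_path(X, Y, lambdas)

# lambda = 0 reproduces ordinary least squares
ols_b1 <- coef(lm(Y ~ X1 + X3))[["X1"]]

cat("X1 coefficient at lambda = 0:    ", round(path[1, "X1"], 3), "\n")
cat("X1 coefficient at largest lambda:", signif(path[nrow(path), "X1"], 3), "\n")
cat("first lambda with positive X1:   ",
    signif(lambdas[which(path[, "X1"] > 0)[1]], 3), "\n")
# matplot(log10(lambdas[-1]), path[-1, ], type = "l") draws a simple ridge trace
```

In this simulated example, the $X_1$ coefficient only turns positive once the penalty is large enough to wash out most of the conditional information.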

Sextus Empiricus · Answer 2 · 2023-02-20T09:31:24.163

Here's a simulation whose plot (produced by the code below) illustrates psboonstra's explanation of the situation as a case of the Yule-Simpson effect.

I simulated $X_1 \sim N(0,1)$ and $Y = X_1 + \epsilon$ with $\epsilon \sim N(0,1)$, and added an extra variable $X_2 = 5+\text{round}((Y+X_1)/2)$.

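```r
set.seed(1)
n = 300
X1 = rnorm(n,0,1)              # predictor X1 ~ N(0, 1)
noise = rnorm(n,0,1)           # error term ~ N(0, 1)
Y = X1+noise                   # Y depends positively on X1
X2 = 5+round((X1+Y)/2)         # discretized "group" variable; the offset 5 keeps labels positive in practice, so they can serve as colors
plot(X1,Y, col = X2, pch = 20) # color each point by its X2 group
lm(Y~X1+X2)                    # the X1 coefficient turns negative once X2 is included
```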

Here $X_2$ is discrete, so one can still view it as a categorical variable, as in the common examples of the effect.

The Yule-Simpson effect is that the positive effect of $X_1$ on $Y$ reverses to a negative effect once you also account for the "groups" $X_2$.
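Treating $X_2$ literally as a grouping factor makes the reversal explicit. The sketch below repeats the simulation above (same seed and formulas). It compares the marginal slope of $Y$ on $X_1$ with the common within-group slope from `lm(Y ~ X1 + factor(X2))`. Using `factor()` here is an illustrative choice, not part of the original code.

```r
set.seed(1)
n  <- 300
X1 <- rnorm(n, 0, 1)
noise <- rnorm(n, 0, 1)
Y  <- X1 + noise
X2 <- 5 + round((X1 + Y) / 2)

# Marginal slope of Y on X1, ignoring the groups
slope_marginal <- coef(lm(Y ~ X1))[["X1"]]
# Common within-group slope, with X2 entered as a categorical variable
slope_within <- coef(lm(Y ~ X1 + factor(X2)))[["X1"]]

cat("marginal slope:    ", round(slope_marginal, 3), "\n")
cat("within-group slope:", round(slope_within, 3), "\n")
```

Within each group, $(X_1+Y)/2$ is nearly fixed, so a larger $X_1$ has to come with a smaller $Y$, which is exactly the reversal.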

With some imagination, one can picture $X_2$ becoming continuous, giving a continuous range of groups.
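One way to make that picture concrete is to shrink the width of the groups. In the hypothetical variation below, $W=(X_1+Y)/2$ is rounded to multiples of a width $h$. The original code corresponds to $h=1$, up to the offset of 5, which does not affect the slope. Since $Y = 2W - X_1$ exactly in this simulation, the coefficient of $X_1$ should move toward $-1$ as the groups become continuous.

```r
set.seed(1)
n  <- 300
X1 <- rnorm(n, 0, 1)
Y  <- X1 + rnorm(n, 0, 1)
W  <- (X1 + Y) / 2                 # the quantity that X2 discretizes

# Group width h: h = 1 mimics round(); smaller h gives finer, more continuous groups
widths <- c(1, 0.5, 0.1, 0.01)
slopes <- sapply(widths, function(h) {
  X2 <- h * round(W / h)           # hypothetical finer discretization of W
  coef(lm(Y ~ X1 + X2))[["X1"]]
})
names(slopes) <- paste0("h = ", widths)
print(round(slopes, 3))
```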
