In my research, I have four explanatory variables (X1, X2, X3 and X4) and one response variable (Y). The VIF values of the explanatory variables indicated high multicollinearity, so I decided to use ridge regression to model Y from these variables, without, of course, relying on p-values. When I plotted the ridge trace and the chart of regression coefficients from the ridge fit, I noticed that X2 and X4 had negative coefficients.
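
For concreteness, here is a minimal sketch of that workflow in Python with NumPy. It runs on synthetic stand-in data, not my actual data. The helper names `vif` and `ridge_coefs`, the penalty grid `lambdas`, and the simulated coefficients are all assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X: 1 / (1 - R^2),
    where R^2 comes from regressing that column on all the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid.var() / target.var()
        factors[j] = 1.0 / (1.0 - r2)
    return factors

def ridge_coefs(X, y, lam):
    """Closed-form ridge coefficients on standardized predictors and centered y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

# Synthetic stand-in data (NOT the study's data): X2 and X3 are built to be collinear.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x3 = rng.normal(size=n)
x2 = x3 + 0.3 * rng.normal(size=n)
x4 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])
y = 1.0 * x1 - 0.5 * x2 + 2.0 * x3 - 1.0 * x4 + 0.5 * rng.normal(size=n)

vifs = vif(X)
print("VIF (X1..X4):", np.round(vifs, 2))

# Ridge trace: one row of coefficients per penalty value (lambda = 0 is plain OLS).
lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
trace = np.array([ridge_coefs(X, y, lam) for lam in lambdas])
for lam, coefs in zip(lambdas, trace):
    print(f"lambda={lam:7.1f}  coefs={np.round(coefs, 3)}")
```

Plotting each column of `trace` against `lambdas` gives a ridge trace. In this simulation X2 and X3 have large VIFs by construction, while X1 and X4 do not.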
A negative coefficient for X4 did not surprise me, but I did not expect one for X2. I was even more puzzled when I fitted a simple linear regression of Y on X2 alone and found a positive relationship.
So I decided to remove X3 from my study, since this variable had the highest VIF value (about 13). After that, everything made more sense: X2 now had a positive coefficient in the model.
Should I then drop X3 permanently? If I keep this variable in the model, how would I explain the fact that X2 has a negative coefficient in the model, and yet it has a positive correlation with Y?
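
To make the puzzle concrete, here is a minimal sketch that reproduces the same kind of sign flip on synthetic data (again, not my actual data). It uses ordinary least squares rather than ridge to keep the sketch short. The helper `ols_slopes` and every simulated coefficient are assumptions chosen for illustration.

```python
import numpy as np

def ols_slopes(X, y):
    """Least-squares slopes (an intercept is fitted but not returned)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

# Synthetic stand-in data (NOT the study's data); all coefficients are assumptions.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x3 = rng.normal(size=n)
x2 = x3 + 0.3 * rng.normal(size=n)   # X2 is nearly a copy of X3
x4 = rng.normal(size=n)
y = 1.0 * x1 - 0.5 * x2 + 2.0 * x3 - 1.0 * x4 + 0.5 * rng.normal(size=n)

# 1) Simple regression of Y on X2 alone: X2 also stands in for X3 here.
slope_simple = ols_slopes(x2.reshape(-1, 1), y)[0]

# 2) Full model: slope of X2 with X1, X3 and X4 held fixed.
slope_full = ols_slopes(np.column_stack([x1, x2, x3, x4]), y)[1]

# 3) Model without X3: X2 again absorbs part of X3's effect.
slope_no_x3 = ols_slopes(np.column_stack([x1, x2, x4]), y)[1]

print(f"Y ~ X2 alone:            {slope_simple:+.3f}")
print(f"Y ~ X1 + X2 + X3 + X4:   {slope_full:+.3f}")
print(f"Y ~ X1 + X2 + X4:        {slope_no_x3:+.3f}")
```

In this simulation the slope of X2 is positive on its own and when X3 is dropped, but negative in the full model. This happens by construction: X2 is strongly correlated with X3, which has a large positive effect on Y. Whether the same mechanism explains my real data is exactly what I am unsure about.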
