
I am predicting the salary to be offered to a new candidate, using only the continuous variables (9 of them). The variables are as attached. When I ran OLS, the coefficient for total experience came out negative, but total experience is not expected to have a negative coefficient when "salary to be offered" is the dependent variable. I then removed current salary, reran OLS, and the coefficient of total experience turned positive.

This clearly aligns with the fact that current salary and total experience are correlated, and therefore multicollinearity is present. However, I can't drop either variable because both are important.
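One way to quantify the multicollinearity claim is the variance inflation factor, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the other predictors. The sketch below computes it with plain NumPy. The two simulated columns are a hypothetical stand-in for total experience and current salary (an assumption, not the asker's data), and `vif` is an illustrative helper.

```python
import numpy as np

def vif(X):
    # Variance inflation factor for each column: 1 / (1 - R^2_j), where R^2_j
    # comes from regressing column j on the other columns plus an intercept.
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical stand-in for two of the predictors (an assumption):
# an unobserved "skill" term makes current salary correlate with experience.
rng = np.random.default_rng(0)
n = 5000
experience = rng.normal(size=n)
skill = rng.normal(size=n)
current = experience + skill + rng.normal(scale=0.1, size=n)

vifs = vif(np.column_stack([experience, current]))
print(vifs.round(2))
```

With these assumed settings the population correlation is about 0.71 and both VIFs are close to 2, which is usually considered modest. statsmodels offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence` for the same purpose.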

I intend to build a linear equation that can explain the impact of every important predictor on the dependent variable, and I thought ridge regression could help me with this problem. But even after ridge regression, the sign of the coefficient for total experience is negative, which I think implies that the multicollinearity is still present.

In other words, shrinkage of the coefficients is not happening.
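As the comments note, shrinkage concerns the size of the coefficients rather than their sign. A minimal sketch using the standard closed-form ridge estimator, (XᵀX + λI)⁻¹Xᵀy on standardized predictors with a centered response, shows what happens along the penalty path. The data-generating process is hypothetical (an unobserved `skill` raises both current and offered salary), and `simulate` and `ridge` are illustrative helpers, not anything from the question.

```python
import numpy as np

def simulate(n=5000, seed=0):
    # Hypothetical data-generating process (an assumption, not the asker's data):
    # unobserved "skill" raises both current and offered salary.
    rng = np.random.default_rng(seed)
    experience = rng.normal(size=n)
    skill = rng.normal(size=n)
    current = experience + skill + rng.normal(scale=0.1, size=n)
    offered = skill + 0.5 * experience + rng.normal(scale=0.5, size=n)
    return np.column_stack([experience, current]), offered

def ridge(X, y, lam):
    # Standardize predictors and center y so the intercept is not penalized.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    # Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

X, y = simulate()
n = len(y)
for k in [0.0, 0.05, 0.2, 10.0]:
    beta = ridge(X, y, lam=k * n)
    print(f"lambda/n = {k:5.2f}  experience = {beta[0]: .3f}  current = {beta[1]: .3f}")
```

Under these assumptions the experience coefficient is negative at λ = 0 and moves toward zero as λ grows, staying negative for moderate penalties. With a very large penalty, λβ̂ approaches Xᵀy, so each coefficient eventually takes the sign of its marginal association with the response (positive here). A negative ridge coefficient is therefore not, by itself, evidence that shrinkage failed.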

Can anyone suggest what might have happened?

Dave
  • A side note: before applying OLS, it is apt to investigate the presence of multicollinearity and other anomalies, not the reverse. – User1865345 Aug 28 '22 at 11:20
  • Unexpected sign of the coefficient is not a sign of multicollinearity. What exactly is the problem here? Why does it bother you? – Tim Aug 28 '22 at 12:38
  • Is this an ML exercise or an actual project you are working on? In many countries there are regulations against pay discrimination and ML algorithms are not exactly known for their fairness and unbiasedness. – dipetkov Aug 28 '22 at 16:02
  • (1) Ridge regression does not remove multicollinearity but provides more stable estimators in case of multicollinearity. (2) The term "shrinkage" refers to the absolute value of coefficients rather than their sign. Did ridge regression reduce the absolute value of the coefficient or not? (3) "but it is not expected to have a negative coefficient for total experience when "salary to be offered" is the dependent variable" - why not? As Dave explained in his answer, having the current salary in the model may well have the effect that, given this, experience has a rather negative influence. – Christian Hennig Aug 28 '22 at 16:04
  • One subtlety is that on occasion RR can increase the absolute values of some coefficients. However, there is a natural measure of multicollinearity (sum of the singular values of the model matrix) that is mathematically guaranteed to decrease. – whuber Aug 28 '22 at 18:56

1 Answer


There could be something funky happening from a statistical standpoint, but it seems that, for a given current salary, having more experience is a detriment to future prospects. I kind of understand that: if you’ve been working for twenty years and still make the salary of a junior employee, maybe you’re not very good at your job and should not expect much of a salary increase at your next job (maybe even a pay cut if the high-end firms won’t hire you).

This could mean that you are witnessing Simpson’s Paradox, and there is an economic interpretation that could be valuable. Rather than be frustrated, I would prefer to take pride in uncovering this interesting economic phenomenon!
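A small simulation can make this conditional interpretation concrete. The setup is purely hypothetical (not the asker's data): an unobserved `skill` raises both current and offered salary, and experience has a positive direct effect of 0.5 on the offer. `ols` is an illustrative helper built on `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Hypothetical data-generating process (an assumption for illustration):
# unobserved skill raises both current and offered salary.
experience = rng.normal(size=n)
skill = rng.normal(size=n)
current = experience + skill + rng.normal(scale=0.1, size=n)
offered = skill + 0.5 * experience + rng.normal(scale=0.5, size=n)

def ols(y, *predictors):
    # Least-squares fit with an intercept; returns the slopes only.
    A = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

marginal = ols(offered, experience)
conditional = ols(offered, experience, current)
print("experience alone:", marginal.round(2))
print("experience, current salary:", conditional.round(2))
```

In this setup the population slope for experience alone is 0.5, but E[offered | experience, current] = (current − experience)/1.01 + 0.5·experience ≈ 0.99·current − 0.49·experience: holding current salary fixed, more experience implies less skill. The sign flips even though the two predictors correlate at only about 0.7.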

Dave