
I've fit a regression model with continuous independent variables and obtained coefficients that are all significant in a t-test, but multicollinearity exists. Do I still have to resolve the collinearity in this case?

Here are the details:

Data Location : https://blog.kakaocdn.net/dn/bxHByC/btrWFJtvIeT/WSxcy9ZLkG1VNOYmeVxJDK/touched.csv?attach=1&knm=tfile.csv

I've built a regression model, "Target ~ B + RM + CRIM + DIS + INDUS + LSTAT + NOX + PTRATIO + RAD + ZN + TAX + CHAS", and all coefficients are significant in a t-test. For additional context, R^2 = 0.755 and the F-statistic p-value is 3.50e-111, but the output warns: "The condition number is large, 1.67e+04. This might indicate that there are strong multicollinearity or other numerical problems." This suggests there is multicollinearity in the model. Do I have to go further and remove the collinearity even in this case?
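For reference, one common way to quantify the collinearity behind such a warning is the variance inflation factor (VIF) of each predictor. Below is a minimal numpy sketch on synthetic data (the linked CSV may not be reachable, so the data, the `vif` helper, and the column names here are purely illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300
z = rng.normal(size=n)
x1 = z + rng.normal(scale=0.1, size=n)  # x1 and x2 are strongly correlated
x2 = z + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)                 # x3 is independent of the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing X[:, j] on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

for j in range(X.shape[1]):
    print(f"x{j + 1}: VIF = {vif(X, j):.1f}")
# VIF well above 10 flags x1 and x2 as collinear; x3 stays near 1.
```

A common rule of thumb treats VIF above 5 or 10 as a sign of problematic collinearity, though, as the comments below discuss, a high VIF by itself does not dictate dropping a variable.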

Thanks!

  • Welcome to Cross Validated! What would it accomplish to “resolve” multicollinearity? – Dave Jan 18 '23 at 16:50
  • @Dave There are high-VIF variables such as NOX. Removing some high-VIF variables gives a smaller condition number, which means the model has less collinearity. I wonder whether that removal process is meaningful when all the coefficients are already significant. It sounds to me as if the model is interpretable without removing the high-VIF variables. Is it? – Marcel Kim Jan 18 '23 at 17:06
  • This is why I ask what you aim to accomplish by resolving multicollinearity. // There are legitimate reasons to dislike multicollinearity, but a) sometimes life is hard, and b) I believe that much of what analysts write about multicollinearity comes from a combination of misunderstanding the "no correlation" assumption of the Gauss-Markov theorem and a belief that multicollinearity means that some variables can be dropped because their information is contained in other variables, without considering the harm from losing the information that is not contained in the remaining variables. – Dave Jan 18 '23 at 17:10
  • @Dave Oh, that's what I wanted to make sure of. For your information, in the case of prediction, multicollinearity would not be a problem as far as I know; I aim to build an interpretable model. I don't think multicollinearity would be a problem if all coefficients are significant, as you said. Do I understand correctly what you said? – Marcel Kim Jan 18 '23 at 17:19
  • There can be overfitting issues. Since there is a lack of precision in the coefficient estimates (wide confidence intervals), the coefficients might be estimated quite differently in different data sets, meaning that your ability to predict might be poor. Regression approaches like ridge regularization help mitigate this by penalizing the coefficients when they get large. If you've done some kind of model validation, however, that concern is lessened; perhaps a good answer to this question would discuss the details of that. – Dave Jan 18 '23 at 17:49
  • A large condition number is a warning of possible numerical instabilities or even inaccuracies in the results. Many regression models are fit using numerical searches and achieve, by default, a precision of about half the machine precision -- around $10^{-8}.$ When the condition number grows large enough to swamp that, you're in trouble. In this case a condition number around $10^4$ doesn't look like a problem unless you want more than four decimal places of precision in your estimates. – whuber Jan 18 '23 at 17:58
  • @whuber The warning has now gone away since I normalized the independent variables, following your hint about machine precision. Thank you a lot. – Marcel Kim Jan 19 '23 at 06:46
  • Possible duplicate. My answer definitely addresses some of the issues, though I make no explicit reference to a condition number. // After you’ve read that link and the comments about condition number, might you want to post a self-answer? An advantage to you of doing so is that the community can point out if you’re making any mistakes, and if you do and get downvoted, you can delete the answer to erase the downvotes. – Dave Mar 07 '23 at 17:43
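Following up on the comments about scaling and the condition number: a minimal numpy sketch (synthetic data with one large-scale column, loosely mimicking a predictor like TAX; all values here are illustrative, not from the linked CSV) showing that standardizing the predictors shrinks the condition number of the design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(0, 1, n)               # predictor on a unit scale
x2 = rng.normal(0, 1, n) * 400 + 300   # predictor on a much larger scale
X = np.column_stack([np.ones(n), x1, x2])  # design matrix with intercept

cond_raw = np.linalg.cond(X)

# Standardize the non-intercept columns to mean 0, standard deviation 1.
Xs = X.copy()
Xs[:, 1:] = (X[:, 1:] - X[:, 1:].mean(axis=0)) / X[:, 1:].std(axis=0)
cond_std = np.linalg.cond(Xs)

print(cond_raw, cond_std)  # the standardized matrix is far better conditioned
```

This matches the resolution in the comments: the large condition number reported by the fitting routine reflected the disparate scales of the predictors, not necessarily a collinearity problem that requires dropping variables.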

0 Answers