I want to run a binary logistic regression. A Pearson correlation test shows that two of my variables are correlated: one is continuous and one is categorical. I guess I should eliminate the categorical variable, but I would like to know the theory behind eliminating correlated variables. Thanks.
-
Welcome to Cross Validated! Does this answer your question? "Using VIF (variance inflation factor) to reduce the data set and account for multicollinearity decreases the r2 and the performance of my model". It is not clear that you should eliminate either variable; as the link shows, doing so can be damaging. – Dave Feb 18 '23 at 00:48
-
I want to avoid overfitting, so I have to eliminate one of the correlated variables. How can I choose between A and B? – Mostafa Ahmadi Feb 18 '23 at 00:58
-
As you can see from the link, there is no guarantee that this is a good strategy that will improve model performance. You might wind up getting worse performance (even out-of-sample) by eliminating one of the variables than you would by keeping both. – Dave Feb 18 '23 at 01:00
-
You are right, particularly since my sample size is small. I should keep that information for better modeling. – Mostafa Ahmadi Feb 18 '23 at 01:06
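One way to check the point made in the comments is to compare held-out performance with both variables against each variable on its own. The sketch below is only an illustration under stated assumptions: the data are simulated (not the asker's), the helper names (`fit_logistic`, `cv_log_loss`, `simulate`) are hypothetical, and the model is a small hand-rolled logistic regression fitted by gradient descent so that the example has no dependencies. In practice a library such as scikit-learn or statsmodels would likely be the better choice.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit an unpenalized logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def log_loss(X, y, w, b, eps=1e-12):
    """Mean negative log-likelihood of the labels under the fitted model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


def cv_log_loss(X, y, cols, k=5):
    """Mean held-out log loss over k folds, using only the columns in cols."""
    Xs = [[row[c] for c in cols] for row in X]
    losses = []
    for fold in range(k):
        test_idx = list(range(fold, len(y), k))  # interleaved folds
        held = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held]
        w, b = fit_logistic([Xs[i] for i in train_idx], [y[i] for i in train_idx])
        losses.append(
            log_loss([Xs[i] for i in test_idx], [y[i] for i in test_idx], w, b)
        )
    return sum(losses) / k


def simulate(n=200, seed=0):
    """Simulated data (an assumption): a binary group g and a continuous x
    that is correlated with g; both affect the outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        g = rng.randint(0, 1)
        x = rng.gauss(1.5 * g, 1.0)
        p = sigmoid(-0.5 + 1.0 * x + 1.0 * g)
        X.append([x, float(g)])
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = simulate()
results = {
    "both": cv_log_loss(X, y, [0, 1]),
    "continuous only": cv_log_loss(X, y, [0]),
    "categorical only": cv_log_loss(X, y, [1]),
}
for name, loss in results.items():
    print(f"{name}: mean held-out log loss = {loss:.3f}")
```

Lower held-out log loss is better. Which specification wins depends on the data and the sample size, so the comparison is worth running on the real data rather than dropping a variable by rule.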