Different parameters coming out as important from decision tree and logistic regression

Question

I ran both a logistic regression and a decision tree model on the same dataset. However, the parameters that come out as important differ between the two.

For example, the most important parameter in the decision tree (the first split of the tree) doesn't even show up as important in the logistic regression model. The model accuracy from the confusion matrix is very similar in both cases.

I wonder which model to go ahead with in this scenario.

score 0 · Answer 1 · answered Jan 14 '16 at 19:01

They may differ because the two methods are very different from each other. While logistic regression tries to predict the outcome using each feature's own additive effect (unless you add interaction terms yourself), tree models try to do so using combinations of features.
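To make the difference concrete, here is a toy sketch with made-up data (not the asker's dataset). The outcome depends on x in a U-shaped way, so this shows a non-monotone effect rather than an interaction between features, but the idea is similar: a tree can exploit structure that a single additive linear term cannot. The helper names `gini`, `best_split_gain` and `covariance` are illustrative only. The sample covariance between x and y is used as a stand-in for the linear signal, since a one-feature logistic regression with an intercept has a fitted slope of zero when that covariance is zero.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(x, y):
    """Largest Gini impurity reduction achievable with one threshold on x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent = gini([label for _, label in pairs])
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Hypothetical toy data: y is 1 when |x| >= 2, otherwise 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [1, 1, 0, 0, 0, 1, 1]

gain = best_split_gain(x, y)        # positive: a split on x is useful
linear_signal = covariance(x, y)    # essentially zero: no linear trend
print(f"best split gain: {gain:.3f}, covariance: {linear_signal:.3f}")
```

In a case like this the tree would happily split on x first, while a plain logistic regression would report x as unimportant unless you added a transformed term such as x squared.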

One way to choose would be to compare their performance on a completely unseen dataset. The better model is the one whose performance on the unseen data stays closest to its performance on the training dataset.
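A minimal sketch of that comparison, assuming you already have each model's predictions on the training data and on a held-out set. All labels and predictions below are invented for illustration, and `accuracy` and `generalization_gap` are hypothetical helper names.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def generalization_gap(train_true, train_pred, test_true, test_pred):
    """Training accuracy minus held-out accuracy (smaller is better)."""
    return accuracy(train_true, train_pred) - accuracy(test_true, test_pred)


# Made-up labels and predictions for two models.
train_true = [1, 0, 1, 1, 0, 0, 1, 0]
tree_train_pred = [1, 0, 1, 1, 0, 0, 1, 0]
logit_train_pred = [1, 0, 1, 0, 0, 0, 1, 0]

test_true = [1, 0, 0, 1]
tree_test_pred = [1, 1, 0, 0]
logit_test_pred = [1, 0, 0, 0]

tree_gap = generalization_gap(train_true, tree_train_pred, test_true, tree_test_pred)
logit_gap = generalization_gap(train_true, logit_train_pred, test_true, logit_test_pred)
print(f"tree gap: {tree_gap:.3f}, logistic gap: {logit_gap:.3f}")
```

With these invented numbers the tree fits the training data perfectly but drops more on the held-out set, which is the pattern you would be looking for when deciding between the two.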

Another would be to check whether your features actually need a tree model or not. This would require domain knowledge.

Another way would be to perform PCA on your features so that you're sure the inputs are uncorrelated (PCA guarantees uncorrelated components, which is not quite the same as independent ones). You can then feed the components into a logistic regression model.
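As a rough illustration of what PCA does to correlated inputs, here is a sketch for the two-feature case only, where the principal axes can be found with a closed-form rotation angle. In practice you would use a library PCA on all features; the function `pca_2d` and the data are made up for this example.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pca_2d(x, y):
    """Rotate two centered features onto their principal axes."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    xc = [a - mean_x for a in x]
    yc = [b - mean_y for b in y]
    cxx = covariance(x, x)
    cyy = covariance(y, y)
    cxy = covariance(x, y)
    # Angle of the first principal axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(xc, yc)]
    pc2 = [-s * a + c * b for a, b in zip(xc, yc)]
    return pc1, pc2


# Hypothetical strongly correlated features (x2 is roughly 2 * x1).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]

pc1, pc2 = pca_2d(x1, x2)
before = covariance(x1, x2)   # clearly non-zero
after = covariance(pc1, pc2)  # zero up to floating-point error
print(f"covariance before: {before:.3f}, after: {after:.2e}")
```

The components are uncorrelated by construction, but each one is a mix of the original features, so the logistic regression coefficients no longer map one-to-one onto your original parameters.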

Whosoever downvoted the answer should have at least mentioned why he did so — Ujjwal Kumar, Jan 15 '16 at 02:27

score 0 · Answer 2 · edited May 11 '17 at 17:01

The most likely reason behind this is that your predictors are highly correlated. Check whether this is the case by computing a correlation matrix. If it is, then a more appropriate method of estimating the logistic regression would be a regularized approach, like Elastic Net (glmnet package in R). Another option is to do dimensionality reduction. Yet another option is to simply accept that both models may choose different variables when high correlation among them is present.
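The correlation check could be sketched as follows. This is a plain-Python illustration with invented feature columns. The 0.8 cutoff is an arbitrary assumption, not a rule, and `pearson`, `correlation_matrix` and `highly_correlated_pairs` are hypothetical helper names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def highly_correlated_pairs(columns, threshold=0.8):
    """Pairs of distinct columns whose absolute correlation meets the threshold."""
    names = list(columns)
    corr = correlation_matrix(columns)
    return [
        (a, b, corr[a][b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(corr[a][b]) >= threshold
    ]


# Made-up predictors: x2 is roughly 2 * x1, x3 is unrelated.
features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

pairs = highly_correlated_pairs(features)
for a, b, r in pairs:
    print(f"{a} and {b} are highly correlated (r = {r:.3f})")
```

If pairs like this show up, the tree and the logistic regression may each be picking a different member of the same correlated group, which would explain the disagreement in importance.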
