
I have already established a method using R's Random Forest tools for ranking the most important features in a binary classification task.

I'm looking for other methods for doing the same task, so that I can be more confident the feature selection is trustworthy. What are the alternatives to RF-based methods (non-tree-based)?

Are you asking for a catalogue of methods for assessing feature importances across all types of models? Since there is no agreed-upon principle for what a "feature importance" measurement should communicate, there are many, many possibilities, and it's not within the scope of the site to break down all of them. – Matthew Drury Jul 18 '18 at 04:29

2 Answers


There is an algorithm called ReliefF that would work.
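A minimal sketch of the Relief idea in Python (ReliefF proper extends this with k nearest neighbours and multi-class handling; the `relief_scores` function and the toy data here are illustrative, not a reference implementation):

```python
import numpy as np

def relief_scores(X, y, n_iter=100, rng=None):
    """Simplified Relief for binary classification.

    A feature's weight goes up when it differs between a sample and its
    nearest miss (other class) and down when it differs between a sample
    and its nearest hit (same class)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # scale each feature to [0, 1] so per-feature differences are comparable
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(X.shape[1])
    for i in rng.integers(0, len(y), size=n_iter):
        d = np.abs(Xs - Xs[i]).sum(axis=1)  # L1 distance to every point
        d[i] = np.inf                       # exclude the point itself
        same = y == y[i]
        hit = np.where(same, d, np.inf).argmin()    # nearest same-class
        miss = np.where(~same, d, np.inf).argmin()  # nearest other-class
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n_iter

# toy check: feature 0 carries the class signal, features 1-2 are noise
data_rng = np.random.default_rng(0)
y = data_rng.integers(0, 2, 200)
X = data_rng.normal(size=(200, 3))
X[:, 0] += 2.0 * y
w = relief_scores(X, y, n_iter=200, rng=1)
print(w)
```

The informative feature should receive a clearly larger weight than the noise features.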

You could use Lasso regression to remove unimportant features.
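For a binary target, the natural variant is L1-penalized (lasso) logistic regression, which drives the coefficients of unhelpful features to exactly zero. A sketch in Python with scikit-learn (synthetic data and the penalty strength `C=0.1` are illustrative; in R, `glmnet` does the same job):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, n_redundant=0, random_state=0)

# Standardize first so the L1 penalty treats all features equally.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X, y)

coef = model[-1].coef_.ravel()
kept = np.flatnonzero(coef)  # features the lasso retained
print(kept)
```

In practice you would choose the penalty strength by cross-validation rather than fixing it by hand.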

astel
  • 1,528

randomForest::importance() already offers two measures of importance:

  • the reduction in out-of-bag predictive performance (measured by MSE) if a predictor is permuted randomly
  • the total decrease in node impurities from splitting on the variable, averaged over all trees, measured through the Gini index

I personally would trust the first measure more, since the second is "kind of" in-sample.

My recommendation would be to look at measures similar to the first one: use random permutations of predictors with KPIs other than the MSE, such as proper scoring rules, either out-of-bag or on an actual holdout sample. (Don't use accuracy.)

Alternatively, fit other models than a random forest, e.g., a logistic regression, and assess standardized parameter estimates.
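A sketch of this in Python with scikit-learn (in R you would standardize the predictors and inspect `glm` coefficients; the synthetic data here is illustrative, with the first two columns informative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           shuffle=False, random_state=0)

# Standardizing the predictors puts the coefficients on a common scale,
# so their absolute values can be compared as a rough importance measure.
Xz = StandardScaler().fit_transform(X)
lr = LogisticRegression().fit(Xz, y)

importance = np.abs(lr.coef_.ravel())
ranking = np.argsort(importance)[::-1]  # most to least important
print(ranking)
```

Keep in mind that, unlike the random forest measures, this only captures importance with respect to a linear (in the log-odds) model.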

Stephan Kolassa
  • 123,354