
I have already established a method using R's Random Forest tools for ranking the most important features in a binary classification task.

I'm looking for other methods for doing the same task, so that I can be more confident the feature selection is trustworthy. What are the alternatives to RF-based methods (non-tree-based)?

Are you asking for a catalogue of methods for assessing feature importances across all types of models? Since there is no agreed-upon principle for what a "feature importance" measurement should communicate, there are many, many possibilities, and it's not within the scope of the site to break down all of them. – Matthew Drury Jul 18 '18 at 04:29

2 Answers


There is an algorithm called ReliefF that would work.
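A minimal sketch of the Relief idea in Python (ReliefF proper extends this with k nearest neighbours and multi-class handling; the `relief_scores` function and the toy data here are illustrative, not a reference implementation):

```python
import numpy as np

def relief_scores(X, y, n_iter=100, rng=None):
    """Simplified Relief for binary classification.

    A feature's weight goes up when it differs between a sample and its
    nearest miss (other class) and down when it differs between a sample
    and its nearest hit (same class)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # scale each feature to [0, 1] so per-feature differences are comparable
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(X.shape[1])
    for i in rng.integers(0, len(y), size=n_iter):
        d = np.abs(Xs - Xs[i]).sum(axis=1)  # L1 distance to every point
        d[i] = np.inf                       # exclude the point itself
        same = y == y[i]
        hit = np.where(same, d, np.inf).argmin()    # nearest same-class
        miss = np.where(~same, d, np.inf).argmin()  # nearest other-class
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n_iter

# toy check: feature 0 carries the class signal, features 1-2 are noise
data_rng = np.random.default_rng(0)
y = data_rng.integers(0, 2, 200)
X = data_rng.normal(size=(200, 3))
X[:, 0] += 2.0 * y
w = relief_scores(X, y, n_iter=200, rng=1)
print(w)
```

The informative feature should receive a clearly larger weight than the noise features.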

You could use Lasso regression to remove unimportant features.
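For a binary target, the natural variant is L1-penalized (lasso) logistic regression, which drives the coefficients of unhelpful features to exactly zero. A sketch in Python with scikit-learn (synthetic data and the penalty strength `C=0.1` are illustrative; in R, `glmnet` does the same job):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, n_redundant=0, random_state=0)

# Standardize first so the L1 penalty treats all features equally.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X, y)

coef = model[-1].coef_.ravel()
kept = np.flatnonzero(coef)  # features the lasso retained
print(kept)
```

In practice you would choose the penalty strength by cross-validation rather than fixing it by hand.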

astel
  • 1,528

randomForest::importance() already offers two measures of importance:

  • the reduction in out-of-bag predictive performance (measured by MSE) if a predictor is permuted randomly
  • the total decrease in node impurities from splitting on the variable, averaged over all trees, measured through the Gini index

I personally would trust the first measure more, since the second is "kind of" in-sample.

My recommendation would be to look at measures similar to the first one: use random permutations of predictors with KPIs other than the MSE, such as proper scoring rules, either out-of-bag or on an actual holdout sample. (Don't use accuracy.)

Alternatively, fit other models than a random forest, e.g., a logistic regression, and assess standardized parameter estimates.
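A sketch of this in Python with scikit-learn (in R you would standardize the predictors and inspect `glm` coefficients; the synthetic data here is illustrative, with the first two columns informative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           shuffle=False, random_state=0)

# Standardizing the predictors puts the coefficients on a common scale,
# so their absolute values can be compared as a rough importance measure.
Xz = StandardScaler().fit_transform(X)
lr = LogisticRegression().fit(Xz, y)

importance = np.abs(lr.coef_.ravel())
ranking = np.argsort(importance)[::-1]  # most to least important
print(ranking)
```

Keep in mind that, unlike the random forest measures, this only captures importance with respect to a linear (in the log-odds) model.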

Stephan Kolassa
  • 123,354