
I am looking for some sources about "how to calculate feature importance for multi-label classification problems". Would you give me some information, with related Python source code, on how to apply feature importance to multi-label datasets?

  • Welcome to Cross Validated! What's wrong with the methodology you would use on a binary problem? – Dave Jan 26 '22 at 18:52

1 Answer


Welcome to Cross Validated!

It depends on your model, but broadly speaking, I would strongly recommend some version of Permutation Feature Importance to figure out which features are helpful. Read more here: https://scikit-learn.org/stable/modules/permutation_importance.html

This technique works with basically any algorithm and any target type (binary, multi-class, multi-label, regression, etc.).

There are various packages that implement it, like sklearn in Python and Boruta in R.
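For a concrete starting point, here is a minimal sketch using scikit-learn's `permutation_importance` on a synthetic multi-label dataset. The generated data, the `RandomForestClassifier`, and the `f1_samples` scorer are illustrative assumptions on my part, not requirements of the technique:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic multi-label data: each row can carry several of 5 labels.
X, Y = make_multilabel_classification(n_samples=1000, n_features=10,
                                      n_classes=5, random_state=0)
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, random_state=0)

# RandomForestClassifier handles multi-label indicator targets natively.
model = RandomForestClassifier(random_state=0).fit(X_train, Y_train)

# Permute each feature n_repeats times on held-out data and record the
# score drop; larger mean drops indicate more important features.
result = permutation_importance(model, X_val, Y_val,
                                scoring="f1_samples", n_repeats=10,
                                random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.4f} "
          f"+/- {result.importances_std[i]:.4f}")
```

Any scorer that handles multi-label targets (e.g. a Hamming-loss- or F1-based one) can be swapped in via the `scoring` argument.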

Here's the intuition for how Permutation Feature Importance works:

The broad idea is that the more important a feature is, the more your performance should suffer without the help of that feature. However, instead of removing features to see how much worse the model gets, we shuffle/randomize them. By shuffling a feature, you still have the same number of variables (since you didn't remove one), and the shuffled variable has the same distribution as the original, but its values are now in random order, so any real connection between that variable and your target should be destroyed.

Procedure:

  1. Train a single model (Model 1) on all features and obtain its performance on some validation set.
  2. Shuffle one of the features (i.e., randomize the order of the values in that variable's vector). For example, if you have 1000 rows of data, you take the 1000 values of variable A and randomize their order.
  3. Run Model 1 (do not re-train it; run Model 1 exactly as you trained it on all the original features, but feed in a dataset with variable A randomized). Observe the difference in performance relative to when variable A was not randomized.
  4. Repeat for all variables.

This way, you can see which variables are important and which are not. I hope it makes sense that, broadly speaking, if you totally jumble up the values of a variable and performance isn't impacted, that variable probably wasn't very important to your model.
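To make those four steps concrete, here is a hand-rolled sketch. It assumes `model` is an already-trained classifier and `(X_val, Y_val)` is a held-out multi-label validation set (for instance, the ones from the earlier snippet); the `f1_samples` metric is again an illustrative choice:

```python
import numpy as np
from sklearn.metrics import f1_score

def permutation_importances(model, X_val, Y_val, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: baseline performance of the already-trained model.
    baseline = f1_score(Y_val, model.predict(X_val), average="samples")
    drops = []
    for j in range(X_val.shape[1]):  # Step 4: repeat for every variable.
        X_perm = X_val.copy()
        # Step 2: shuffle one feature column; same values, random order.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        # Step 3: score the *same* model on the permuted data; no re-training.
        permuted = f1_score(Y_val, model.predict(X_perm), average="samples")
        drops.append(baseline - permuted)  # importance = performance drop
    return np.array(drops)
```

In practice you would repeat the shuffle several times per feature and average, which is exactly what scikit-learn's `n_repeats` parameter does for you.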

Please carefully read the links, as there are some considerations. For example, if there is a lot of multicollinearity between features, this method can run into problems. Additionally, this is a measure of how important the features are to one specific model: if your model is terrible, then this feature importance might not be a good representation.

Quick tip for Permutation Feature Importance: for a faster and more logical way of running this, try clustered Permutation Feature Importance (this also mitigates the multicollinearity problems mentioned above): https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance_multicollinear.html#sphx-glr-auto-examples-inspection-plot-permutation-importance-multicollinear-py. Essentially, group your features into several clusters (by which variables are most similar/correlated), and then run permutation feature importance on each entire group, not on individual variables.
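Here is a sketch of that grouped variant: cluster the features by Spearman rank correlation (as in the linked scikit-learn example) and then permute each whole group at once. The `threshold` value, the `f1_samples` metric, and the choice to shuffle every feature in a cluster with the same row permutation are assumptions for illustration:

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.metrics import f1_score

def grouped_permutation_importances(model, X_val, Y_val,
                                    threshold=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Cluster features by Spearman rank correlation distance.
    corr = spearmanr(X_val).correlation
    corr = (corr + corr.T) / 2               # enforce symmetry
    np.fill_diagonal(corr, 1.0)
    linkage = hierarchy.ward(squareform(1 - np.abs(corr)))
    cluster_ids = hierarchy.fcluster(linkage, t=threshold,
                                     criterion="distance")

    baseline = f1_score(Y_val, model.predict(X_val), average="samples")
    importances = {}
    for cid in np.unique(cluster_ids):
        cols = np.where(cluster_ids == cid)[0]
        X_perm = X_val.copy()
        rows = rng.permutation(X_val.shape[0])
        # Shuffle the whole cluster with one row permutation: within-group
        # correlations stay intact, but the group's link to the target breaks.
        X_perm[:, cols] = X_val[rows][:, cols]
        permuted = f1_score(Y_val, model.predict(X_perm), average="samples")
        importances[tuple(cols)] = baseline - permuted
    return importances
```

Shuffling a cluster jointly is what makes this faster (fewer permutation runs) and more robust to multicollinearity, since a correlated stand-in can no longer compensate for the permuted feature.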

I hope this gives you some good directions to explore!