I want to perform feature selection on my data: I have too many features (about 50–60) relative to the number of samples.
Until today I was using the importance function of the xgboost package, but I was recently introduced to SHAP. Although it is primarily an interpretation tool, I was told it is also a powerful and robust basis for feature selection, since it builds on ideas from cooperative game theory (Shapley values).
I want to make sure I'm on the right track. The code works fine; here it is in case it helps:
compute_and_filter_features = function(k_top, filter_features = TRUE, model = initial_model, train.x = train_x, test.x = test_x) {
  # Compute SHAP values for every training observation
  shap_values = SHAPforxgboost::shap.values(xgb_model = model, X_train = data.matrix(train.x))
  shap_values$shap_score = as.data.frame(shap_values$shap_score)
  # Global importance per feature: sum of absolute SHAP values over all observations
  feature_importance = colSums(abs(shap_values$shap_score))
  # Rank features by importance and keep the top k
  sorted_features = sort(feature_importance, decreasing = TRUE, index.return = TRUE)
  top_features = names(feature_importance)[sorted_features$ix[1:min(length(sorted_features$ix), k_top)]]
  if (filter_features) {
    train_x_filtered = train.x[, top_features]
    test_x_filtered = test.x[, top_features]
  } else {
    train_x_filtered = train.x
    test_x_filtered = test.x
  }
  return(list(train_x_filtered = train_x_filtered,
              test_x_filtered = test_x_filtered,
              top_features = top_features,
              feature_importance = feature_importance))
}
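To check the selection step in isolation, here is a minimal, self-contained sketch of the same ranking logic on a synthetic SHAP score matrix (random numbers standing in for real `shap.values` output, so no model or package is needed): sum the absolute scores per column, sort, and keep the top k names.

```r
# Synthetic stand-in for shap_values$shap_score: 5 observations x 4 features
set.seed(1)
shap_score = matrix(rnorm(20), nrow = 5,
                    dimnames = list(NULL, c("f1", "f2", "f3", "f4")))

# Global importance per feature: sum of absolute SHAP values over observations
feature_importance = colSums(abs(shap_score))

# Rank features by importance and keep the top k (here k = 2)
sorted_features = sort(feature_importance, decreasing = TRUE, index.return = TRUE)
k_top = 2
top_features = names(feature_importance)[sorted_features$ix[1:min(length(feature_importance), k_top)]]
print(top_features)
```

This mirrors the body of `compute_and_filter_features` exactly, so if the toy version ranks as expected, the selection logic in the function is doing what you intend.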