Using 10-fold cross-validation, I obtained the ROC-AUC values below. At first I tried to select the feature set with the highest average ROC-AUC, but I doubt whether the differences between these scores are statistically significant. How can I resolve this question?

ROC-AUC values: 0.848, 0.847, 0.848, 0.848, 0.845, 0.844, 0.844, 0.842, 0.84, 0.835, 0.836

For each successive value, I removed the variable with the lowest variable importance and then recalculated the ROC-AUC.

JAE

2 Answers

The concordance probability (AUROC) is not sensitive enough for comparing models. Use a more sensitive measure such as mean squared error, log-likelihood, or AIC. Related information is here.
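
To illustrate the sensitivity point, here is a minimal sketch in plain Python. The outcomes `y` and the two prediction vectors `p_a` and `p_b` are made-up values, and the helper names `auroc`, `log_likelihood` and `brier` are hypothetical, not from any library. Both models rank the observations identically, so their AUROC is the same, yet the log-likelihood and the mean squared error of the predicted probabilities (the Brier score) tell them apart.

```python
import math

def auroc(y, p):
    """Concordance probability: share of (event, non-event) pairs in which
    the event has the higher predicted probability; ties count as 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(pos) * len(neg))

def log_likelihood(y, p):
    """Bernoulli log-likelihood of the outcomes given predicted probabilities."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def brier(y, p):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

# Hypothetical data: model B keeps model A's ordering of the observations
# but compresses every prediction toward 0.5.
y   = [0, 0, 1, 0, 1, 1]
p_a = [0.10, 0.30, 0.20, 0.40, 0.80, 0.90]
p_b = [0.40, 0.46, 0.43, 0.49, 0.55, 0.60]

auc_a, auc_b = auroc(y, p_a), auroc(y, p_b)
ll_a, ll_b = log_likelihood(y, p_a), log_likelihood(y, p_b)
bs_a, bs_b = brier(y, p_a), brier(y, p_b)

print("AUROC         ", auc_a, auc_b)   # identical: same ranking
print("log-likelihood", ll_a, ll_b)     # higher is better
print("Brier score   ", bs_a, bs_b)     # lower is better
```

Because AUROC depends only on ranks, any change to a model that leaves the ordering of predictions nearly intact barely moves it, which is one reason differences in the third decimal place are hard to interpret.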

Frank Harrell

I would suggest using penalized/regularised methods directly and avoiding feature selection altogether. CV.SE has some excellent threads on the matter (e.g. "Variable selection for predictive modeling really needed in 2016?" and "Why is variable selection necessary?"), which I would urge you to read through carefully. After that, the CV.SE thread "When to use regularization methods for regression?" can give you more context on how to use regularisation more formally.
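
As a rough illustration of what "penalized instead of selected" means, the sketch below fits an L2-penalized (ridge) logistic regression with plain gradient descent. Everything in it is an assumption made for illustration: the toy data `X` and `y`, the helper names `sigmoid` and `fit_ridge_logistic`, the penalty value `lam = 1.0`, and the choice of gradient descent as the solver. In practice the penalty would be tuned, for example by cross-validation, using an established library.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Gradient descent on: mean negative log-likelihood + (lam / 2) * ||w||^2.
    The intercept b is left unpenalized."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
    return w, b

# Hypothetical toy data: two features, eight observations.
X = [[-1.5, 0.3], [-1.0, -0.8], [-0.5, 1.1], [-0.2, -0.4],
     [0.1, 0.9], [0.6, -1.2], [1.1, 0.2], [1.4, -0.1]]
y = [0, 0, 1, 0, 0, 1, 1, 1]

w_unpen, b_unpen = fit_ridge_logistic(X, y, lam=0.0)  # no penalty
w_pen, b_pen = fit_ridge_logistic(X, y, lam=1.0)      # arbitrary penalty

norm_unpen = math.sqrt(sum(v * v for v in w_unpen))
norm_pen = math.sqrt(sum(v * v for v in w_pen))
print("unpenalized coefficients:", w_unpen, "norm:", norm_unpen)
print("penalized coefficients:  ", w_pen, "norm:", norm_pen)
```

All variables stay in the model; the penalty shrinks their coefficients toward zero rather than dropping the weakest one at each step.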

usεr11852
    Well put. The main problem with penalization is that when the sample size is too small to fit an unpenalized model, it is often also too small to determine the optimum penalty. That is a vote for specifying carefully thought-out Bayesian shrinkage priors instead. – Frank Harrell Mar 04 '24 at 13:20