In univariate feature ranking for classification, it is common to use the χ² test (MATLAB, sklearn) to calculate importance scores as the negative log of the associated p-value:
$\text{Importance} = -\ln(p)$
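For concreteness, here is a minimal sketch of that score computed with scikit-learn's `sklearn.feature_selection.chi2`, which returns one χ² statistic and one p-value per feature. The iris dataset is only an assumed stand-in for real data. Note that sklearn's `chi2` requires non-negative features (it treats them like counts or frequencies).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

# Assumed example data: iris measurements are all positive, as chi2 requires.
X, y = load_iris(return_X_y=True)

# One chi-squared statistic and one p-value per feature.
chi2_stats, p_values = chi2(X, y)

# Importance = -ln(p): larger values mean stronger evidence of dependence on the class.
importance = -np.log(p_values)

for i, (stat, p, imp) in enumerate(zip(chi2_stats, p_values, importance)):
    print(f"feature {i}: chi2={stat:.2f}  p={p:.3g}  importance={imp:.2f}")
```

If a p-value underflows to exactly 0, `-np.log` returns `inf`, so very strong features may need special handling when the scores are used for ranking.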
Is it possible to define the importance value associated with p = 0.05, so as to exclude variables whose p-value is above that limit? In this case, $-\ln(0.05) = \ln 20 \approx 3$.
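Since $-\ln$ is strictly decreasing, keeping features with $\text{Importance} > -\ln(0.05)$ is exactly the same as keeping features with $p < 0.05$. A small sketch (again assuming iris as example data) checks this against scikit-learn's `SelectFpr`, which keeps features whose p-value is below `alpha`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFpr, chi2

X, y = load_iris(return_X_y=True)  # assumed example data
_, p_values = chi2(X, y)
importance = -np.log(p_values)

alpha = 0.05
threshold = -np.log(alpha)  # about 2.996, the "≈ 3" from the question

# Thresholding the importance at -ln(alpha) is the same rule as p < alpha.
keep_by_importance = importance > threshold
keep_by_pvalue = p_values < alpha

# SelectFpr applies the p < alpha rule directly to the score function's p-values.
selector = SelectFpr(score_func=chi2, alpha=alpha).fit(X, y)
keep_by_selector = selector.get_support()

print("threshold:", round(threshold, 3))
print("kept by importance:", keep_by_importance)
print("kept by SelectFpr: ", keep_by_selector)
```

This cutoff is per feature: with many features, some pure-noise features will pass it by chance. sklearn's `SelectFdr` (Benjamini–Hochberg) and `SelectFwe` (family-wise error) apply a multiple-testing correction instead.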
So far I have been doing the selection by noise injection; however, I noticed that the selection tends to change from run to run. Would it be possible to take the mean importance score of many random variables and define the threshold there?
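One way to make a noise-injection threshold stable is to pool the importances of many random probe features over many repeats, with a fixed seed. The sketch below is only an illustration. `probe_importance_threshold` is a hypothetical helper, and building probes by permuting each real column is an assumed choice of noise: it keeps each feature's non-negative marginal distribution but breaks its link with the labels. The helper returns either the mean probe importance, as asked, or a chosen quantile.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2


def probe_importance_threshold(X, y, n_repeats=100, quantile=None, seed=0):
    """Hypothetical helper: importance threshold estimated from random probes.

    Each repeat builds one probe per real feature by permuting that column
    independently (assumed noise model). Returns the mean of all pooled probe
    importances, or their `quantile` if one is given.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible threshold
    probe_scores = []
    for _ in range(n_repeats):
        # Shuffle each column on its own, destroying any association with y.
        probes = np.column_stack([rng.permutation(col) for col in X.T])
        _, p = chi2(probes, y)
        probe_scores.append(-np.log(p))
    probe_scores = np.concatenate(probe_scores)
    if quantile is None:
        return probe_scores.mean()
    return np.quantile(probe_scores, quantile)


X, y = load_iris(return_X_y=True)  # assumed example data
_, p_values = chi2(X, y)
importance = -np.log(p_values)

mean_thr = probe_importance_threshold(X, y)
q95_thr = probe_importance_threshold(X, y, quantile=0.95)
print("mean probe importance:", round(float(mean_thr), 2))
print("95th percentile of probes:", round(float(q95_thr), 2))
print("selected (mean):", np.flatnonzero(importance > mean_thr))
print("selected (95th pct):", np.flatnonzero(importance > q95_thr))
```

Averaging over many repeats reduces the Monte Carlo variation that makes a single noise run unstable. The mean is a lenient cutoff, though. If the p-values were exactly uniform for pure noise, $-\ln(p)$ would follow an Exp(1) distribution. Its mean would be 1, which corresponds to $p \approx e^{-1} \approx 0.37$, and its 95th percentile would be about 3, which is the $p = 0.05$ cutoff. sklearn's `chi2` applied to non-count features is not guaranteed to be calibrated this way, so the empirical values can differ.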
Thanks for any hint!
"It is common to use the χ² ..." Citation needed; this procedure looks as arbitrary as the rest of them. – user2974951 Nov 08 '23 at 09:43