
I have a dataset that contains some numeric variables. What I'm trying to do is check how important these variables are in describing the endogenous variable, in order to decide whether to include them in my classifier.

I understand that in order to do so I need to check the p-values of the given variables, so what I did was try to use f_classif and chi2.

I tried to do this in Python the same way as in the scikit-learn documentation, but I do not understand how to interpret the results. Moreover, the example in the sklearn documentation does this:

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(f_classif, k=4)   # keep the 4 features with the highest ANOVA F-statistics
selector.fit(X_train, y_train)
scores = -np.log10(selector.pvalues_)    # transform each feature's p-value
scores /= scores.max()                   # rescale so the largest score is 1

What does this transformation mean, and how should I interpret it?

Thanks for any help

jluchman
  • Why are you doing forced-choice classification? – Frank Harrell Jul 14 '22 at 13:35
  • Why do you think that you "need to check p-values of given variables" (I assume you mean each on its own)? I know this is sometimes done, which may be where you have seen it, but it's not a good practice. – Björn Jul 14 '22 at 14:50
  • Yep, I've seen it done on some other model, and if I recall correctly it was done to identify variables that are not statistically significant and should be dropped from the model before building it. I am aware that I may use some form of regularization instead; I just wanted to understand what it is for and how to interpret this stuff, like chi2, ANOVA and so on. – funkurlif3 Jul 14 '22 at 17:18
  • Is anyone able to help me? – funkurlif3 Jul 20 '22 at 17:26
  • You might have better luck if you rephrase the question to focus on the $-\log _{10} (p)$ transformation. As it stands, your main question is muddied by how it came up in the context of a poor practice (as @Björn has noted). – Dave Jul 20 '22 at 17:44
  • But that is actually what I'm trying to understand: why is it a poor practice? In econometrics classes everyone emphasizes feature importance, and you are telling me it is irrelevant, so I am trying to understand why. Especially if we have a variable that we know upfront may be irrelevant as a predictor, yet we need proof of that. – funkurlif3 Jul 25 '22 at 08:30
  • That's really a separate question, one that Frank Harrell addresses around the 18-minute mark of the keynote address here. You might want to post a new question, since "What is minus log of a p-value?" is a separate issue from selecting features or determining feature importance. – Dave Jul 25 '22 at 09:51

1 Answer


The transformation exists to put tiny numbers on a more meaningful scale. Humans see tiny p-values like $0.00001627$ and $0.0003119$ as just "tiny numbers". Take $-\log_{10}$ of those values and you get $4.789$ and $3.506$, which makes the difference between them much easier to see: "tiny" vs "tiny" is harder to compare than $4.8$ vs $3.5$. The second line, scores /= scores.max(), simply rescales the transformed scores to $[0, 1]$ so they can be plotted or compared on a common scale; it does not change their ordering.
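
For concreteness, here is a minimal sketch of what those two lines do, using a few made-up p-values rather than the output of a fitted selector:

import numpy as np

# Made-up p-values purely for illustration (not from any real dataset)
pvalues = np.array([0.00001627, 0.0003119, 0.04])

scores = -np.log10(pvalues)   # larger score = smaller p-value
print(scores)                 # roughly [4.789, 3.506, 1.398]

scores /= scores.max()        # rescale so the largest score is exactly 1
print(scores)                 # roughly [1.000, 0.732, 0.292]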

Note, however, the issues with feature selection.

Dave