I have a dataset that contains some numeric variables. I'm trying to check how important these variables are in explaining the endogenous (target) variable, so I can decide whether to include them in my classifier.
I understand that to do this I need to check the p-values of the variables, so I tried using f_classif and chi2.
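For reference, a minimal sketch of what both functions return when called directly, assuming the Iris dataset as a stand-in for the real data (its features are all non-negative, which chi2 requires). Each function scores every feature on its own against the class labels and returns one statistic and one p-value per column:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2, f_classif

# Stand-in dataset (assumption): Iris has 4 non-negative numeric features.
X, y = load_iris(return_X_y=True)

# ANOVA F-test: one F statistic and one p-value per feature.
f_stat, f_pvalues = f_classif(X, y)

# Chi-squared test: also one statistic and one p-value per feature.
# chi2 raises an error if X contains negative values.
chi2_stat, chi2_pvalues = chi2(X, y)

for i in range(X.shape[1]):
    print(f"feature {i}: F={f_stat[i]:.1f} (p={f_pvalues[i]:.2e}), "
          f"chi2={chi2_stat[i]:.1f} (p={chi2_pvalues[i]:.2e})")
```

Because each feature is tested separately, these p-values describe univariate association with the target, not how useful a feature is alongside the others.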
I followed the scikit-learn documentation in Python, but I don't understand how to interpret the results. Moreover, the example in the scikit-learn documentation contains this:
```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(f_classif, k=4)
selector.fit(X_train, y_train)
scores = -np.log10(selector.pvalues_)
scores /= scores.max()
```
What does this transformation mean, and how should I interpret it?
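One way to see what the two lines do is to run them on a small dataset and compare the output with the raw p-values. The sketch below assumes the Iris dataset and `k=2` purely for illustration. `-np.log10` is a monotone transform that turns small p-values into large numbers (p = 0.01 becomes 2, p = 1e-10 becomes 10), and dividing by the maximum rescales the result so the feature with the smallest p-value gets 1.0:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Stand-in data (assumption): Iris, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

selector = SelectKBest(f_classif, k=2)
selector.fit(X, y)

pvalues = selector.pvalues_

# -log10 maps p=0.01 -> 2 and p=1e-10 -> 10: smaller p-value, larger score.
scores = -np.log10(pvalues)

# Rescale so the most significant feature has score 1.0 and the others
# are expressed relative to it.
scores /= scores.max()

print("p-values:", pvalues)
print("scores:  ", np.round(scores, 3))
print("kept:    ", selector.get_support())
```

Read this way, the normalized scores only rank features relative to each other; they are convenient for plotting (p-values spanning dozens of orders of magnitude become comparable bars), but the significance judgement itself still comes from the p-values.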
Thanks for any help