I am doing a multiple regression analysis in which I want to eliminate some of the insignificant features. Most machine learning books use subset selection, shrinkage methods, or PCA to reduce the number of features. Why are p-values not commonly used for feature selection?
-
This is why. (Whether you do it in a stepwise manner or all at once doesn't change the fundamental problem.) – Stephan Kolassa Nov 19 '15 at 08:20
-
@Stephan: I read the answer. Does it imply that p-values should never be used? – Siddhesh Nov 19 '15 at 12:40
-
No. You can use and interpret p-values if you use them correctly. This is a good place to start understanding them. In your specific case, if you look at multiple models (by selecting features), the p-values will no longer be uniformly distributed under the null hypothesis, so you either need to find their new distribution (e.g., through simulation) or interpret them differently. – Stephan Kolassa Nov 19 '15 at 12:45
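To make the point about the distribution concrete, here is a minimal simulation sketch (not from the thread). It assumes a pure-noise response, so the null hypothesis holds for every feature, and compares the p-value of a feature fixed in advance with the smallest p-value picked after looking at all features. The function name `null_p_values`, the sample sizes, and the seed are illustrative choices, and the p-values use a normal approximation to the t distribution, which is close with this many residual degrees of freedom.

```python
import math
import numpy as np


def null_p_values(n_obs=200, n_features=10, n_sims=2000, seed=0):
    """Simulate OLS p-values when no feature has any real effect."""
    rng = np.random.default_rng(seed)
    fixed = np.empty(n_sims)     # p-value of a feature chosen before seeing data
    selected = np.empty(n_sims)  # smallest p-value, i.e. chosen after seeing data
    for s in range(n_sims):
        X = rng.standard_normal((n_obs, n_features))
        y = rng.standard_normal(n_obs)  # pure noise: every null hypothesis is true
        A = np.column_stack([np.ones(n_obs), X])  # add an intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        dof = n_obs - A.shape[1]
        sigma2 = resid @ resid / dof
        cov = sigma2 * np.linalg.inv(A.T @ A)
        t = beta[1:] / np.sqrt(np.diag(cov)[1:])  # t statistics, intercept dropped
        # Two-sided p-values via the normal approximation (assumption: large dof).
        p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
        fixed[s] = p[0]
        selected[s] = p.min()
    return fixed, selected


fixed, selected = null_p_values()
print("share of p < 0.05, feature fixed in advance:", np.mean(fixed < 0.05))
print("share of p < 0.05, smallest p after selection:", np.mean(selected < 0.05))
```

The first share should sit near the nominal 0.05, while the second should be far larger: with 10 roughly independent tests, the chance that at least one p-value falls below 0.05 is about 1 - 0.95^10 ≈ 0.40. The simulated values of `selected` are one way to approximate the "new distribution" against which a selected p-value would have to be judged.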