I've seen several people do prefiltering based on correlation coefficients. But would it be OK to find the best features using GradientBoostingRegressor and then use those to train a network?
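Concretely, what I have in mind is something like the following minimal sketch (scikit-learn, with toy random data standing in for my real matrix; `SelectFromModel` ranks features by the boosted trees' impurity-based importances, and `MLPRegressor` stands in for my actual network):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline

# Toy stand-in for my data (the real shape is 60 samples x 40,000 features;
# shrunk here so the snippet runs quickly).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2_000))
y = rng.uniform(20, 80, size=60)  # ages

# Rank features by gradient-boosting importance, keep the top 50,
# then train a small network on just those columns.
pipe = Pipeline([
    ("select", SelectFromModel(
        GradientBoostingRegressor(n_estimators=100, random_state=0),
        max_features=50,
        threshold=-np.inf,  # rank purely by importance, no score cutoff
    )),
    ("net", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2_000,
                         random_state=0)),
])
pipe.fit(X, y)
```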

Is there any way I could implement such a computation within the network itself?
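By "within the network" I mean something like an L1 penalty on the first layer's weights, so that weights on uninformative inputs are driven toward zero during training. A hypothetical Keras sketch of that idea (I'm not sure this is the right way to do it):

```python
import tensorflow as tf

# Hypothetical sketch: L1-regularize the first layer so weights on
# uninformative input features shrink toward zero during training.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(40_000,)),
    tf.keras.layers.Dense(32, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.L1(1e-4)),
    tf.keras.layers.Dense(1),  # linear output for regression on age
])
model.compile(optimizer="adam", loss="mae")
```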

My data consists of 40,000 features and around 60 samples, all numerical; the output is also numerical (the person's age), so this is a regression problem.

I am experiencing the "curse of dimensionality" and want to know the best way to approach feature selection; filtering methods and PCA didn't give optimal results.

Caterina
  • Any pre-processing step (including feature selection) that uses the target variable to make the selection risks overfitting. Instead, you should do something like cross-validation for the whole process; see the discussion of "phased" analysis pipelines at https://stats.stackexchange.com/a/184023/22311 in the context of L1/L2 regularization (a sketch of this appears after the comments). – Sycorax Dec 12 '22 at 03:26
  • Hi @Caterina. Please add any additional relevant info in your question body rather than comments. – User1865345 Dec 12 '22 at 03:29
  • Why do you have to do any feature selection? – Dave Dec 12 '22 at 04:13
  • @Dave presumably because training the models naively results in massive overfitting; see OP's previous question https://stats.stackexchange.com/questions/598587/why-am-i-obtaining-values-close-to-zero-when-using-a-nn-for-regression – Sycorax Dec 12 '22 at 13:56
  • In the end I solved that (the question you mention) by simply changing the activation function of the last layer to LeakyReLU. And yes, I made an MAE vs. number-of-features graph (features ordered by Pearson correlation with age), and the performance does change. – Caterina Dec 12 '22 at 14:19
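A minimal sketch of the cross-validation Sycorax describes, again with toy data and the same hypothetical pipeline as in the question: because the feature selection lives inside the pipeline, it is refit on each training fold and never sees the held-out fold's targets.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2_000))  # toy stand-in for the real 60 x 40,000 data
y = rng.uniform(20, 80, size=60)

# Selection + model wrapped in one pipeline: cross_val_score refits the
# whole thing on each fold, so the target never leaks into the evaluation.
pipe = Pipeline([
    ("select", SelectFromModel(
        GradientBoostingRegressor(n_estimators=100, random_state=0),
        max_features=50, threshold=-np.inf)),
    ("net", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2_000,
                         random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
print("cross-validated MAE:", -scores.mean())
```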

0 Answers