
How can we ensure that a chosen set of features can lead to high accuracy, assuming we make appropriate changes to the model architecture and hyper-parameters? In other words, how can we make sure the chosen features are really relevant to the problem, so that if we are not getting good accuracy we know the issue lies with the model architecture/hyper-parameters and not with the features?

Thunder
  • You need knowledge of the domain and the data. Even with those it cannot be guaranteed. Without those this seems impossible. – mkt Feb 26 '23 at 17:24
  • If you regress on features and consistently get good predictions (i.e., you validate your performance and guard against overfitting), those features must have some relationship with the outcome, right? Your post seems to suggest a belief that features unrelated to the outcome can be twisted into giving consistently good predictions by the magic of neural network hidden layers (which often feel like magic but certainly are not). Could you please clarify what you mean? – Dave Feb 26 '23 at 18:06
  • @Dave let's say I have thousands of features and I have to pick just 10 of them. It may be that none of those features carry any information about the target, in which case no model can give me good performance. The question is: how will I know, before training a model, whether my chosen features really contain enough information? One check is the Pearson correlation between each feature and the target, but that only captures linear relationships and will discard features that have a non-linear relationship with the target (see the sketch after these comments). – Thunder Feb 27 '23 at 06:40
  • Of course, one way is to use domain knowledge and create features accordingly, but I want to know whether there is any quantitative way of identifying useful features. – Thunder Feb 27 '23 at 06:40
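On the point about Pearson correlation only catching linear relationships: one common alternative is mutual information, which can also pick up non-linear dependence. A minimal sketch (not from the thread; the data is synthetic and purely illustrative), assuming scikit-learn is available:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                  # 5 candidate features
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=1000)  # target depends on feature 0 only through its square

# Pearson correlation is near zero for the purely non-linear feature
pearson = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]

# Mutual information still flags feature 0 as informative
mi = mutual_info_regression(X, y, random_state=0)

for j, (r, m) in enumerate(zip(pearson, mi)):
    print(f"feature {j}: pearson={r:+.2f}  mutual_info={m:.2f}")
```

This is only a screening heuristic: it looks at each feature marginally and will not detect relevance that only appears through interactions.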

1 Answer


To summarise: you want to know whether a specific set of features will give you predictive power:

  1. before exploring/analysing the data
  2. without any specific knowledge about the domain & data, and
  3. with no additional assumptions

I'm afraid this is impossible.

As a side note, it would be worth looking at Why is accuracy not the best measure for assessing classification models?

mkt
  • @mkt, data analysis is allowed prior to training the model. But the question remains the same: after doing data analysis, I should be confident that the features chosen on the basis of that analysis are nearly perfect. – Thunder Feb 28 '23 at 12:59
  • @Thunder That's still impossible outside of unusual circumstances. You may be lucky if you do some visualisation and find a strong relationship with one or two predictors. But if not, it doesn't mean that they are meaningless because it's much harder to explore interactions. Essentially, without domain knowledge you need luck. If you must do this, I would use a random forest and examine the ranking of features by permutation importance to choose ones that seem relevant. But that's still using one ML algorithm to choose input for another, which you don't appear to want to do. – mkt Feb 28 '23 at 13:53
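To illustrate mkt's suggestion, here is a minimal sketch of ranking features with a random forest and permutation importance. The data is synthetic, and the model size and number of features kept are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))  # many candidate features, mostly noise
y = X[:, 3] * X[:, 7] + 2 * X[:, 12] + rng.normal(scale=0.5, size=2000)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Permutation importance on held-out data: how much does shuffling each
# feature column hurt validation performance?
result = permutation_importance(forest, X_val, y_val, n_repeats=10, random_state=0)

ranking = np.argsort(result.importances_mean)[::-1]
print("Top 10 features by permutation importance:", ranking[:10])
```

As mkt notes, this still uses one ML algorithm to choose inputs for another, so it is a heuristic screen rather than a guarantee that the selected features are relevant.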