I am new to data science and machine learning.
I am using the Weka platform to work on a classification problem with an imbalanced dataset. My question is: can I apply a feature selection method to a balanced copy of the dataset and then use the resulting subset of features on the original (imbalanced) dataset?
If my question is not clear, the following detailed steps explain it:
- I made two copies of the dataset: the original imbalanced one and a balanced one.
- I applied a feature selection method to the balanced copy. (At the end of this step, a subset of features is selected.)
- In the imbalanced copy, I retained the selected features and removed the unselected ones.
For example: assume that you have an imbalanced dataset with 5 features: a, b, c, d, and e. You balance the entire dataset and then apply a feature selection method to it, which selects three features: a, b, and c. After that, you go back to the original (imbalanced) dataset and remove features d and e. Then you carry out the rest of your procedure on the imbalanced dataset using only a, b, and c.
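To make the steps above concrete, here is a minimal sketch in plain Python (not Weka code). Everything in it is an assumption for illustration only: the toy data, the random undersampling used for balancing, the simple "difference of class means" scoring rule, and the helper names `balance_by_undersampling`, `select_features`, and `keep_columns`. It only shows the order of operations I am asking about, not any particular Weka filter or attribute evaluator.

```python
import random
from statistics import mean


def balance_by_undersampling(rows, labels, seed=0):
    """Randomly undersample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def select_features(rows, labels, names, k):
    """Toy filter (an assumption): rank features by the absolute difference
    between the class means, for binary labels 0/1, and keep the top k."""
    scores = {}
    for j, name in enumerate(names):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores[name] = abs(mean(pos) - mean(neg))
    ranked = sorted(names, key=lambda n: scores[n], reverse=True)
    # Return the selected names in their original column order.
    return sorted(ranked[:k], key=names.index)


def keep_columns(rows, names, selected):
    """Keep only the selected columns, dropping all others."""
    idx = [names.index(n) for n in selected]
    return [[r[j] for j in idx] for r in rows]


# Hypothetical imbalanced toy data: 20 minority rows vs. 180 majority rows.
# Features a, b, c shift with the class; d and e are pure noise.
rng = random.Random(42)
names = ["a", "b", "c", "d", "e"]
labels = [1] * 20 + [0] * 180
rows = []
for y in labels:
    informative = [rng.gauss(5.0 * y, 1.0) for _ in range(3)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(2)]
    rows.append(informative + noise)

# Step 1: make a balanced copy of the dataset.
balanced_rows, balanced_labels = balance_by_undersampling(rows, labels)

# Step 2: run feature selection on the balanced copy only.
selected = select_features(balanced_rows, balanced_labels, names, k=3)

# Step 3: go back to the original imbalanced data and keep only those features.
reduced_rows = keep_columns(rows, names, selected)

print("selected features:", selected)
print("imbalanced rows kept:", len(reduced_rows), "columns kept:", len(reduced_rows[0]))
```

In this sketch the reduced dataset still has all 200 original rows and the original class ratio; only the columns change. On toy data this well separated, the selection should pick the three informative columns, which mirrors the a, b, c example above.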
Is this procedure correct?
- Cross-validation on the imbalanced dataset.
- Feature selection on the same dataset, but after balancing it.
– Muneera Feb 10 '23 at 08:43