I have a very imbalanced dataset for a binary classification problem. The train set contains 150,000 samples of class 0 and 500 samples of class 1, so class 1 makes up only about 0.33%.
When I train a model like a DecisionTree on it, I get an f1-score of ~0.011 across several runs.
I've read that I could use methods for balancing an imbalanced dataset, so I did: I applied SMOTE, oversampling, and undersampling from the imbalanced-learn API to the train set (see the sketch after this list). But the results got worse:
- SMOTE f1-score: 0
- Oversampling f1-score: 0
- Undersampling f1-score: 0.001
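For reference, this is roughly how I applied the three methods, as a minimal sketch; `X_train`/`y_train` stand in for my actual arrays:

```python
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# Resample the train set only; each sampler returns a rebalanced copy.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
# X_res, y_res = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)
# X_res, y_res = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)
```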
My procedure, summarized (see the code sketch after this list):
- Load the data
- Split the data into a train (0.7) and a test (0.3) set
- Apply one or none of the balancing methods to the train set
- Train a decision tree and compute the f1-score on the test set
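A minimal end-to-end sketch of that procedure; `load_data()` is a placeholder for my actual loading code, and I've picked SMOTE as the example balancer:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE

X, y = load_data()  # placeholder for the actual data loading

# 70/30 split; balancing is applied to the train portion only,
# so the test set keeps the original class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
print(f1_score(y_test, clf.predict(X_test)))
```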
Looking only at these results, I would prefer to do the parameter optimization and feature selection without any balancing method.
How do you see this? Do I have an error somewhere, or am I wrong about this? I would appreciate any information; I am still quite a beginner.
Thank you very much.
Okay, so it seems it is not a problem that the score is better without any imbalance method. Thank you.
I also checked roc_auc_score as another metric, but the tendency is the same.
– SchwarzbrotMitHummus Oct 13 '20 at 14:45