
I have a very imbalanced dataset for a binary classification problem. The training set contains 150,000 samples of class 0 and 500 samples of class 1, i.e. about 0.33% positives.

When I train a model like a DecisionTree, I get an f1-score of ~0.011 across several runs.

I've read that methods for balancing an imbalanced dataset can help, so I tried them: I applied SMOTE, undersampling, and oversampling to the training set using the imbalanced-learn API. But the results got worse:

  • SMOTE f1-score: 0
  • oversampling f1-score: 0
  • undersampling f1-score: 0.001

My procedure summarized:

  1. Load the data
  2. Split the data into train (0.7) and test (0.3) sets
  3. Apply one (or none) of the balancing methods to the training set
  4. Train a decision tree and compute the f1-score on the test set

Looking only at these results, I would prefer to do the parameter optimization and feature selection without any balancing method.

What do you think? Is there an error in my procedure, or am I misunderstanding something? I would appreciate any information; I am still quite a beginner.

Thank you very much.
