I am training an object detection model using the YOLOv5 architecture. I have the following classes and counts.

| class | n_instances |
| ----- | ----------- |
| dog   | 1000        |
| cat   | 4000        |
| fox   | 8000        |

My training dataset is imbalanced. Taking the largest and smallest classes as shares of their combined count, the difference between them is (8000/(8000 + 1000)) - (1000/(8000 + 1000)) ≈ 78%. What should I do?
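For reference, this is a minimal sketch of how I compute that gap from the counts in the table. The function name `imbalance_gap` is just my own label, not something from YOLOv5.

```python
# Class counts taken from the table above.
counts = {"dog": 1000, "cat": 4000, "fox": 8000}


def imbalance_gap(counts):
    """Share of the largest class minus share of the smallest class,
    both taken relative to the combined count of those two classes."""
    largest = max(counts.values())
    smallest = min(counts.values())
    return (largest - smallest) / (largest + smallest)


gap = imbalance_gap(counts)
print(f"{gap:.1%}")  # prints 77.8%
```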

  • Option 1: Just ignore it and train the model with this imbalanced dataset.
  • Option 2: Balance the dataset by removing 7000 foxes and 3000 cats, then train the model on a balanced dataset of 1000 instances per class.
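To make option 2 concrete, here is a rough sketch of the arithmetic behind it. It only works on instance counts and assumes every instance can be dropped independently, which may not hold if a single image contains several classes. The variable names are my own.

```python
# Class counts taken from the table above.
counts = {"dog": 1000, "cat": 4000, "fox": 8000}

# Undersample every class down to the size of the smallest one.
target = min(counts.values())
to_remove = {cls: n - target for cls, n in counts.items()}
balanced = {cls: n - to_remove[cls] for cls, n in counts.items()}

print(to_remove)  # {'dog': 0, 'cat': 3000, 'fox': 7000}
print(balanced)   # {'dog': 1000, 'cat': 1000, 'fox': 1000}
```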

I can imagine that option 2 is better suited to very imbalanced datasets, and option 1 to only slightly imbalanced ones. If this is true, what would define 'very' and 'slightly'? In other words, at what percentage would the threshold lie?

Peter