Object detection: better to train on an imbalanced dataset or remove images to balance it out?

Asked Sep 18 '23 at 05:54

Active Sep 18 '23 at 05:54

Viewed 8 times

I am training an object detection model using the YOLOv5 architecture. I have the following classes and counts.

| class | n_instances |
| ----- | ----------- |
| dog   | 1000        |
| cat   | 4000        |
| fox   | 8000        |
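Assuming the labels are in the standard YOLOv5 format (one `.txt` file per image, one `class x_center y_center width height` row per object), per-class counts like those in the table can be tallied with a short script. This is a minimal sketch: the class-index mapping and the directory path are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable

# Hypothetical index-to-name mapping; in YOLOv5 the real one is usually
# the `names` list in the dataset YAML.
CLASS_NAMES = {0: "dog", 1: "cat", 2: "fox"}


def count_label_lines(lines: Iterable[str]) -> Counter:
    """Tally class names from YOLO label rows: 'class x_center y_center width height'."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[CLASS_NAMES[int(fields[0])]] += 1
    return counts


def count_instances(label_dir: Path) -> Counter:
    """Sum per-class instance counts over every *.txt label file in label_dir."""
    counts: Counter = Counter()
    for label_file in sorted(label_dir.glob("*.txt")):
        counts += count_label_lines(label_file.read_text().splitlines())
    return counts
```

Usage would be along the lines of `count_instances(Path("datasets/animals/labels/train"))`, where the path is a placeholder for wherever the training labels live.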

My training dataset is imbalanced. The difference between the shares of the largest and the smallest class is 8000/(8000 + 1000) - 1000/(8000 + 1000) = 7000/9000 ≈ 78%. What should I do?
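The percentage above can be reproduced in a few lines. The sketch below implements the question's own metric (which looks only at the largest and smallest classes and ignores the cats) and, as an assumed alternative summary not taken from the question, the plain ratio of largest to smallest class count.

```python
def imbalance_gap(counts: dict[str, int]) -> float:
    """Share of the largest class minus share of the smallest class,
    computed over those two classes only (the metric used in the question)."""
    largest = max(counts.values())
    smallest = min(counts.values())
    total = largest + smallest
    return largest / total - smallest / total


def imbalance_ratio(counts: dict[str, int]) -> float:
    """Largest class count divided by smallest class count."""
    return max(counts.values()) / min(counts.values())


counts = {"dog": 1000, "cat": 4000, "fox": 8000}
print(f"gap = {imbalance_gap(counts):.1%}")        # gap = 77.8%
print(f"ratio = {imbalance_ratio(counts):.1f}:1")  # ratio = 8.0:1
```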

- Option 1: Ignore the imbalance and train the model on the dataset as it is.
- Option 2: Balance the dataset by removing 7000 foxes and 3000 cats, then train the model on a balanced dataset of 1000 instances per class.
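Because a detector trains on whole images, removing an image removes every object in it, so the exact 1000/1000/1000 split of Option 2 can usually only be approximated. The sketch below is one assumed way to do image-level undersampling (a hypothetical helper, not a YOLOv5 feature): keep every image that contains the rarest class, then add the remaining images in random order only while no class would exceed the rarest class's total count. `image_labels` is a hypothetical mapping from image id to the class ids of its objects.

```python
import random
from collections import Counter


def undersample_images(image_labels: dict[str, list[int]], seed: int = 0) -> list[str]:
    """Greedy image-level undersampling toward the rarest class's count."""
    totals = Counter(c for labels in image_labels.values() for c in labels)
    rarest, cap = min(totals.items(), key=lambda kv: kv[1])

    kept: list[str] = []
    kept_counts: Counter = Counter()
    others: list[str] = []
    # Always keep images containing the rarest class so none of it is lost.
    for image_id, labels in image_labels.items():
        if rarest in labels:
            kept.append(image_id)
            kept_counts.update(labels)
        else:
            others.append(image_id)

    # Add the other images in random order while every class stays within the cap.
    # Images with no labels (background images) always pass this check.
    random.Random(seed).shuffle(others)
    for image_id in others:
        candidate = Counter(image_labels[image_id])
        if all(kept_counts[c] + n <= cap for c, n in candidate.items()):
            kept.append(image_id)
            kept_counts.update(candidate)
    return kept
```

Since images kept for the rarest class may already contain many foxes or cats, the resulting counts are only roughly balanced; the greedy order is a simplicity choice, not an optimal selection.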

I imagine that option 2 is better suited to very imbalanced datasets and option 1 to only slightly imbalanced ones. If that is true, what defines "very" and "slightly"? In other words, at what percentage would the threshold lie?


Peter


0 Answers