17

I have an imbalanced multi-class dataset and I want to use the class_weight argument of fit_generator to weight the classes according to the number of images in each class. I'm using ImageDataGenerator.flow_from_directory to load the dataset from a directory.

Is it possible to directly infer the class_weight argument from the ImageDataGenerator object?

Fábio Perez

6 Answers

37

Just figured out a way of achieving this.

from collections import Counter
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(...)

# Count how many images belong to each class index.
counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
# Weight each class by (largest class size) / (its own size).
class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}

model.fit_generator(...,
                    class_weight=class_weights)

train_generator.classes holds the class index of each image, one entry per image. Counter(train_generator.classes) counts the number of images in each class.

Note that these weights may not be good for convergence, but you can use them as a starting point for other occurrence-based weighting schemes.
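
To make the counting step concrete, here is a minimal sketch that uses a hand-written label list in place of train_generator.classes. The names labels and max_ratio_weights are illustrative only and are not part of Keras.

```python
from collections import Counter


def max_ratio_weights(labels):
    """Weight each class by (largest class count) / (its own count)."""
    counter = Counter(labels)
    max_val = float(max(counter.values()))
    return {class_id: max_val / n for class_id, n in counter.items()}


# Stand-in for train_generator.classes: one class index per image.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
weights = max_ratio_weights(labels)
print(weights)  # {0: 1.0, 1: 2.0, 2: 3.0}
```

The most frequent class gets weight 1.0 and rarer classes get proportionally larger weights. Counter only needs an iterable of hashable labels, so a one-dimensional NumPy integer array should work the same way.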

This answer was inspired by: https://github.com/fchollet/keras/issues/1875#issuecomment-273752868

Fábio Perez
  • but train_generator.classes only returns a list of classes, like a set, no? – Nassim Ben Mar 03 '17 at 19:59
  • It returns a list of classes for each image. For instance, if we have three images, the first two are from class 1 and the last one is from class 0, `train_generator.classes` equals `[1, 1, 0]`. – Fábio Perez Mar 03 '17 at 20:04
  • Indeed, just went to see the source code :) Good job – Nassim Ben Mar 03 '17 at 20:12
  • Hey, thanks for this. Can you elaborate what you mean by "these weights may not be good for convergence"? – arao6 Dec 28 '19 at 00:32
  • But how to do this in NumPy, PyTorch or TensorFlow (i.e. when the image labels are a NumPy array)? – Jaja Apr 08 '21 at 10:35
15

Alternatively, you can simply do:

from sklearn.utils import class_weight
import numpy as np

# Recent scikit-learn versions require keyword arguments here.
class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)

You can then set (as per comment above):

model.fit_generator(..., class_weight=class_weights)
Pasha Dembo
1

I tried both solutions, and the sklearn.utils.class_weight one gives better accuracy, though I am not sure why. The two do not yield the same class weights.
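
One likely reason the weights differ is that both schemes are proportional to 1 / class count but use a different overall scale. The sketch below re-implements, in plain Python, the formula scikit-learn documents for 'balanced' mode (n_samples / (n_classes * count)) instead of calling the library, and compares it with the max-count ratio from the first answer. The labels are made up for illustration.

```python
from collections import Counter

# Made-up labels: 6 images of class 0, 3 of class 1, 1 of class 2.
labels = [0] * 6 + [1] * 3 + [2] * 1
counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# Scheme from the first answer: largest class count / own count.
max_ratio = {c: max(counts.values()) / n for c, n in counts.items()}

# 'balanced' formula: n_samples / (n_classes * own count).
balanced = {c: n_samples / (n_classes * n) for c, n in counts.items()}

# Per-class ratio between the two schemes.
ratios = {c: max_ratio[c] / balanced[c] for c in counts}
print(max_ratio)
print(balanced)
print(ratios)
```

On this toy data the ratio comes out the same for every class, so the two schemes rank and space the classes identically and differ only by a constant factor applied to the loss. Any difference in accuracy would then have to come from that overall scale, which can interact with the learning rate.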

David Brown
1

As suggested in the article here, a good way to assign class weights is to use:

(1 / class_count) * (total_count/2)

Thus, slightly modifying the method suggested above by Fábio Perez:

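from collections import Counter

counter = Counter(train_generator.classes)
total = float(sum(counter.values()))
# The 2.0 comes from the quoted formula; sklearn's 'balanced' mode divides by the number of classes instead.
class_weight = {class_id: total / (2.0 * num_images) for class_id, num_images in counter.items()}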
Aman Agrawal
0

The code suggested by Pasha Dembo works pretty well. However, you should convert the result to a dictionary before passing it to model.fit_generator:

from sklearn.utils import class_weight
import numpy as np

# Recent scikit-learn versions require keyword arguments here.
class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)

# Map class index -> weight, the form Keras expects for class_weight.
train_class_weights = dict(enumerate(class_weights))
model.fit_generator(..., class_weight=train_class_weights)
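
Keras documents class_weight as a dictionary mapping class indices to weights. The sketch below uses plain lists with made-up values in place of train_generator.classes and the array returned by compute_class_weight. It shows what dict(enumerate(...)) produces, and that it matches an explicit zip with the sorted class indices when those indices are 0, 1, ..., k-1, which is how flow_from_directory numbers classes.

```python
# Made-up stand-ins: in the answer above these would come from
# train_generator.classes and compute_class_weight(...).
labels = [0, 0, 0, 1, 1, 2]
class_weights = [0.67, 1.0, 2.0]

# Positional mapping: index in the weight list -> weight.
by_position = dict(enumerate(class_weights))

# Explicit mapping: sorted unique class index -> weight.
classes = sorted(set(labels))
by_label = dict(zip(classes, class_weights))

print(by_position)  # {0: 0.67, 1: 1.0, 2: 2.0}
print(by_position == by_label)  # True
```

If a class index were missing from the labels, the positional version would attach weights to the wrong indices, so zipping with the unique classes is the more defensive choice.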

DCCoder
0
from sklearn.utils import class_weight
import numpy as np

# Sorted unique class indices found in the generator.
classes = np.unique(traingen.classes)
# Build the {class index: weight} dictionary in one step.
class_weights = dict(zip(classes, class_weight.compute_class_weight(
    class_weight='balanced',
    classes=classes,
    y=traingen.classes)))
Soheil
  • Your answer could be improved by adding more information on what the code does and how it helps the OP. – Tyler2P May 02 '22 at 07:31