How to reduce the overfitting in my CNN model?

Question

I am new to this field and want to practice by building a convolutional neural network for image classification. I want to classify images of women and men.

Before this, I took an online course.

This is my code so far:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPooling2D, BatchNormalization, Dropout
from tensorflow.keras import models, optimizers, regularizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
import matplotlib.pyplot as plt
import datetime, tensorboard
# Data Generator
train_data_generation = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)
test_data_generation = ImageDataGenerator(rescale=1./255)
# Load the images in the Image Generators
train_generator = train_data_generation.flow_from_directory(
    '/kaggle/input/man-woman/ManWoman/train',
    target_size=(150,150),
    batch_size=32,
    class_mode='binary'
)
validation_generator = test_data_generation.flow_from_directory(
    '/kaggle/input/man-woman/ManWoman/validation',
    target_size=(150,150),
    batch_size=32,
    class_mode='binary'
)
# Create model
model = models.Sequential()
model.add(Conv2D(32, (3,3), activation='relu', input_shape=(150,150,3), padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(64, (2,2), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dropout(.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()
# Train model
model.compile(
    loss='binary_crossentropy',
    optimizer=optimizers.Adam(learning_rate=0.0005),
    metrics=['accuracy']
)
history = model.fit(
    train_generator,
    epochs=10,
    validation_data=validation_generator,
)
# Accuracy is the fraction of correct predictions
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.legend()
plt.show()
# Loss is the binary cross-entropy value being minimized
plt.plot(history.history['loss'], label='Train')
plt.plot(history.history['val_loss'], label='Validation')
plt.legend()
plt.show()

I created a dataset. https://www.kaggle.com/datasets/cleanet/man-woman/versions/4

The dataset has 3 folders, each with two subdirectories (man/woman):

train: images used during training.
- man: more than 11000 images
- woman: 5929 images
validation: images used during validation, different from those in the train folder.
- man: 2356 images
- woman: 464 images
test: images for testing the finished model, to check that the CNN works well (as I understand it).
- woman: 281 images
- man: 317 images

Before the CNN, I do data augmentation with ImageDataGenerator().

In the CNN, I am using BatchNormalization to keep the values passed between layers small.

The dataset contains images of different sizes. Since I use ImageDataGenerator().flow_from_directory(), I specify that the image size will be 150x150.

I don't understand this part. Does it create new 150x150 images in addition to the dataset images, or are all the dataset images resized to 150x150, so that the originals are not used?

Also, there are more man images than woman images in train, validation and test. Is it recommended to have the same number of images for each class?
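
One option I have read about for the imbalance is to weight the classes instead of discarding images. Below is a minimal sketch of the usual inverse-frequency weighting. It assumes about 11000 man images (I only know the folder has more than 11000) and that flow_from_directory assigns class indices in alphabetical order (man = 0, woman = 1). The helper name inverse_frequency_weights is my own, not a Keras function.

```python
# Approximate training counts from my dataset. 11000 is an assumption:
# I only know that the "man" folder holds more than 11000 images.
train_counts = {"man": 11000, "woman": 5929}


def inverse_frequency_weights(counts):
    """Return {class_index: weight} with weight = total / (n_classes * count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    # Sort the names so the indices follow alphabetical order,
    # which is how flow_from_directory numbers the class folders.
    return {
        index: total / (n_classes * counts[name])
        for index, name in enumerate(sorted(counts))
    }


class_weight = inverse_frequency_weights(train_counts)
print(class_weight)  # the rarer class (woman, index 1) gets the larger weight
```

If this is a sensible approach, the dictionary could presumably be tried as the class_weight argument of model.fit().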

When I train the CNN, these are the results:

Accuracy

Loss

As I understand it, the model starts overfitting at around epoch 4 on the x-axis.
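
To make that reading of the plot less subjective, here is a small sketch of how the turning point could be located from the validation loss. The function names are my own, and the numbers in the example list are placeholders for illustration only, not my real results. With a real run, the list would be history.history['val_loss'].

```python
def best_epoch(val_loss):
    """Return the 0-based epoch index with the lowest validation loss."""
    return min(range(len(val_loss)), key=lambda i: val_loss[i])


def epochs_since_best(val_loss):
    """Return how many epochs were trained after the best validation loss."""
    return len(val_loss) - 1 - best_epoch(val_loss)


# Placeholder values, NOT taken from my training run.
example_val_loss = [0.70, 0.61, 0.55, 0.52, 0.50, 0.58, 0.66, 0.71, 0.80, 0.92]

print(best_epoch(example_val_loss))         # 4
print(epochs_since_best(example_val_loss))  # 5
```

I assume the EarlyStopping callback that I already import (with monitor='val_loss' and restore_best_weights=True) would automate this, but I have not used it yet.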

Could this be because the dataset is badly made?

What are the best practices for building an image classification dataset? And if the images are larger, do I need more filters in Conv2D and a larger kernel size (for example, 6x6)?
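
To give the kernel-size question some numbers, here is a rough sketch of how I count the trainable parameters. It assumes the standard formulas (kernel height x kernel width x input channels, plus one bias, per filter for Conv2D; inputs plus one bias per unit for Dense). The helper names are my own.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a Conv2D layer: one kernel and one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """Trainable parameters of a Dense layer: weights plus one bias per unit."""
    return (inputs + 1) * units


# First layer of my model: 3x3 kernel, 3 input channels, 32 filters.
print(conv2d_params(3, 3, 3, 32))  # 896

# The same layer with the 6x6 kernel I am asking about.
print(conv2d_params(6, 6, 3, 32))  # 3488

# With padding='same' and one 2x2 pooling, Flatten yields 75 * 75 * 64 values,
# all of which feed the single sigmoid unit.
flattened = 75 * 75 * 64
print(dense_params(flattened, 1))  # 360001
```

By this count most of the parameters sit in the final Dense layer rather than in the convolutions, which is part of why I am unsure whether changing the kernel size is the right thing to look at.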

Thank you very much. Greetings
