My model looks like this:

from tensorflow import keras
from tensorflow.keras import layers

# VGG-style stack of conv blocks followed by dense layers and a single linear output
model = keras.Sequential(
    [
        layers.Conv2D(64, 3, activation='relu', padding='same', input_shape=(96, 96, 3)),
        layers.Conv2D(64, 3, activation='relu', padding='same'),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation='relu', padding='same'),
        layers.Conv2D(128, 3, activation='relu', padding='same'),
        layers.MaxPooling2D(),
        layers.Conv2D(256, 3, activation='relu', padding='same'),
        layers.Conv2D(256, 3, activation='relu', padding='same'),
        layers.Conv2D(256, 3, activation='relu', padding='same'),
        layers.MaxPooling2D(),
        layers.Conv2D(512, 3, activation='relu', padding='same'),
        layers.Conv2D(512, 3, activation='relu', padding='same'),
        layers.Conv2D(512, 3, activation='relu', padding='same'),
        layers.MaxPooling2D(),
        layers.Conv2D(512, 3, activation='relu', padding='same'),
        layers.Conv2D(512, 3, activation='relu', padding='same'),
        layers.Conv2D(512, 3, activation='relu', padding='same'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(2622, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1, activation='linear'),
    ]
)
My training set consists of almost 1 million color 96x96 images of faces. The target is a single value per image: I'm trying to predict the valence of the emotion, i.e. how positive or negative the emotion is. This value can be any float in [-1,1]. The input images are normalized to the same range [-1,1].
For validation, I'm using almost 50k images with the same characteristics as the training set. The training and validation sets are completely separate.
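A minimal sketch of the [-1,1] normalization mentioned above, assuming the raw frames come in as uint8 RGB (normalize_frame is just an illustrative name, not part of my actual code):

import tensorflow as tf

def normalize_frame(image):
    # map uint8 pixel values in [0, 255] to float32 values in [-1, 1]
    image = tf.cast(image, tf.float32)
    return image / 127.5 - 1.0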
This is how I'm training the model:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay

learning_schedule = ExponentialDecay(
    initial_learning_rate=1e-5,
    decay_steps=400,
    decay_rate=0.97)
opt = Adam(learning_rate=learning_schedule)
model.compile(optimizer=opt,
              loss=CCC,
              metrics=['mse'])
print("[INFO] training model...")
history = model.fit(ds,
                    validation_data=(testImages, testLabels),
                    epochs=5, shuffle=False)
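For reference, a rough calculation of what this schedule actually does over training (assuming ~1M images at batch_size=320, i.e. about 3,125 steps per epoch; this is only a back-of-the-envelope check, not part of the training code):

# ExponentialDecay with staircase=False: lr = initial_lr * decay_rate ** (step / decay_steps)
initial_lr, decay_steps, decay_rate = 1e-5, 400, 0.97
steps_per_epoch = 1_000_000 // 320
for epoch in (1, 2, 5):
    step = epoch * steps_per_epoch
    lr = initial_lr * decay_rate ** (step / decay_steps)
    print(f"epoch {epoch}: lr ≈ {lr:.2e}")
# after 1 epoch lr ≈ 7.9e-06; after 5 epochs lr ≈ 3.0e-06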
CCC is a custom loss function based on Lin's concordance correlation coefficient:

def CCC(y_true, y_pred):
    import keras.backend as K
    # means
    x_m = K.mean(y_true)
    y_m = K.mean(y_pred)
    # covariance between y_true and y_pred
    s_xy = K.mean((y_true - x_m) * (y_pred - y_m))
    # variances
    s_x_sq = K.var(y_true)
    s_y_sq = K.var(y_pred)
    # concordance correlation coefficient
    ccc = (2.0 * s_xy) / (s_x_sq + s_y_sq + (x_m - y_m) ** 2)
    return 1 - ccc
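As a sanity check on the loss itself, here is a small illustrative snippet (not part of my training code) comparing CCC against a plain NumPy re-implementation; note that a constant y_pred makes the CCC exactly 0, i.e. a loss of 1:

import numpy as np
import tensorflow as tf

def ccc_numpy(x, y):
    # plain NumPy version of Lin's concordance correlation coefficient
    s_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * s_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

y_true = np.random.uniform(-1, 1, 320).astype('float32')
y_pred = np.random.uniform(-1, 1, 320).astype('float32')
print('loss (keras):', float(CCC(tf.constant(y_true), tf.constant(y_pred))))
print('loss (numpy):', 1.0 - ccc_numpy(y_true, y_pred))
print('loss with constant y_pred:', 1.0 - ccc_numpy(y_true, np.zeros_like(y_true)))  # -> 1.0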
Finally, I'm using batch_size=320. The data in ds is shuffled in chunks of 80 because the CCC is very small (even 0) when the valence values in a batch are all similar. Since the images are taken from videos, consecutive frames are very likely to show similar valence. To prevent this, I split the input into chunks of 80 images (frames) and shuffle those chunks, roughly as sketched below. So batch_size=320 means that each batch is made of 4 chunks of 80 images, and these 4 chunks most likely come from different videos because they were shuffled beforehand. Keeping frames in sequence is a requirement of my model, as later I will need to add a couple of GRU layers, which require sequences.
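A minimal sketch of the kind of chunk-then-shuffle pipeline I mean (the frames/labels arrays and the shuffle buffer size below are only placeholders, not my actual data):

import numpy as np
import tensorflow as tf

CHUNK, BATCH = 80, 320

# dummy stand-ins for the real frames/labels, just to make the sketch runnable
frames = np.zeros((1600, 96, 96, 3), dtype=np.float32)
labels = np.zeros((1600,), dtype=np.float32)

ds = tf.data.Dataset.from_tensor_slices((frames, labels))
ds = ds.batch(CHUNK, drop_remainder=True)   # group consecutive frames into chunks of 80
ds = ds.shuffle(1000)                       # shuffle whole chunks, not individual frames
ds = ds.unbatch()                           # back to single frames; order inside each chunk is preserved
ds = ds.batch(BATCH, drop_remainder=True)   # each batch = 4 shuffled chunks of 80 frames
ds = ds.prefetch(tf.data.AUTOTUNE)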
[Plot: training vs. validation loss (CCC) over 5 epochs]
[Plot: training vs. validation MSE over 5 epochs]
I have actually trained the model for up to 50 epochs and the plots look the same. The training loss decreases as it should, but the validation loss doesn't move. The validation loss is not exactly 1: it decreases to about 0.998 and stays there forever. I thought the loss function could be the problem, so I switched to MSE, but I got similar results. I've tried a smaller learning rate and a fixed learning rate. I've also tried increasing the dropout probability. All without success.
The model architecture is quite large, but the dataset is very big too. I don't understand why it's not learning. Do you have any good suggestions?


For your test set, something feels wrong. An MSE of 1 means that your data is exactly $\pm1$, whereas it sounds like your input is continuous in [-1,1].
– Alex R. Feb 01 '21 at 18:17