
Is there a way to reload the weights from a certain epoch, or the best weights, from the checkpoint files created by ModelCheckpoint once training is over?

I trained a model for 10 epochs with a checkpoint callback that only saved weights after each epoch. The final epoch's val_categorical_accuracy is a bit lower than epoch 5's. I know I should have set save_best_only=True, but I missed that.

  • So now, is there a way to get the weights from the best epoch, or from epoch number 5?
  • Also, does ModelCheckpoint overwrite the weights in the checkpoint file after each epoch?

What are my options here? Thanks for your help in advance.

Below is my implementation:

checkpoint_path = 'saved_model/cp.ckpt'
checkpoint_dir = os.path.dirname(checkpoint_path)
print(checkpoint_dir)

lstm_model.fit(X_train_seq_pad, y_train_cat,
               epochs=100,
               validation_data=(X_val_seq_pad, y_val_cat),
               callbacks=[callbacks.EarlyStopping(monitor='val_loss', patience=3),
                          callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                    save_weights_only=True,
                                                    verbose=1)])
M.Innat
Krishnang K Dalal

1 Answer


If the filepath doesn't contain formatting options like {epoch}, the file at filepath is overwritten at each save. In your case, that's why you can't recover the weights from a specific epoch (e.g. epoch 5).


Your option here, however, is to use the formatting options in the ModelCheckpoint callback's filepath at training time. For example:

tf.keras.callbacks.ModelCheckpoint(
    filepath='model.{epoch:02d}-{val_loss:.4f}.h5',
    save_freq='epoch', verbose=1, monitor='val_loss',
    save_weights_only=True, save_best_only=False
)

This will save the model weights (in .h5 format) at each epoch under a distinct, descriptive filename. If we instead set save_best_only=True, it will save only the best weights in the same way.


Code Example

Here is an end-to-end working example for reference. We save the model weights at each epoch by using a formatting option in the filepath parameter, as follows:

import numpy as np
import tensorflow as tf

# Dummy data: 20 samples with 32 features, binary targets
img = tf.random.normal([20, 32], 0, 1, tf.float32)
tar = np.random.randint(2, size=(20, 1))

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim=32, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

callback_list = [
    tf.keras.callbacks.ModelCheckpoint(
        filepath='model.{epoch:02d}-{val_loss:.4f}.h5',
        save_freq='epoch', verbose=1, monitor='val_loss',
        save_weights_only=True, save_best_only=False
    )
]
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(img, tar, epochs=5, verbose=2, validation_split=0.2,
          callbacks=callback_list)

This saves the model weights at each epoch, and you will find each file on your local disk:

# model.epoch_number_score.h5
model.01-0.8022.h5
model.02-0.8014.h5
model.03-0.8005.h5
model.04-0.7997.h5
model.05-0.7989.h5

However, note that I used save_best_only=False. If you set it to True instead, you will get only the weights from epochs that improved the monitored metric, named in the same way. Something like this:

# model.epoch_number_score.h5
model.01-0.8022.h5
model.03-0.8005.h5
model.05-0.7989.h5
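
Once the files are on disk, you can restore any epoch with model.load_weights(path). As a convenience, here is a minimal sketch (the helper name best_checkpoint is my own, not a Keras API) that picks the file with the lowest val_loss by parsing the 'model.{epoch:02d}-{val_loss:.4f}.h5' naming scheme used above:

```python
import glob
import re

def best_checkpoint(pattern='model.*.h5'):
    """Return the checkpoint path with the lowest val_loss encoded in its name."""
    paths = glob.glob(pattern)

    def val_loss(path):
        # Extract the val_loss portion, e.g. 'model.03-0.8005.h5' -> 0.8005
        match = re.search(r'-(\d+\.\d+)\.h5$', path)
        return float(match.group(1))

    return min(paths, key=val_loss)

# Restore the best epoch's weights into a model with the same architecture:
# model.load_weights(best_checkpoint())
```

To get a specific epoch instead (e.g. epoch 5), just pass that file directly: model.load_weights('model.05-0.7989.h5').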
M.Innat
  • Hey, thanks a lot for the crisp explanation. Yes, I realized after reading the docs that `ModelCheckpoint` will overwrite the weights, so providing a naming schema is a good way to sidestep that and preserve the history of trained weights. This was very helpful, and I liked the naming schema you suggested here. – Krishnang K Dalal May 31 '21 at 06:22
  • Glad to help. :) – M.Innat May 31 '21 at 06:27