
I have compiled a model and passed a callback for saving the model weights after each epoch. For example, if I run 20 epochs, I expect to get 20 checkpoint files at the end. But it looks like the weights are overwritten after each epoch, so I end up with only one checkpoint after training. It is strange; does anyone have any ideas how to fix it? This is my code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import matplotlib.pyplot as plt

checkpoint_filepath = './checkpoints2/checkpoint_default'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath, mode="auto", save_freq=1, save_weights_only=True)
model = Sequential()
model.add(LSTM(100, input_shape=(train_x.shape[1], train_x.shape[2]), return_sequences=True))
model.add(LSTM(100, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_x, train_y, epochs=22, validation_data=(val_x, val_y), batch_size=16, verbose=2, shuffle=False, callbacks=[model_checkpoint_callback])
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()

In the example above, I set saving after every epoch and train for 22 epochs, so I should get 22 model checkpoints after the run. But in practice, I only get 3 files in the checkpoints2 directory: checkpoint, checkpoint_default.index and checkpoint_default.data-00000-of-00001. How can I fix it? Please help.

I am using TensorFlow 2.7.0 without a GPU.

voxter

1 Answer


It's likely because you set the same saving location for the updated weights in the model checkpoint callback, so each save overwrites the previous one. Instead, put the epoch number in the file path so that every epoch gets its own file. For example, you may use a setup like this:

from tensorflow.keras import callbacks

callbacks.ModelCheckpoint(
    filepath='model.{epoch:02d}-{val_loss:.4f}.h5',
    save_freq='epoch', verbose=1, monitor='val_loss',
    save_weights_only=True,
)

Also, you can check this detailed answer regarding this.
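
For reference, here is a minimal sketch of how you could pick up and reload one of the per-epoch files afterwards. It assumes the filename pattern from the callback above and the same compiled model object from the question; the glob pattern and variable names are only illustrative:

import glob

# With the pattern above, training leaves one weights file per epoch, e.g.
# model.01-0.1234.h5, model.02-0.1100.h5, ...
saved = sorted(glob.glob('model.*.h5'))
print(saved)

# Since save_weights_only=True, only the weights are stored, so rebuild the
# same architecture first and then restore a chosen epoch's weights.
model.load_weights(saved[-1])  # e.g. the last epoch's checkpoint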

M.Innat