
I trained a model on an initial dataset and got good scores. Now that I have received more data, I want to load the pre-trained model and continue training it.

Here are some snippets of what I did:

(1) I followed this post, which says to save the model in the 'tf' format:

import math
import tensorflow as tf

# saving initial model
model.save(path2initial, save_format='tf')

# load pre-trained model
clf = tf.keras.models.load_model(path2initial)

# create new data generators
train_gen = generators.create(generator_config, 'train')
val_gen = generators.create(generator_config, 'val')

# create metrics, loss, optimizer and callbacks
loss = losses.create(loss_config)
callback_list = callbacks.create(callback_config)
optimizer = optimizers.create(optimizer_config)
metric_list = metrics.create(metrics_config)  # don't shadow the metrics module

# compile model
clf.compile(optimizer=optimizer, loss=loss, metrics=metric_list)

# train
clf.fit(x=train_gen,
        epochs=NB_EPOCHS,
        validation_data=val_gen,
        steps_per_epoch=math.ceil(len(train_steps) / BATCH_SIZE),
        validation_steps=math.ceil(len(val_steps)/ BATCH_SIZE),
        callbacks=callback_list,
        use_multiprocessing=True,
        workers=16,
        max_queue_size=8,
        verbose=1
)
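One detail that may matter for the optimizer question: as far as I know, `load_model` restores the compiled optimizer together with its state (its step counter `optimizer.iterations` and, for Adam-like optimizers, its moment estimates), but calling `compile()` with a newly created optimizer object replaces it, so training resumes from step 0. If `optimizers.create` builds an optimizer with a step-based learning-rate schedule (an assumption; the post doesn't say), the learning rate would jump back to its initial value. The sketch below is plain Python, not Keras: `ToyScheduledOptimizer` is a hypothetical stand-in that uses the same formula as Keras's `ExponentialDecay` (`initial_learning_rate * decay_rate ** (step / decay_steps)`), and the step count and schedule parameters are made up.

```python
import copy


class ToyScheduledOptimizer:
    """Hypothetical stand-in for an optimizer whose learning rate depends on
    its own step counter (like Keras' `optimizer.iterations`)."""

    def __init__(self, initial_lr=1e-3, decay_steps=10_000, decay_rate=0.5):
        self.initial_lr = initial_lr
        self.decay_steps = decay_steps
        self.decay_rate = decay_rate
        self.iterations = 0  # number of update steps applied so far

    def current_lr(self):
        # Same formula as ExponentialDecay with staircase=False
        return self.initial_lr * self.decay_rate ** (self.iterations / self.decay_steps)

    def apply_update(self):
        # A real optimizer would also update weights and slot variables here
        self.iterations += 1


# First training run: pretend 30,000 update steps were applied (made-up number)
opt = ToyScheduledOptimizer()
for _ in range(30_000):
    opt.apply_update()

restored = copy.deepcopy(opt)    # like load_model(): step counter kept
fresh = ToyScheduledOptimizer()  # like compile(optimizer=<new object>)

print(f"restored lr: {restored.current_lr():.6f}")  # 0.000125
print(f"fresh lr:    {fresh.current_lr():.6f}")     # 0.001000
```

With these made-up numbers the restored optimizer would continue at 1.25e-4, while the fresh one starts again at 1e-3, eight times larger. Whether this applies here depends on what `optimizers.create` actually returns.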

I should note that two of my callbacks are:

EarlyStopping(monitor='val_loss', restore_best_weights=True,
              min_delta=0.001, patience=10, mode='min', verbose=1)
ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True,
                mode='min', save_freq='epoch')

Also, train_gen consists of both initial_data and new_data.

With this method, the model improved only during the first 4 epochs, and val_loss didn't improve over the remaining 10 'patience' epochs. Moreover, the results were much worse than the initial model's results.

(2) The second method I tried was to save the model in the default format (that's the only change):

model.save(path2initial)
...

This model trained for 71/200 epochs, but it seems that it ignored my EarlyStopping() callback. In some epochs val_loss changed by 1e-4 or even less, yet training still continued (weird!), and then it stopped (via EarlyStopping()) at epoch 71 even though val_loss had changed! Moreover, the results barely changed.
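For reference, here is how `EarlyStopping` decides when to stop, as I understand the Keras implementation: with `mode='min'`, an epoch only counts as an improvement if `val_loss` falls below the best value seen so far by more than `min_delta`; every other epoch (including one where `val_loss` moves by only 1e-4) increments a counter, and training stops once that counter reaches `patience`. So a single tiny change does not stop training immediately, and training can stop even though `val_loss` is still changing slightly. Below is a simplified plain-Python model of that rule (not the Keras source); `early_stopping_epoch` and the loss values are made up for illustration.

```python
import math


def early_stopping_epoch(val_losses, min_delta=0.001, patience=10):
    """Return the 1-based epoch at which a min-mode EarlyStopping-style rule
    stops training, or None if it never triggers (simplified model)."""
    best = math.inf
    wait = 0  # consecutive epochs without a large-enough improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement bigger than min_delta: record it, reset the counter
            best = loss
            wait = 0
        else:
            # Worse, unchanged, or improved by at most min_delta: doesn't count
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up run: 4 clear improvements, then 10 epochs that each shave off 5e-5
losses = [0.60, 0.55, 0.50, 0.45] + [0.45 - 5e-5 * k for k in range(1, 11)]
print(early_stopping_epoch(losses))  # 14

# An improvement larger than min_delta resets the counter and extends training
losses2 = [0.50] * 10 + [0.49] * 11
print(early_stopping_epoch(losses2))  # 21
```

In the second run, the single drop from 0.50 to 0.49 (larger than `min_delta=0.001`) resets the counter at epoch 11, which is how a run can keep going for many epochs while most individual epochs barely move.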

For comparison, I trained a model from scratch on all the data (both initial and new) and got much better results:

               Initial data   Method (1)   Method (2)   New model on all data
mean F1 score:     0.735         0.422       0.74             0.803

Is there a proven way to continue training a Keras model?

When loading the model, does the optimizer state reset?

When loading the model, do I need to define all the callbacks, the loss, the optimizer and the metrics all over again? Do I need to compile it again?

desertnaut