I have the following F1 score function that I use both as a metric while training the model and during prediction:
import tensorflow as tf
from tensorflow.keras import backend as K

# This is to calculate F1
def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        print(y_pred)
        y_pred = y_pred.ravel() < 0.5  # mark outputs below 0.5 as positives (booleans)
        y_pred = tf.cast(y_pred, tf.float32, name=None)
        print(y_pred)
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        true_positives = tf.cast(true_positives, tf.float32, name=None)
        possible_positives = tf.cast(possible_positives, tf.float32, name=None)
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        print(y_pred)
        y_pred = y_pred.ravel() < 0.5  # mark outputs below 0.5 as positives (booleans)
        y_pred = tf.cast(y_pred, tf.float32, name=None)
        print(y_pred)
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        true_positives = tf.cast(true_positives, tf.float32, name=None)
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        predicted_positives = tf.cast(predicted_positives, tf.float32, name=None)
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon())) / 100
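(Side note: to sanity-check the metric outside of training, a minimal sketch like the one below can be used; the toy labels/predictions are made up for illustration and assume TF 2 eager execution with float32 NumPy arrays, like the output of model.predict.)

import numpy as np

# Hypothetical toy batch: labels and raw model outputs (float32 to match the casts in f1).
labels_toy = np.array([1., 0., 1., 0.], dtype=np.float32)
preds_toy = np.array([0.2, 0.9, 0.1, 0.7], dtype=np.float32)  # thresholded with < 0.5 inside f1

# Calling the metric directly returns an EagerTensor; with these values precision = recall = 1,
# so this prints roughly 0.01 because of the final division by 100.
print(f1(labels_toy, preds_toy).numpy())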
I compile the model as follows:
model.compile(loss=ContrastivLoss(margin=1), optimizer=rms, metrics=["accuracy", f1])
Output from fitting the model on the training and validation datasets, shown for the last epoch (with early stopping enabled):
Epoch 21/100
447/447 [==============================] - 1s 3ms/step - loss: 0.1646 - accuracy: 0.2271 - f1: 0.3198 - val_loss: 0.1963 - val_accuracy: 0.2695 - val_f1: 0.5232
Although I use a validation dataset during training, I also held out some data for testing:
loss = model.evaluate(x=[test_data[:,0],test_data[:,1]], y=labels_test)
y_pred_train = model.predict([train_data[:,0], train_data[:,1]])
train_f1= f1(labels_train, y_pred_train)
train_accuracy = accuracy(labels_train, y_pred_train)
y_pred_test = model.predict([test_data[:,0], test_data[:,1]])
test_f1 = f1(labels_test, y_pred_test)
test_accuracy = accuracy(labels_test, y_pred_test)
print("Loss = {}, Train F1 = {} Test F1 = {}".format(loss, train_f1, test_f1))
print("Loss = {}, Train Accuracy = {} Test Accuracy = {}".format(loss, train_accuracy, test_accuracy))
Edit: Here is the output. The training and testing F1 scores computed in the prediction phase are very low compared to the ones reported during training and validation in model fitting above:
Loss = [0.19634802639484406, 0.2694787085056305, 0.26106637716293335], Train F1 = 0.008032719604671001 Test F1 = 0.008442788384854794
Loss = [0.19634802639484406, 0.2694787085056305, 0.26106637716293335], Train Accuracy = 0.7953781512605042 Test Accuracy = 0.7305213004484304
Is it because the f1 I used on the test data divides by 100? But the same division is applied during training and validation:
return 2 * ((precision * recall) / (precision + recall + K.epsilon())) / 100
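As a scale check with toy numbers (not from the model), the trailing /100 alone shrinks an otherwise ordinary F1 value like this:

precision, recall = 0.8, 0.8                               # hypothetical values, for illustration only
f1_raw = 2 * (precision * recall) / (precision + recall)   # = 0.8
print(f1_raw, f1_raw / 100)                                # 0.8 vs 0.008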
Edit: It was suggested that this could be due to overfitting. However, the validation loss reported above does not show a clear sign of overfitting, so I am not sure what you would suggest, please.

train_f1: 0.3198 -- val_f1: 0.5232 are very different from the F1 scores I got in the model prediction phase (test_f1: 0.0084 -- train_f1: 0.0080), aren't they? I used the same F1 score function for both training/validation and testing, as I posted above. – Avv Feb 01 '23 at 16:15