
I have the following F1 score function, which I use both as a metric while training the model and again during prediction:

# Imports assumed by this snippet
import tensorflow as tf
from tensorflow.keras import backend as K


# This is to calculate F1
def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        print(y_pred)
        # Binarise predictions: values below 0.5 are treated as positives
        y_pred = y_pred.ravel() < 0.5
        y_pred = tf.cast(y_pred, tf.float32, name=None)
        print(y_pred)
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        true_positives = tf.cast(true_positives, tf.float32, name=None)
        possible_positives = tf.cast(possible_positives, tf.float32, name=None)
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        print(y_pred)
        # Binarise predictions: values below 0.5 are treated as positives
        y_pred = y_pred.ravel() < 0.5
        y_pred = tf.cast(y_pred, tf.float32, name=None)
        print(y_pred)
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        true_positives = tf.cast(true_positives, tf.float32, name=None)
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        predicted_positives = tf.cast(predicted_positives, tf.float32, name=None)
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    # Harmonic mean of precision and recall, then divided by 100
    return 2 * ((precision * recall) / (precision + recall + K.epsilon())) / 100

I compile the model as follows:

model.compile(loss=ContrastivLoss(margin=1), optimizer=rms, metrics=["accuracy", f1])
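
For context, ContrastivLoss is my custom contrastive loss class and rms is an RMSprop optimizer; the setup looks roughly like the sketch below (the optimizer settings here are assumed defaults):

# Assumed setup for the compile call above (default RMSprop settings)
rms = tf.keras.optimizers.RMSprop()
model.compile(loss=ContrastivLoss(margin=1), optimizer=rms, metrics=["accuracy", f1])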

Output from fitting the model on the training and validation data, for the last epoch (with early stopping enabled):

Epoch 21/100
447/447 [==============================] - 1s 3ms/step - loss: 0.1646 - accuracy: 0.2271 - f1: 0.3198 - val_loss: 0.1963 - val_accuracy: 0.2695 - val_f1: 0.5232

Although I have a validation dataset during training, I also kept some data aside for testing:

loss = model.evaluate(x=[test_data[:,0],test_data[:,1]], y=labels_test)
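
Since the model is compiled with "accuracy" and f1 as metrics, evaluate() returns the loss followed by those two metric values in compile order, so the result can also be unpacked as in this sketch (the variable names here are just illustrative):

# evaluate() returns [loss, accuracy, f1], in the order passed to compile()
test_loss, test_acc_eval, test_f1_eval = model.evaluate(
    x=[test_data[:, 0], test_data[:, 1]], y=labels_test)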

y_pred_train = model.predict([train_data[:, 0], train_data[:, 1]])
train_f1 = f1(labels_train, y_pred_train)
train_accuracy = accuracy(labels_train, y_pred_train)

y_pred_test = model.predict([test_data[:, 0], test_data[:, 1]])
test_f1 = f1(labels_test, y_pred_test)
test_accuracy = accuracy(labels_test, y_pred_test)

print("Loss = {}, Train F1 = {} Test F1 = {}".format(loss, train_f1, test_f1))
print("Loss = {}, Train Accuracy = {} Test Accuracy = {}".format(loss, train_accuracy, test_accuracy))

Edit: Output below. The F1 scores I compute on the training and testing data in the prediction phase are very low compared to the ones reported during model fitting above:

Loss = [0.19634802639484406, 0.2694787085056305, 0.26106637716293335], Train F1 = 0.008032719604671001 Test F1 = 0.008442788384854794
Loss = [0.19634802639484406, 0.2694787085056305, 0.26106637716293335], Train Accuracy = 0.7953781512605042 Test Accuracy = 0.7305213004484304

Is it because the f1 I apply to the test data divides by 100? But the same division was applied during training and validation:

return 2*((precision*recall)/(precision+recall+K.epsilon()))/100
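
To rule out the scaling as the cause, one cross-check would be to recompute F1 outside the custom metric, for example with scikit-learn, using the same "< 0.5" thresholding (this sketch assumes labels_test contains binary 0/1 labels):

# Independent F1 cross-check, assuming binary 0/1 labels in labels_test
from sklearn.metrics import f1_score

y_pred_test_binary = (y_pred_test.ravel() < 0.5).astype(int)
print("sklearn test F1:", f1_score(labels_test, y_pred_test_binary))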

Edit: It was suggested that this could be due to overfitting. However, the validation loss below does not show a clear sign of overfitting, so I am not sure what else you would suggest.

[Figure: training and validation loss per epoch]

  • It seems to me from your last output that F1 is 0.0080 in training and 0.0084 in testing. That does not look like a big difference to me. (F1 and similar metrics suffer from exactly the same issues as accuracy, I would counsel against using them.) – Stephan Kolassa Feb 01 '23 at 16:10
  • @StephanKolassa. Thank you. What I meant to compare are the training/validation F1 scores during model fitting versus the training/testing F1 scores in the prediction phase. During fitting I get train f1: 0.3198 and val_f1: 0.5232, which are very different from the prediction-phase scores of test f1: 0.0084 and train f1: 0.0080, aren't they? I used the same F1 function in both phases, as posted above. – Avv Feb 01 '23 at 16:15
  • @StephanKolassa. I uploaded the loss figure, which does not show a sign of overfitting. So I am not sure if this can help. – Avv Feb 02 '23 at 03:43
