
I fitted 3 different models on a 5-class imbalanced dataset. The results show the model accuracy always being equal to the recall. How can this be possible?

1. RF model results:

Test acc:   0.6285670349948376
Recall:     0.6285670349948376
Precision:  0.6171361174985392
f1_score:   0.5886671088640658
ROC AUC score:  0.7998931710957794

2. MLP model results:

Accuracy:   0.44232332330133345
Recall:     0.44232332330133345
f1_score:   0.4242650817694506
Precision:  0.4707025922895617
ROC AUC score:  0.6031862642540948

3. CNN model results:

Accuracy:   0.7411148092888021
Recall:     0.7411148092888021
f1_score:   0.741477630295568
Precision:  0.7972578281551425
ROC AUC score:  0.8291519390873785

Models' confusion matrices:

1. RF model
[[ 8753    87   494  5183    84]
 [  344   449    26   578     1]
 [ 1429    33  1311  5504    40]
 [ 1431   104   668 18072    26]
 [  350     0    11   515    28]]
2. MLP model:

[[11106   574   677  1698   546]
 [  904   172   106   180    36]
 [ 4897   657   530  2133   100]
 [ 7668  2448  1532  8301   352]
 [  490    36    33   319    26]]

3. CNN model:

[[6195   28  137  226   52]
 [ 108  789   39   16    6]
 [  95    5 3113  376   10]
 [2506  326 2398 8570  238]
 [  72   10   73   46  705]]

In all cases, accuracy=recall! How can this be possible?

EDIT

Metrics calculation:

1. RF model:
pred_test = model.predict(x_test)
test_acc = sklearn.metrics.accuracy_score(y_test, pred_test)
f1 = sklearn.metrics.f1_score(y_test, pred_test, average='weighted')
recall = sklearn.metrics.recall_score(y_test, pred_test, average='weighted')
precision = sklearn.metrics.precision_score(y_test, pred_test, average='weighted')
pred_prob = model.predict_proba(x_test)
roc = roc_auc_score(y_test, pred_prob, average='weighted', 
    multi_class='ovr',labels=[0,1,2,3,4])
2. MLP:

accuracy = sklearn.metrics.accuracy_score(y_test, y_pred)
f1 = sklearn.metrics.f1_score(y_test, y_pred, average='weighted')
recall = sklearn.metrics.recall_score(y_test, y_pred, average='weighted')
precision = sklearn.metrics.precision_score(y_test, y_pred, average='weighted')

3. CNN:

Pred = model.predict(x_test, batch_size=32)
Pred_Label = np.argmax(Pred, axis=1)
labels = [0, 1, 2, 3, 4]
...
ConfusionM = confusion_matrix(list(y_test_ori), Pred_Label, labels=labels)
class_report = classification_report(list(y_test_ori), Pred_Label, labels=labels)
roc = roc_auc_score(y_test_ori, Pred, average='weighted', multi_class='ovr', labels=labels)
print(f" ROC score: {roc}")

super_ask

1 Answer


In this blog post you can find a review of those metrics; it also covers the weighted metrics that you use. If you look closely, accuracy and weighted recall are equal in their example as well, just as in your case. They will always be equal by definition, as you will see below.
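As a quick sanity check before the derivation (a minimal sketch on randomly generated labels, not your data or models), you can verify the equality directly with scikit-learn:

import numpy as np
from sklearn import metrics

rng = np.random.default_rng(0)
yt = rng.integers(0, 5, size=1000)  # arbitrary true labels for 5 classes
yp = rng.integers(0, 5, size=1000)  # arbitrary predicted labels

acc = metrics.accuracy_score(yt, yp)
rec = metrics.recall_score(yt, yp, average='weighted')
print(np.isclose(acc, rec))
# True (the two quantities coincide, up to floating point)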

Let me use the data example from the blog post linked above.

import numpy as np
from sklearn import metrics

# Constants
C = "Cat"
F = "Fish"
H = "Hen"

# True values
y_true = [C,C,C,C,C,C, F,F,F,F,F,F,F,F,F,F, H,H,H,H,H,H,H,H,H]

# Predicted values
y_pred = [C,C,C,C,H,F, C,C,C,C,C,C,H,H,F,F, C,C,C,H,H,H,H,H,H]

# note: C is reused here to hold the confusion matrix
C = metrics.confusion_matrix(y_true, y_pred)
print(C)

print(metrics.classification_report(y_true, y_pred, digits=3))

This prints the following:

[[4 1 1]
 [6 2 2]
 [3 0 6]]
              precision    recall  f1-score   support

         Cat      0.308     0.667     0.421         6
        Fish      0.667     0.200     0.308        10
         Hen      0.667     0.667     0.667         9

    accuracy                          0.480        25
   macro avg      0.547     0.511     0.465        25
weighted avg      0.581     0.480     0.464        25

Now, let's calculate the quantities by hand. First, notice that in the confusion matrix C the true labels are in rows and the predicted ones in columns. Accuracy is simple: the true positive counts sit on the diagonal, so we divide their sum by the total number of samples:

np.sum(np.diag(C)) / np.sum(C)
# 0.48

Recall that recall is defined as the ratio between the true positives and the total number of true instances of the class (the class size), i.e.

np.diag(C) / np.sum(C, axis=1)
# array([0.66666667, 0.2       , 0.66666667])
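As a cross-check (not part of the original example), scikit-learn's per-class recall on the same y_true and y_pred gives the same array:

metrics.recall_score(y_true, y_pred, average=None, labels=["Cat", "Fish", "Hen"])
# array([0.66666667, 0.2       , 0.66666667])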

If you look at scikit-learn's documentation, weighted recall is a weighted average of the per-class recall scores, with each class weighted by its size, i.e. you calculate something like this:

np.sum(np.diag(C) / np.sum(C, axis=1) * np.sum(C, axis=1)) / np.sum(C, axis=1).sum()
# 0.48

Did you notice something fancy about the calculation? There is this part, / np.sum(C, axis=1) * np.sum(C, axis=1), that cancels out: we divide by the class sizes to calculate the per-class recalls, then multiply by the same class sizes to weight the results. Also, np.sum(C, axis=1).sum() is simply np.sum(C), so we can simplify and rewrite the whole thing to

np.sum(np.diag(C)) / np.sum(C)
# 0.48

Does it look familiar? This is accuracy.
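And indeed, as a quick cross-check on the toy data above (again not part of the original calculation), scikit-learn reports the same number for the weighted recall and the accuracy:

metrics.recall_score(y_true, y_pred, average='weighted')
# 0.48
metrics.accuracy_score(y_true, y_pred)
# 0.48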

TL;DR As mentioned in the linked blog post, using micro-F1, micro-precision, and micro-recall does not make much sense, since they are all equal to accuracy. The same applies to weighting recall by the class size: it is just an unnecessarily complicated way of calculating accuracy.
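The same collapse is easy to see on the toy data (a small check using the y_true and y_pred above): all three micro-averaged metrics come out as 0.48, i.e. the accuracy.

metrics.precision_score(y_true, y_pred, average='micro')
# 0.48
metrics.recall_score(y_true, y_pred, average='micro')
# 0.48
metrics.f1_score(y_true, y_pred, average='micro')
# 0.48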

Tim