Matthews' Correlation Coefficient and Leave-One-Out Cross Validation

Question

How do I determine Matthews' Correlation Coefficient for leave-one-out cross-validation? There doesn't seem to be a way to average the coefficient across folds: a single held-out element is not enough to calculate MCC.

Look at the math. Lay out under what situations such a function would be defined. — Galen, Dec 17 '23 at 04:15

Dave · Accepted Answer · 2023-12-17T13:35:33.273

If you have $N$ total observations, you'll fit $N$ models to perform leave-one-out cross-validation (LOOCV). Each model produces one predicted category: the prediction for the single observation it left out.

After running the LOOCV, you will wind up with an $N$-vector of predictions. To get the Matthews correlation coefficient, calculate the Pearson correlation between this $N$-vector of predictions and the $N$-vector of true values, where you have encoded the predictions and true values as numbers (for example, $0$ and $1$, or $\pm 1$; the correlation is the same under either coding).
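For reference, here is the math the comment above points to. Writing $TP$, $TN$, $FP$, $FN$ for the counts of true positives, true negatives, false positives, and false negatives pooled over all $N$ held-out predictions, the coefficient is

$$
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},
$$

which is the Pearson (phi) correlation of the two $0/1$ vectors. It also shows why a per-fold MCC fails in LOOCV: with a single observation, the true class is either positive or negative, so either $TP + FN = 0$ or $TN + FP = 0$, the denominator is zero, and the fold's MCC is undefined.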

It might help to look at a code example.

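set.seed(2023)
N <- 1000
x <- rt(N, 1)          # feature: t-distributed with 1 df (Cauchy)
y <- rbinom(N, 1, 0.5) # outcome: independent of x by construction
preds <- rep(NA, N)
for (i in seq_len(N)) {
  # Leave observation i out and fit on the remaining N - 1 points
  xi <- x[-i]
  yi <- y[-i]
  L <- glm(yi ~ xi, family = binomial)

  # Predict the held-out point on the probability scale
  # (type is an argument of predict(), not a column of newdata)
  preds[i] <- predict(L, newdata = data.frame(xi = x[i]), type = "response")
}
# Threshold the predicted probabilities at 0.5 to get 0/1 categories
predicted_categories <- round(preds)
mcc <- cor(y, predicted_categories)
mcc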

As the feature x is independent of the outcome y, the MCC winds up close to zero, around $-0.04$, and not statistically significantly different from zero ($p \approx 0.2$ in cor.test).
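As a sanity check that the confusion-matrix definition of MCC and the Pearson correlation agree, the sketch below uses a hypothetical helper, mcc_from_counts (written only for this illustration, not from any package), on a small made-up pair of 0/1 vectors and compares it with cor(), under both the $0/1$ and $\pm 1$ codings.

```r
# mcc_from_counts() is a hypothetical helper written for this illustration.
# truth and pred are vectors of the same length, coded 0/1.
mcc_from_counts <- function(truth, pred) {
  tp <- sum(truth == 1 & pred == 1)  # true positives
  tn <- sum(truth == 0 & pred == 0)  # true negatives
  fp <- sum(truth == 0 & pred == 1)  # false positives
  fn <- sum(truth == 1 & pred == 0)  # false negatives
  # Convert to double so the products cannot overflow R's integer range
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)
  (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
}

# Small made-up example: 3 TP, 3 TN, 1 FP, 1 FN
truth <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred  <- c(1, 0, 1, 0, 1, 0, 1, 0)

mcc_from_counts(truth, pred)      # 0.5
cor(truth, pred)                  # 0.5, the same value
cor(2 * truth - 1, 2 * pred - 1)  # 0.5 again with +/-1 coding

# A single observation: one denominator factor is zero, giving 0/0 = NaN
mcc_from_counts(1, 1)
```

The conversion to double matters only for large samples, where integer products of the counts could exceed R's integer range.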

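Note that MCC suffers from issues similar to those of accuracy, sensitivity, specificity, precision, recall, the $F_1$ score, and the $F_{\beta}$ score (all of them are computed from thresholded category predictions rather than predicted probabilities), and this is true whether or not class imbalance is present.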
