
How do I determine Matthews' Correlation Coefficient for leave-one-out cross-validation? There doesn't seem to be a way to average the coefficient across folds - one element is insufficient support for calculating MCC.

Nick Cox

1 Answer


If you have $N$ total observations, you'll fit $N$ models to perform leave-one-out cross-validation (LOOCV). Each of these will produce one predicted category, for the single observation that was held out.

After running the LOOCV, you will wind up with an $N$-vector of predictions. To get the Matthews correlation coefficient, calculate the Pearson correlation between this $N$-vector of predictions and the $N$-vector of true values, where you have encoded the predictions and true values as numbers (likely $0$ and $1$ or $\pm 1$).
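To make the equivalence concrete, here is a minimal sketch on made-up labels. The vectors `truth` and `pred` and the helper `mcc_from_labels` are hypothetical, introduced only for illustration. It checks that the usual confusion-matrix formula for MCC agrees with `cor` on the encoded labels, and shows why a single held-out observation cannot support the calculation on its own.

```r
# Hypothetical toy labels, not taken from any real model
truth <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred  <- c(1, 0, 1, 0, 1, 0, 1, 0)

# Illustrative helper: MCC from confusion-matrix counts
mcc_from_labels <- function(truth, pred) {
  tp <- as.numeric(sum(pred == 1 & truth == 1))
  tn <- as.numeric(sum(pred == 0 & truth == 0))
  fp <- as.numeric(sum(pred == 1 & truth == 0))
  fn <- as.numeric(sum(pred == 0 & truth == 1))
  (tp * tn - fp * fn) /
    sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
}

mcc_formula <- mcc_from_labels(truth, pred)
mcc_pearson <- cor(truth, pred)                   # 0/1 encoding
mcc_pm1     <- cor(2 * truth - 1, 2 * pred - 1)   # +/-1 encoding

# All three agree on the pooled vector
all.equal(mcc_formula, mcc_pearson)
all.equal(mcc_pearson, mcc_pm1)

# A single fold has one observation: the formula is 0 / 0
single_fold <- mcc_from_labels(truth[1], pred[1])
is.nan(single_fold)
```

The last two lines are the reason to pool the $N$ held-out predictions first and compute one MCC at the end, rather than trying to average a per-fold value.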

It might help to look at a code example.

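    set.seed(2023)
    N <- 1000
    x <- rt(N, 1)            # feature: t-distributed with 1 df
    y <- rbinom(N, 1, 0.5)   # outcome: coin flips, independent of x
    preds <- rep(NA, N)      # one held-out prediction per observation

    for (i in 1:N) {
      # Leave observation i out and fit on the remaining N - 1
      xi <- x[-i]
      yi <- y[-i]
      L <- glm(yi ~ xi, family = binomial)

      # Predicted probability for the held-out observation.
      # `type` is an argument to predict(), not a column of newdata;
      # "response" gives a probability, so rounding yields 0 or 1.
      preds[i] <- predict(L, data.frame(xi = x[i]), type = "response")
    }

    # Threshold at 0.5, then correlate with the true labels
    predicted_categories <- round(preds)
    mcc <- cor(y, predicted_categories)
    mcc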

Because the feature `x` is independent of the outcome `y`, the MCC should land near zero. In the run reported here it was around $-0.04$, not statistically significantly different from zero ($p \approx 0.2$ in `cor.test`).
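For reference, the significance check can be sketched as below. The vectors `truth_toy` and `pred_toy` are hypothetical stand-ins for the true labels and the pooled LOOCV predictions, not output from the simulation. Note that the default `cor.test` p-value comes from a $t$ approximation, so on binary data it is best read as a rough guide.

```r
# Hypothetical pooled labels: 20 observations, 12 hits and 8 misses
truth_toy <- rep(c(0, 1), times = 10)
pred_toy  <- c(truth_toy[1:12], 1 - truth_toy[13:20])

# Pearson correlation test on the encoded labels
ct <- cor.test(truth_toy, pred_toy)

mcc_toy <- unname(ct$estimate)  # the MCC itself
p_toy   <- ct$p.value           # two-sided p-value for H0: correlation = 0

mcc_toy
p_toy
```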

Note that MCC suffers from issues similar to those of accuracy, sensitivity, specificity, precision, recall, the $F_1$ score, and the $F_{\beta}$ score, and this is true whether or not class imbalance is present.

Dave