How do I determine Matthews' Correlation Coefficient for leave-one-out cross-validation? There doesn't seem to be a way to average the coefficient across folds - one element is insufficient support for calculating MCC.
Look at the math. Lay out under what situations such a function would be defined. – Galen Dec 17 '23 at 04:15
1 Answer
If you have $N$ total observations, you'll fit $N$ models to perform the leave-one-out cross-validation (LOOCV). Each of these will produce one predicted category, for the single held-out observation.
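To see why a per-fold MCC cannot be computed, it may help to write the coefficient in terms of confusion-matrix counts. The notation $TP$, $TN$, $FP$, $FN$ (true/false positives/negatives) is the standard one and is assumed here rather than taken from the question:

$$
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
$$

With a single held-out observation, exactly one of the four counts equals $1$ and the other three equal $0$. The numerator is then $0$, and two of the four factors under the square root are $0$, so the expression is $0/0$ and undefined. This is why the predictions have to be pooled across folds before the coefficient is computed.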
After running the LOOCV, you will wind up with an $N$-vector of predictions. To get the Matthews correlation coefficient, calculate the Pearson correlation between this $N$-vector of predictions and the $N$-vector of true values, where you have encoded the predictions and true values as numbers (likely $0$ and $1$ or $\pm 1$).
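As a small check of the claim that the Pearson correlation of the encoded vectors is the MCC, the sketch below compares `cor()` against the confusion-matrix formula. The vectors `truth` and `pred` are made up for illustration (100 observations with 10 errors in each class) and are not output from any real model.

```r
# Hypothetical pooled LOOCV results, encoded as 0/1 (illustrative data only)
truth <- rep(c(0, 1), each = 50)
pred <- truth
flip <- c(1:10, 51:60)          # misclassify 10 observations in each class
pred[flip] <- 1 - pred[flip]

# Confusion-matrix counts, stored as doubles so the products cannot overflow
tp <- as.numeric(sum(truth == 1 & pred == 1))
tn <- as.numeric(sum(truth == 0 & pred == 0))
fp <- as.numeric(sum(truth == 0 & pred == 1))
fn <- as.numeric(sum(truth == 1 & pred == 0))

# MCC from the counts
mcc_counts <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# MCC as a Pearson correlation, under the 0/1 and the +/-1 encodings
mcc_cor <- cor(truth, pred)
mcc_pm1 <- cor(2 * truth - 1, 2 * pred - 1)

c(counts = mcc_counts, cor01 = mcc_cor, cor_pm1 = mcc_pm1)
```

All three values agree (here $0.6$), since the Pearson correlation is unchanged by the affine recoding from $0/1$ to $\pm 1$.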
It might help to look at a code example in R, using a logistic regression as the classifier.
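set.seed(2023)
N <- 1000
x <- rt(N, 1)            # heavy-tailed feature, simulated independently of y
y <- rbinom(N, 1, 0.5)   # binary outcome
preds <- rep(NA_real_, N)
for (i in seq_len(N)) {
  # Fit the model on every observation except the i-th
  xi <- x[-i]
  yi <- y[-i]
  L <- glm(yi ~ xi, family = binomial)
  # Predicted probability for the held-out observation.
  # `type` is an argument of predict(), not a column of the new data frame;
  # "response" gives a probability, so round() below thresholds at 0.5.
  preds[i] <- predict(L, newdata = data.frame(xi = x[i]), type = "response")
}
predicted_categories <- round(preds)   # 0/1 predicted categories
mcc <- cor(y, predicted_categories)    # Pearson correlation = MCC
mcc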
As the feature x is independent of the outcome y, the MCC should wind up close to zero. The run behind this answer gave around $-0.04$, not statistically significantly different from zero ($p \approx 0.2$ in cor.test).
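The significance check can be sketched with `cor.test`, which returns both the correlation estimate and a p-value. To keep the example short and self-contained, `truth` and `pred` below are independent simulated 0/1 vectors standing in for the true values and the pooled LOOCV predicted categories; the seed and sample size are arbitrary choices for illustration.

```r
set.seed(1)
n <- 500
truth <- rbinom(n, 1, 0.5)   # stand-in for the true categories
pred <- rbinom(n, 1, 0.5)    # stand-in for pooled LOOCV predictions, independent of truth

# Pearson correlation test on the 0/1 encodings; the estimate is the MCC
ct <- cor.test(truth, pred)
mcc_hat <- unname(ct$estimate)
p_value <- ct$p.value

c(mcc = mcc_hat, p = p_value)
```

Because the two vectors are generated independently, the estimate is expected to be near zero, mirroring the situation in the LOOCV example.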
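Note that MCC, like accuracy, sensitivity, specificity, precision, recall, the $F_1$ score, and the $F_{\beta}$ score, is computed from hard classifications at a fixed threshold, so it discards the information carried by the predicted probabilities. This is true whether class imbalance is present or not.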