
I have a two-class prediction problem where one class contains 70% of the samples and the other 30%, i.e. there is class imbalance. I'm conducting 10-fold cross-validation. To calculate the AUC there are two possibilities:

  1. Calculating the AUC for each of the 10 folds and then averaging the scores.
  2. Pooling the predictions from all folds and then calculating a single AUC (this works because each data point appears in exactly one test fold).

With the second option I'm getting much higher AUC scores (around 0.1 higher). Is there a reason why the second approach produces higher scores, and which one should be preferred?
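For concreteness, here is a minimal sketch of the two computations (assuming R with the pROC package; the simulated data and the logistic model are illustrative stand-ins for the real problem):

library(pROC)

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2*x))              #simulated stand-in for the real data
folds <- sample(rep(1:10, length.out = n))  #random 10-fold assignment

fold.aucs <- numeric(10)
pred.all <- numeric(n)
for(k in 1:10){
  fit <- glm(y ~ x, family = binomial, subset = folds != k)
  p <- predict(fit, newdata = data.frame(x = x[folds == k]), type = "response")
  pred.all[folds == k] <- p
  fold.aucs[k] <- auc(roc(response = y[folds == k], predictor = p, quiet = TRUE))
}

#option 1: average the per-fold AUCs
mean(fold.aucs)

#option 2: pool all out-of-fold predictions, then compute one AUC
auc(roc(response = y, predictor = pred.all, quiet = TRUE))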

DictionaryProver

1 Answer


I am guessing that the AUC is low for one or two folds and this brings the average down. The example below shows how low discrimination in a single fold can generate a difference between 1. and 2. similar to the one you are seeing.

library(pROC)

#generate binary data correlated with x
x <- rnorm(1000, 0, 3)
p <- exp(x)/(exp(x) + 1)
y <- rbinom(1000, 1, p)

#proportion of cases
mean(y)
#[1] 0.507

#plot(x,y)

#fit a logistic regression on the full data and get in-sample predictions
fit <- glm(y ~ x, family = binomial(link = "logit"))
pred <- predict(fit, type = "response")

#assign observations to 5 folds and set up storage
idx <- rep(1:5, each = 200)
l.rocs <- list()
l.aucs <- list()

#naive prediction for one fold: a constant predictor makes that fold's AUC 0.5
pred[idx == 1] <- mean(y)

#calculate ROC for folds
for(i in 1:5){
  pred.s <- pred[idx==i]
  y.s <- y[idx==i]
  l.rocs[[i]] <- roc(response=y.s, predictor = pred.s)
  l.aucs[[i]] <- l.rocs[[i]]$auc
}

The difference shown below (about 0.04) is smaller than the one you report.

#method 1: average the per-fold AUCs
mean(unlist(l.aucs))
#[1] 0.8389421

#method 2: pool all predictions, then compute one AUC
roc.all <- roc(response = y, predictor = pred)
roc.all$auc
#Area under the curve: 0.8772

I recommend plotting the ROCs, as below, to compare. The first five panels are the individual folds; the upper-left panel (fold 1, with the naive constant prediction) shows random guessing, and the bottom-right panel is the pooled ROC from method 2.

[Figure: ROC curves for the five individual folds and for the pooled predictions]
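The answer's original plotting code isn't shown; a minimal sketch that produces such a grid with pROC's plot method (the 2x3 layout is an assumption) could look like:

op <- par(mfrow = c(2, 3))  #2x3 grid: five fold ROCs plus the pooled ROC
for(i in 1:5){
  plot(l.rocs[[i]], main = paste("Fold", i))
}
plot(roc.all, main = "Pooled (method 2)")
par(op)  #restore previous graphics settings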

HStamper