I have a two-class prediction problem where one class has 70% of the samples and the other has 30%, so the classes are imbalanced. I'm running 10-fold cross-validation. To calculate the AUC, there are two possibilities:
- Calculating the AUC for each of the 10 folds and then averaging the scores.
- Stacking the predictions from all folds and then calculating a single AUC (this works because each data point appears in exactly one test fold).
With the second option I'm getting much higher AUC scores (around 0.1 higher). Is there a reason why the second one produces higher scores, and which one should be preferred?
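To make the two options concrete, here is a minimal sketch of both computations. It uses made-up toy scores for two folds, not the actual data or model, and the helper names `auc` and `averaged_and_pooled_auc` are hypothetical. The AUC is computed directly from its rank interpretation (the fraction of positive/negative pairs in which the positive gets the higher score), so no library is assumed.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def averaged_and_pooled_auc(fold_labels, fold_scores):
    """Return (mean of per-fold AUCs, AUC of all test predictions stacked)."""
    # Option 1: one AUC per test fold, then the plain average.
    per_fold = [auc(y, s) for y, s in zip(fold_labels, fold_scores)]
    mean_auc = sum(per_fold) / len(per_fold)

    # Option 2: stack every fold's test predictions, then one AUC.
    pooled_labels = [y for fold in fold_labels for y in fold]
    pooled_scores = [s for fold in fold_scores for s in fold]
    return mean_auc, auc(pooled_labels, pooled_scores)


# Toy data (illustrative assumption): two test folds with three samples each.
fold_labels = [[0, 0, 1], [0, 1, 1]]
fold_scores = [[0.1, 0.2, 0.3], [0.6, 0.7, 0.8]]

mean_auc, pooled_auc = averaged_and_pooled_auc(fold_labels, fold_scores)
print(f"mean of per-fold AUCs: {mean_auc:.3f}")
print(f"pooled AUC:            {pooled_auc:.3f}")
```

On this toy data the two numbers already disagree (1.0 averaged versus 8/9 pooled), because pooling also compares scores across folds. That only shows the two options are different quantities; it says nothing about which direction the gap takes on real data.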
