I'm training an Elastic Net model on a small dataset with about 100 TRUE outcomes and 15 FALSE outcomes. I've been using AUC to compare models but I'm worried this metric is unstable because some bootstrapped subsamples only have 4 FALSE outcomes in the test set. Is there another metric that would be more appropriate here?
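For context, here's roughly how I'm measuring the instability (an illustrative sketch with simulated scores, not my actual model output; the class sizes match my data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated setup: 100 TRUE (positive) and 15 FALSE (negative) outcomes,
# with continuous model scores that separate the classes imperfectly
y = np.array([1] * 100 + [0] * 15)
scores = np.where(y == 1,
                  rng.normal(0.7, 0.2, size=y.size),
                  rng.normal(0.4, 0.2, size=y.size))

# AUC over bootstrap resamples to see how much it jumps around
aucs = []
for _ in range(200):
    idx = rng.choice(y.size, size=y.size, replace=True)
    if len(np.unique(y[idx])) < 2:
        continue  # AUC is undefined when a resample has only one class
    aucs.append(roc_auc_score(y[idx], scores[idx]))

print(f"AUC mean={np.mean(aucs):.3f}, sd={np.std(aucs):.3f}, "
      f"min={np.min(aucs):.3f}, max={np.max(aucs):.3f}")
```

The spread between min and max is what worries me: with only a handful of negatives in a resample, a few borderline cases swing the AUC a lot.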
Edit: My Elastic Net model returns numerical (continuous) predictions, not class labels.