I am trying to predict a binary outcome. My sample size is very small (n = 160) and the classes are highly imbalanced (80:20). All the variables are highly correlated, and the dataset is high-dimensional: there are 96 variables, and the minority class has only 32 samples.
- Can I rely solely on repeated or nested cross-validation for the final evaluation, instead of setting aside a held-out test set (20% of the data)?
- Or should I use cross-validation only for hyperparameter optimization and then do the final evaluation on the held-out test set?
- What feature selection methods are appropriate for highly correlated, high-dimensional data?
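A minimal sketch of the nested option follows. It assumes Python with scikit-learn and uses synthetic data from `make_classification` as a stand-in for the real dataset (160 rows, 96 features, 128 vs 32 classes, many redundant and therefore correlated features). The inner loop tunes an elastic-net logistic regression. This model is chosen here only as one example of embedded feature selection that tolerates correlated predictors. The outer repeated stratified loop produces the performance estimate, so no 20% hold-out is set aside. The grid values, fold counts and `random_state` seeds are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     StratifiedKFold, cross_val_score)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 160 samples, 96 features,
# exactly 128 vs 32 samples per class, 40 redundant (correlated) features.
X, y = make_classification(
    n_samples=160, n_features=96, n_informative=8, n_redundant=40,
    weights=[0.8, 0.2], flip_y=0.0, random_state=0,
)

# Scaling sits inside the pipeline so it is re-fit on each training fold only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(
        penalty="elasticnet", solver="saga", class_weight="balanced",
        max_iter=2000,
    )),
])

# Inner loop: tune penalty strength C and the L1/L2 mix l1_ratio.
param_grid = {
    "clf__C": [0.01, 0.1, 1.0],
    "clf__l1_ratio": [0.2, 0.5, 0.8],
}
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
tuned = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="roc_auc")

# Outer loop: repeated stratified 5-fold CV gives the performance estimate;
# each outer training fold gets its own inner grid search.
outer_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=2)
scores = cross_val_score(tuned, X, y, cv=outer_cv, scoring="roc_auc")
print(f"nested CV ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")

# Final model: rerun the inner tuning on all data; nonzero coefficients are
# the features the elastic-net penalty kept.
tuned.fit(X, y)
coef = tuned.best_estimator_.named_steps["clf"].coef_.ravel()
n_selected = int(np.count_nonzero(coef))
print("best params:", tuned.best_params_, "features kept:", n_selected)
```

The spread of `scores` across the 15 outer folds shows how noisy any single split would be with only 32 minority samples. The features kept by the final fit can differ from those kept inside each outer fold.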