I am fitting a logistic regression with an elastic net penalty to identify which variables influence the outcome and to evaluate the model's predictive performance. The dataset is imbalanced: 50 cases and 1700 controls. My goal is to find the best approach for model development and evaluation.
At first I performed a traditional 80-20 train-test split, stratified so that the proportion of cases is the same in the training and test data. However, this leaves very few cases in the test set (about 10 of the 50).
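To make the problem concrete, here is a small sketch of the arithmetic behind a stratified 80-20 split with these class sizes. The helper name `stratified_holdout_counts` is hypothetical, and the counts are approximate in general, since a real splitter may round slightly differently.

```python
def stratified_holdout_counts(n_cases, n_controls, test_fraction):
    """Approximate class counts per partition for a stratified hold-out split."""
    test_cases = round(n_cases * test_fraction)
    test_controls = round(n_controls * test_fraction)
    return {
        "train_cases": n_cases - test_cases,
        "train_controls": n_controls - test_controls,
        "test_cases": test_cases,
        "test_controls": test_controls,
    }


counts = stratified_holdout_counts(n_cases=50, n_controls=1700, test_fraction=0.2)
print(counts)

# With so few test cases, each single case moves the estimated
# sensitivity by 1 / test_cases, so the estimate is very coarse.
sensitivity_step = 1 / counts["test_cases"]
print(f"sensitivity resolution on the test set: {sensitivity_step:.2f}")
```

With 10 test cases, sensitivity can only take values in steps of 0.1, which is why a single hold-out set gives such a noisy estimate here.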
I was wondering whether I could split the analysis into two parts:
1- An exploratory model fitted on all the data, to inspect the coefficients and see which variables influence the outcome and what their relative importance is.
2- A cross-validation analysis in which I repeatedly split the data, train the model on the training folds, test it on the held-out fold each time, and evaluate the performance across the splits.
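For reference, the two parts could be sketched as follows with scikit-learn. This is only an illustration under stated assumptions: the data are synthetic stand-ins with the same class sizes, the number of features, the fixed hyperparameters (`C=1.0`, `l1_ratio=0.5`), the use of `class_weight="balanced"`, and the choice of 5 folds repeated 3 times are all assumptions rather than part of my actual analysis. If the penalty were tuned, that tuning would have to happen inside each training fold as well.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 50 cases, 1700 controls, 10 features.
rng = np.random.default_rng(0)
n_cases, n_controls, n_features = 50, 1700, 10
X = rng.normal(size=(n_cases + n_controls, n_features))
y = np.r_[np.ones(n_cases, dtype=int), np.zeros(n_controls, dtype=int)]
X[y == 1, 0] += 1.0  # give the first feature some signal

# Scaling lives inside the pipeline so it is re-fitted on each training fold.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="elasticnet",
        solver="saga",  # the solver that supports the elastic net penalty
        l1_ratio=0.5,   # assumed mix of L1 and L2
        C=1.0,          # assumed inverse regularization strength
        class_weight="balanced",
        max_iter=5000,
    ),
)

# Part 1: exploratory fit on all the data, to inspect the coefficients.
model.fit(X, y)
coefs = model.named_steps["logisticregression"].coef_.ravel()
print("coefficients on standardized features:", np.round(coefs, 3))

# Part 2: repeated stratified cross-validation, so every fold keeps
# the case/control ratio and every case is used for testing.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_validate(
    model, X, y, cv=cv, scoring=["roc_auc", "average_precision"]
)
print("mean ROC AUC:", scores["test_roc_auc"].mean())
print("mean average precision:", scores["test_average_precision"].mean())
```

Each of the 5 stratified folds holds about 10 cases, but across the folds all 50 cases contribute to the performance estimate, unlike a single 80-20 split.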
I would like some feedback, because I am not entirely sure whether there would be any concerns if I used the second approach.