I have one group with n=13 and another with n=26 (proportion 1:2).
I have 14 features to use in the classification model.
I am using a logistic regression model.
My questions are:
- Is it correct to use a logistic regression with such imbalanced groups?
- I know that I have too many features for the number of observations. I have read that a model should have roughly 10 to 20 observations per feature, so I used leave-one-out cross-validation to select a subset. I ran every possible combination of features through leave-one-out CV and settled on a trade-off of 4 features with 0.69 accuracy. The result is not very exciting, but is this approach correct?
- Should the final logistic regression coefficients be estimated as an average of the cross-validation coefficients, or should I use the entire dataset to estimate them? (I read somewhere else that I should go with the latter.)
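For concreteness, the procedure in the last two bullets can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code actually used: the data are synthetic stand-ins with the same group sizes, scikit-learn is an assumed library choice, its `LogisticRegression` applies L2 regularization by default, and the search is capped at subsets of size 2 (`max_size` is a hypothetical parameter) so that it runs quickly.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for the real data (assumption): 13 + 26 observations, 14 features.
rng = np.random.default_rng(0)
n_small, n_large, n_features = 13, 26, 14
y = np.array([1] * n_small + [0] * n_large)
X = rng.normal(size=(n_small + n_large, n_features))
X[y == 1, :2] += 1.0  # in this toy data only the first two features carry signal


def loocv_accuracy(X_sub, y):
    """Fraction of held-out observations classified correctly under leave-one-out CV."""
    # Note: scikit-learn's default is an L2-penalized fit (C=1.0), an assumption here.
    model = LogisticRegression(max_iter=1000)
    pred = cross_val_predict(model, X_sub, y, cv=LeaveOneOut())
    return float(np.mean(pred == y))


# Exhaustive search over feature subsets, capped at max_size to keep the sketch fast.
max_size = 2  # hypothetical cap; the question searched every possible combination
best_score, best_subset = -1.0, None
for k in range(1, max_size + 1):
    for subset in combinations(range(n_features), k):
        score = loocv_accuracy(X[:, list(subset)], y)
        if score > best_score:
            best_score, best_subset = score, subset

# Second option from the last bullet: refit once on all observations
# using the selected subset, instead of averaging the per-fold coefficients.
final_model = LogisticRegression(max_iter=1000).fit(X[:, list(best_subset)], y)
print("selected features:", best_subset, "LOOCV accuracy:", round(best_score, 2))
print("intercept:", final_model.intercept_, "coefficients:", final_model.coef_)
```

The accuracy reported for the winning subset comes from the same leave-one-out folds that were used to choose it.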
Any help is greatly appreciated!
I am doing prediction/classification.
From the links, it seems that there is NO imbalance problem in logistic regression? In this post they suggest that the intercept could be poorly estimated in such imbalanced cases and cause problems when predicting. What do you think?
I agree that accuracy was a poor choice to describe the problem. I also get sensitivity, specificity, and AUC. – rfc Nov 24 '22 at 15:43
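As a small illustration of the three metrics named in this comment, they could be computed from pooled held-out predictions along these lines. The labels and predicted probabilities below are made up for the example, the 0.5 cutoff is an assumption, and the helper names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up held-out predictions with a 1:2 class ratio (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC uses only the ranking of the predicted probabilities.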