I used the ROSE package in R to balance a dataset. I wasn't sure which approach would give better results, so I split my data into training and test sets (75/25), then oversampled and undersampled my sample before fitting a logit model to each version. The AUC is the same for the oversampled, undersampled, and unsampled models, and I can't work out why.
Some details about my sample, in case they are helpful:
- I have 3861 observations from BRFSS
- 84% are class 0, which is why I wanted to try over- and undersampling
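
To make the setup concrete, here is a minimal sketch of the kind of workflow described above. It runs on simulated data, not the BRFSS data, and it assumes the resampling is applied to the training set only, using `ovun.sample` from ROSE with its default target proportion. The predictors `x1` and `x2`, the outcome `y`, the coefficients used to simulate it, and the helper `fit_auc` are all hypothetical names and values chosen for illustration.

```r
library(ROSE)

set.seed(42)

# Simulated stand-in for the real data: 3861 rows, class 0 in the clear majority
n   <- 3861
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(rbinom(n, 1, plogis(-1.9 + 0.8 * dat$x1 - 0.5 * dat$x2)))

# 75/25 train/test split
idx   <- sample(n, size = floor(0.75 * n))
train <- dat[idx, ]
test  <- dat[-idx, ]

# Assumption: only the training set is resampled; the test set is left as-is
train_over  <- ovun.sample(y ~ ., data = train, method = "over",  seed = 1)$data
train_under <- ovun.sample(y ~ ., data = train, method = "under", seed = 1)$data

# Fit a logit model on a training set and return its AUC on the shared test set
fit_auc <- function(d) {
  fit  <- glm(y ~ ., data = d, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")
  roc.curve(test$y, prob, plotit = FALSE)$auc
}

aucs <- c(
  none  = fit_auc(train),
  over  = fit_auc(train_over),
  under = fit_auc(train_under)
)
print(round(aucs, 3))
```

All three models are scored on the same untouched test set, so the three AUC values can be compared directly.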