(This started as a comment)
Regarding good threads already available: I would strongly suggest looking into
- Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
- When is unbalanced data really a problem in Machine Learning?
- What problem does oversampling, undersampling, and SMOTE solve?
They give a very good idea of the subtlety of the imbalanced-learning problem. They should help build a better appreciation of the issue, because reading bite-sized cookbook suggestions (like the one I give below) is only a stop-gap measure.
Regarding the calibration of prediction:
If the observed class proportions before resampling are, say, 0.5-to-99.5 and we keep 1% of the negatives (a 1% negative downsampling), the observed class proportions in our new sample will now reflect approximately a 33-to-67 split. This is our "downsampled space", where we train the learner. For actual deployment we need to re-calibrate the learner so that its predictions reflect the original 0.5% base rate; in the original space, probabilities calibrated to a 33-to-67 proportion would be unreasonably high. A straightforward way is to compute the corrected probabilities as $q = \frac{p}{p + \frac{1-p}{w}}$, where $p$ is the prediction in downsampled space and $w$ is the negative downsampling rate. So, for example, if we predicted $p = 0.5$ in the example above, the actual probability should be more like $q = \frac{0.5}{0.5 + 0.5/0.01} \approx 0.0099$.
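The correction above is easy to sketch in code. This is a minimal illustration of the formula (function and variable names are mine, not from any particular library):

```python
def calibrate(p, w):
    """Map a probability p predicted in the downsampled space back to the
    original space, where w is the negative downsampling rate (fraction of
    negatives kept, e.g. 0.01 for a 1% downsampling)."""
    return p / (p + (1.0 - p) / w)

# A prediction of 0.5 in downsampled space, with 1% of negatives kept,
# maps back to roughly 0.0099 in the original space.
print(round(calibrate(0.5, 0.01), 6))  # 0.009901

# Sanity check: with no downsampling (w = 1) the probability is unchanged.
print(calibrate(0.5, 1.0))  # 0.5
```

Note that the mapping is monotonic, so rank-based metrics like AUC-ROC are unaffected; only the probability estimates themselves change.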
Two good first references on the matter are Dal Pozzolo et al. (2015), "Calibrating Probability with Undersampling for Unbalanced Classification", and Elkan (2001), "The Foundations of Cost-Sensitive Learning". (The formula I wrote above is effectively Eq. 3 from Dal Pozzolo et al.'s paper.)
Just to be clear: in any classification problem it is far better to focus on assigning costs to misclassifications than to keep optimising metrics like AUC-ROC, AUC-PR, Cohen's $\kappa$, and the like. As a real-life example: a screening tool and a diagnostic tool serve different purposes, so evaluating their utility with the same metric is probably an oversimplification.