I have seen a lot of conflicting advice on how to deal with class imbalance, and I get that it can be case-specific. But I learned in school that SMOTE oversampling or undersampling were basically the ways to fix this, and now in the real world these methods seem to introduce a lot of problems with bias and uninterpretable probabilities. I have done a lot of research and have also looked at weighting the classes in the model, or just changing the decision threshold downstream. Every time I think I have a good solution to my classification problem (right now I am trying to find a model for classifying interested leads for sales, where sales are 12x less frequent than non-sales), I do more research and run into someone on a blog insisting that this is the wrong way. Is there a right and a wrong way? I have a large dataset, and when I do nothing the accuracy is great, but of course the model predicts almost all non-sales.
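To make the "accuracy is great" problem concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the labels are synthetic at the 12:1 ratio described above, the scores are drawn from made-up Beta distributions standing in for a model's predicted probabilities, and `metrics` and `predict` are hypothetical helper names. It only shows why accuracy hides the issue and what moving the threshold does to precision and recall; it is not a recommendation for any one method.

```python
import random


def metrics(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (1 = sale)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Define precision/recall as 0.0 when there is nothing to divide by.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def predict(scores, threshold):
    """Turn probability-like scores into 0/1 predictions at a threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Synthetic labels: 100 sales and 1200 non-sales (a 12:1 imbalance).
y_true = [1] * 100 + [0] * 1200

# "Do nothing" extreme: always predict non-sale.
baseline = metrics(y_true, [0] * len(y_true))

# Made-up scores: sales tend to score higher, but both groups sit well
# below 0.5, which is typical when the positive class is rare.
rng = random.Random(0)
scores = [
    rng.betavariate(2, 5) if t == 1 else rng.betavariate(1, 8)
    for t in y_true
]

at_half = metrics(y_true, predict(scores, 0.5))  # default cut-off
at_low = metrics(y_true, predict(scores, 0.2))   # lowered cut-off (arbitrary)

for name, m in [("always non-sale", baseline),
                ("threshold 0.5", at_half),
                ("threshold 0.2", at_low)]:
    print(f"{name:16s} acc={m['accuracy']:.3f} "
          f"prec={m['precision']:.3f} rec={m['recall']:.3f}")
```

The always-non-sale row reaches 12/13 accuracy with zero recall, which is the situation described above. Lowering the threshold leaves the scores themselves untouched and only trades precision for recall, so the cut-off (0.2 here is arbitrary) would have to come from what a missed lead costs versus a wasted call.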

Siri C
  • 11

0 Answers