A good probability model estimates probabilities that reflect how often events actually occur; its predictions are "calibrated." This is true whether the categories are balanced or not.
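Calibration can be checked directly: bin the predicted probabilities and compare the average prediction in each bin to the observed event rate. Here is a minimal simulated sketch (the "model" is calibrated by construction, since each prediction is drawn as the true event probability, so the two quantities should agree):

```python
import random

random.seed(0)

# Simulate 100,000 rare-event predictions from a model that is
# calibrated by construction: each prediction IS the true probability.
n = 100_000
preds = [random.random() * 0.1 for _ in range(n)]           # all p < 0.1
outcomes = [1 if random.random() < p else 0 for p in preds]

# Within each score bin, the mean prediction should match the
# observed event rate.
bins = 5
for b in range(bins):
    lo, hi = b * 0.02, (b + 1) * 0.02
    idx = [i for i, p in enumerate(preds) if lo <= p < hi]
    mean_pred = sum(preds[i] for i in idx) / len(idx)
    obs_rate = sum(outcomes[i] for i in idx) / len(idx)
    print(f"bin [{lo:.2f}, {hi:.2f}): mean prediction {mean_pred:.3f}, "
          f"observed rate {obs_rate:.3f}")
```

A badly miscalibrated model (for example, one trained on artificially balanced classes) would show observed rates far below the mean predictions in every bin.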
When you have considerable imbalance, you are telling the model to be skeptical of membership in the minority category. This makes sense: in the absence of highly compelling evidence of membership in the minority class, the observation probably belongs to the majority class. Consequently, your model might never predict a probability above $0.4$. However, if almost no one responds to the advertisement, think about how much of a win a prediction like that is. Instead of the probability of that individual responding being the proverbial one-in-a-million, the probability is better than one-in-three. Sure, the more likely outcome is that this individual will not respond, but such an individual is so much more likely to respond than usual that the ad might be worth sending.
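To see why a probability ceiling well below $0.5$ can still be very informative, here is a small Bayes-rule sketch with made-up numbers: a 1% base response rate and a single hypothetical score $x$ that is normally distributed with mean 0 for non-responders and mean 2 for responders.

```python
import math

# Made-up setup: 1% base response rate, and one hypothetical score x
# that is N(0, 1) for non-responders and N(2, 1) for responders.
prior = 0.01

def norm_pdf(x, mu):
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def p_respond(x):
    """True P(respond | x) by Bayes' rule."""
    num = prior * norm_pdf(x, 2.0)
    return num / (num + (1 - prior) * norm_pdf(x, 0.0))

# Even a strong signal (x = 3) leaves the probability below 0.5 ...
p = p_respond(3.0)
print(f"P(respond | x=3) = {p:.3f}")
# ... yet it is a ~35x lift over the 1% base rate.
print(f"lift: {p / prior:.1f}x")
```

The "low" prediction is not a failure of the model; it is the correct answer given how rare responses are.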
If you have very few minority-category observations, you might have some issues. The King and Zeng paper mentioned in the linked answer addresses sampling techniques for efficiently collecting members of the minority category and then correcting for that sampling later. If you already have data, their sampling ideas do not really apply. And if, despite the imbalance, you already have many members of the minority category in absolute terms, whatever estimation issues come with a small minority class have likely been overcome, meaning that techniques like ROSE and SMOTE introduce additional sources of error to fix a problem you do not have.
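If you do fit on artificially balanced data, the predicted probabilities come out inflated because the model has learned the wrong prior. A standard prior-shift correction on the odds scale, in the spirit of the King and Zeng intercept correction, maps them back to the true prevalence. The prevalences and the $0.40$ prediction below are illustrative.

```python
def prior_shift(p, pi_sample, pi_true):
    """Rescale a predicted probability learned at prevalence pi_sample
    back to the true prevalence pi_true by adjusting the odds."""
    odds = (p / (1 - p)) * (pi_true / (1 - pi_true)) / (pi_sample / (1 - pi_sample))
    return odds / (1 + odds)

# A model fit on 50/50 balanced data says 0.40; at the true 1%
# prevalence, that same evidence corresponds to well under 1%.
p_balanced = 0.40
p_corrected = prior_shift(p_balanced, pi_sample=0.5, pi_true=0.01)
print(f"{p_corrected:.4f}")   # roughly 0.0067
```

Note what this implies: the balancing never added information, it only rescaled the outputs, and you then need an extra step to undo the rescaling.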
If “care about edge cases” means that you want to make a point of not missing out on sending an advertisement to someone who is particularly likely to respond, it might be that sending an ad to everyone is the best approach. Given what my email inbox looks like, this appears to be a real way for marketing people to approach the problem, perhaps with reasonable success. If being this extreme is not viable, then accurate probabilities can guide good decisions about how likely someone is to respond to an advertisement and whether that likelihood is worth the cost of sending the ad. You will get these accurate probabilities by modeling the true amount of skepticism to have about the probability of response, not by tricking your model into producing artificially high outputs by balancing the categories.
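The "is it worth the cost" decision is just an expected-value comparison. A tiny sketch with made-up revenue and cost figures shows why even very small response probabilities can justify sending the ad, and why "email everyone" can be rational:

```python
# Made-up economics: hypothetical profit per responder and cost per ad.
revenue_if_response = 50.00
cost_per_ad = 0.10

def worth_sending(p):
    """Send the ad whenever the expected profit is positive."""
    return p * revenue_if_response > cost_per_ad

# Break-even probability is cost/revenue = 0.002, i.e. 0.2%, far below
# a 1% base rate; almost everyone clears the bar.
print(worth_sending(0.005))   # True
print(worth_sending(0.001))   # False
```

The decision rule consumes the calibrated probability directly; there is no step where an artificially inflated probability would help.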
This is not about being 99.9% correct. Let's use an example: right now 1% of people buy product X naturally. You want to send emails to 10,000 people to buy it, but instead of choosing them randomly, you want to use the model to target the customers who are most similar to those who already bought it; hence the model. So the 10,000 highest-scored people will get the email. The question is whether sampling affects the quality of the model in this case.
– BloodthirstyPlatypus Aug 17 '23 at 13:32