This is a common occurrence in machine learning and exactly the issue of unbalanced-classes that gets discussed on here so often. The typical recommendation is to do nothing, as class imbalance is minimally problematic. Even the bounty-earning answer to the linked question discusses a solution to cost minimization through clever experimental design, rather than what to do once you have the data.
I have wondered whether a zero-inflated Bernoulli likelihood might help, but this kind of hierarchy winds up not making sense to me. You can represent the low probability of category $1$ through the zero-inflated model, or you can just predict low probabilities with a less complicated model. Since a binary variable is so simple, you can work with a plain Bernoulli variable instead of a zero-inflated one.
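One way to see why the extra layer buys nothing: if a structural zero occurs with probability $\pi$ and otherwise $Y \sim \text{Bernoulli}(p)$, then $P(Y=1) = (1-\pi)p$, so the outcome is still just Bernoulli with $q = (1-\pi)p$. The minimal sketch below checks this numerically. The function names `zero_inflated_bernoulli_pmf` and `bernoulli_pmf` and the parameter values are hypothetical, chosen only for illustration.

```python
import math

def zero_inflated_bernoulli_pmf(y, pi, p):
    """P(Y = y) when a structural zero occurs with prob. pi, else Y ~ Bernoulli(p)."""
    if y == 1:
        return (1 - pi) * p
    return pi + (1 - pi) * (1 - p)

def bernoulli_pmf(y, q):
    """P(Y = y) for a plain Bernoulli(q) variable."""
    return q if y == 1 else 1 - q

# Three different (pi, p) pairs that all imply the same q = (1 - pi) * p = 0.05.
for pi, p in [(0.9, 0.5), (0.5, 0.1), (0.0, 0.05)]:
    q = (1 - pi) * p
    for y in (0, 1):
        # The zero-inflated pmf matches a plain Bernoulli(q) pmf exactly.
        assert math.isclose(zero_inflated_bernoulli_pmf(y, pi, p), bernoulli_pmf(y, q))
    print(f"pi={pi}, p={p} -> equivalent Bernoulli q={q:.3f}")
```

Because several $(\pi, p)$ pairs give the same distribution for a binary outcome, the data alone cannot separate the "inflation" part from the Bernoulli part; only their product $q$ is identified.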
An issue you might encounter is that all or almost all of the outcomes are predicted to belong to the majority category (typically coded as $0$). This is an artifact of your software imposing an arbitrary threshold (typically a probability of $0.5$) to bin the continuous outputs into discrete categories. By changing the threshold, you can get anything from every case predicted as category $0$ to every case predicted as category $1$. However, this means that you are no longer evaluating the original models but, instead, the models in conjunction with a decision rule that maps predictions below the threshold to one category and predictions above the threshold to the other.
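To make the threshold point concrete, here is a minimal sketch. The predicted probabilities are simulated, not from any real model: Beta(1, 30) draws stand in for the small probabilities a model might output on imbalanced data, and `classify` is a hypothetical helper implementing the threshold decision rule.

```python
import numpy as np

def classify(probs, threshold=0.5):
    """Decision rule: label a case 1 if its predicted probability is >= threshold, else 0."""
    return (probs >= threshold).astype(int)

# Simulated stand-in for a model's predicted probabilities of category 1
# on imbalanced data; Beta(1, 30) draws are small (mean about 0.03).
rng = np.random.default_rng(0)
probs = rng.beta(1.0, 30.0, size=1000)

thresholds = [0.0, 0.01, 0.03, 0.1, 0.5]
counts = {t: int(classify(probs, t).sum()) for t in thresholds}
for t, n in counts.items():
    print(f"threshold={t:<5} predicted as category 1: {n} of {probs.size}")
```

The array `probs` never changes. Only the decision rule does, yet the hard predictions go from every case in category $1$ (threshold $0$) to essentially none at the default $0.5$.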