I'm trying to train a text classification model that can accurately predict labels $A$ and $B$. However, 95% of the text examples in my dataset, which is representative of the kind of data I want my model to predict on, are examples of neither $A$ nor $B$. I instead labeled them $O$, for "other".
I need more examples of $A$ and $B$ in my labeled dataset to actually train a good model, but to get them I would have to make my labeled dataset less representative of the data it's meant to be used on. So there's a catch-22 here. I can either keep labeling until I'm happy with the number of $A$ and $B$ labels I have, which will feed my model roughly $100$ examples of $O$ for every $5$ or $6$ examples of $A$ or $B$ and will probably push it toward predicting $O$ for everything (although this may not be true, since the true proportions of $A$ and $B$ vs. $O$ shouldn't change as I continue to label), or I can give it a non-representative dataset that artificially contains a greater proportion of $A$ and $B$ examples.
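To make the second option concrete, here is a rough sketch of what I have in mind, nothing more than downsampling the $O$ examples so the training set has an artificially higher share of $A$ and $B$ (the file name, column names, and target ratio are just placeholders):

```python
import pandas as pd

# Placeholder file with columns "text" and "label"; labels are "A", "B", or "O".
df = pd.read_csv("labeled_data.csv")

minority = df[df["label"].isin(["A", "B"])]
majority = df[df["label"] == "O"]

# Keep only enough "O" examples for, say, a 4:1 ratio instead of the true ~19:1.
target_ratio = 4
majority_downsampled = majority.sample(
    n=min(len(majority), target_ratio * len(minority)),
    random_state=42,
)

# Recombine and shuffle to get the artificially rebalanced training set.
train_df = pd.concat([minority, majority_downsampled]).sample(frac=1, random_state=42)
print(train_df["label"].value_counts(normalize=True))
```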
How do I navigate this? I don't know which imperfect solution is better.