I want to train a binary classifier, but only a few examples in my training data belong to the "T" class. I have only used two numeric features, and I suspect it is not sensible to use any kind of classifier for this task. I have tried Naive Bayes and logistic regression. My training data contains 3388 examples, of which 3110 are "F" class.
1 Answer
Yes, this is completely reasonable. Remember that many models, such as the logistic regression you've mentioned, return probability values. Your model might tell you that the probability of the minority class is low, but with so few instances it will almost always be the case that this class looks unlikely unless the evidence in its favor is overwhelming. Depending on the costs of misclassifying the various groups (calling an actual $T$ an $F$ vs. calling an actual $F$ a $T$), it might even be reasonable to classify every case as the majority class, simply because the minority class is never likely enough to justify risking the costlier mistake.
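To make this concrete, here is a minimal sketch of the idea, assuming scikit-learn and synthetic data made up in the spirit of the question's two-feature, 3388-example setup (the 10:1 cost ratio is an assumption for illustration, not from the post): the model is fit as usual, but the predicted probabilities are thresholded according to misclassification costs instead of the default 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data: two numeric features,
# roughly 8% minority ("T") class.
X, y = make_classification(n_samples=3388, n_features=2, n_informative=2,
                           n_redundant=0, weights=[0.92, 0.08], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# Probability of the minority class for each test case.
p_minority = model.predict_proba(X_test)[:, 1]

# Assumed costs: missing a true "T" (false negative) is 10x as costly as a
# false alarm. The cost-minimizing threshold is cost_FP / (cost_FP + cost_FN).
cost_fn, cost_fp = 10.0, 1.0
threshold = cost_fp / (cost_fp + cost_fn)   # 1/11, far below the default 0.5

predictions = (p_minority >= threshold).astype(int)
print(f"Cases flagged as 'T': {predictions.sum()} of {len(predictions)}")
```

With a high enough false-negative cost the threshold drops and more cases are flagged as the minority class; with symmetric costs the same model may sensibly label almost everything as the majority class.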
I will refer you to some of the usual links I post about class imbalance and models that output probability values.
Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
class imbalance and/or misclassification costs. That will solve your problem. – user1737564 Aug 09 '16 at 00:20
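As one hedged illustration of the misclassification-cost route the comment points to, scikit-learn's `class_weight` argument lets you bake asymmetric costs into the fit itself rather than into the decision threshold (the synthetic data and the weighting choice below are assumptions for illustration, not from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Same kind of synthetic imbalanced data as in the earlier sketch (~8% "T").
X, y = make_classification(n_samples=3388, n_features=2, n_informative=2,
                           n_redundant=0, weights=[0.92, 0.08], random_state=0)

# 'balanced' reweights each class inversely to its frequency, so errors on
# the rare class count for more during fitting; an explicit dict such as
# {0: 1, 1: 10} would encode specific misclassification costs instead.
weighted_model = LogisticRegression(class_weight="balanced").fit(X, y)
print("Fraction predicted as 'T':", weighted_model.predict(X).mean())
```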