I want to train a binary classifier, but only a few examples in my training data belong to the "T" class. I have only used two numeric features, and I suspect it is not sensible to use any kind of classifier for this task. I have tried Naive Bayes and logistic regression. My training data contains 3388 examples, of which 3110 are "F" class.
1 Answer
Yes, this is completely reasonable. Remember that many models, such as the logistic regression you've mentioned, return probability values. Your model might tell you that the probability of the minority class is low, but with so few instances it will almost always be the case that this class looks unlikely unless the evidence in its favor is overwhelming. Depending on the costs of misclassifying the various groups (calling an actual $T$ an $F$ vs. calling an actual $F$ a $T$), it might even be reasonable to classify every case as the majority class, simply because the minority class is never likely enough to justify risking the costlier mistake.
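To make this concrete, here is a minimal sketch of the idea, assuming scikit-learn and synthetic data made up in the spirit of the question's two-feature, 3388-example setup (the 10:1 cost ratio is an assumption for illustration, not from the post): the model is fit as usual, but the predicted probabilities are thresholded according to misclassification costs instead of the default 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data: two numeric features,
# roughly 8% minority ("T") class.
X, y = make_classification(n_samples=3388, n_features=2, n_informative=2,
                           n_redundant=0, weights=[0.92, 0.08], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# Probability of the minority class for each test case.
p_minority = model.predict_proba(X_test)[:, 1]

# Assumed costs: missing a true "T" (false negative) is 10x as costly as a
# false alarm. The cost-minimizing threshold is cost_FP / (cost_FP + cost_FN).
cost_fn, cost_fp = 10.0, 1.0
threshold = cost_fp / (cost_fp + cost_fn)   # 1/11, far below the default 0.5

predictions = (p_minority >= threshold).astype(int)
print(f"Cases flagged as 'T': {predictions.sum()} of {len(predictions)}")
```

With a high enough false-negative cost the threshold drops and more cases are flagged as the minority class; with symmetric costs the same model may sensibly label almost everything as the majority class.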
I will refer you to some of the usual links I post about class imbalance and models that output probability values.
Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
class imbalance and/or misclassification costs. That will solve your problem. – user1737564 Aug 09 '16 at 00:20
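As one hedged illustration of the misclassification-cost route the comment points to, scikit-learn's `class_weight` argument lets you bake asymmetric costs into the fit itself rather than into the decision threshold (the synthetic data and the weighting choice below are assumptions for illustration, not from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Same kind of synthetic imbalanced data as in the earlier sketch (~8% "T").
X, y = make_classification(n_samples=3388, n_features=2, n_informative=2,
                           n_redundant=0, weights=[0.92, 0.08], random_state=0)

# 'balanced' reweights each class inversely to its frequency, so errors on
# the rare class count for more during fitting; an explicit dict such as
# {0: 1, 1: 10} would encode specific misclassification costs instead.
weighted_model = LogisticRegression(class_weight="balanced").fit(X, y)
print("Fraction predicted as 'T':", weighted_model.predict(X).mean())
```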