
Let's say we have a classifier that we trained on balanced data, but during inference our data is imbalanced. Is there a way to leverage information about the distribution of the imbalanced data at prediction time?

To be more precise, we have the following components:

$P(C \mid X, B)$ - the model; $P(C \mid B) = \frac{1}{|C|}$ - the balanced training prior; $P(C)$ - the true distribution of classes, where

$C$ - classes; $X$ - features; $B$ - the event that the data is balanced

What we are looking for is the distribution $P(C \mid X)$.

I was trying to juggle the chain rule and Bayes' theorem to obtain $P(C \mid X)$ from the three components above, but I failed.

So the question is: Is it possible to express $P(C \mid X)$ in terms of $P(C \mid X, B)$, $P(C \mid B)$, and $P(C)$? If it is not straightforward, what would we additionally need in order to calculate $P(C \mid X)$?
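For what it's worth, here is a sketch of the standard prior-correction approach, assuming label shift, i.e. that the class-conditional feature distribution is unchanged by balancing, $P(X \mid C, B) = P(X \mid C)$. Under that assumption, Bayes' theorem gives $P(C \mid X) \propto P(C \mid X, B) \cdot \frac{P(C)}{P(C \mid B)}$, followed by renormalization over classes. The numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical balanced-model outputs P(C|X,B) for 2 samples, 3 classes.
p_c_given_x_b = np.array([[0.6, 0.3, 0.1],
                          [0.2, 0.5, 0.3]])

# Balanced training prior: uniform over |C| = 3 classes.
p_c_b = np.full(3, 1.0 / 3.0)

# Deployment prior P(C): assumed known or estimated separately.
p_c = np.array([0.8, 0.15, 0.05])

# Under label shift: P(C|X) proportional to P(C|X,B) * P(C) / P(C|B),
# renormalized so each row sums to 1.
unnorm = p_c_given_x_b * p_c / p_c_b
p_c_given_x = unnorm / unnorm.sum(axis=1, keepdims=True)
```

Note that the correction only reweights the model's posterior; if the label-shift assumption fails (e.g. the features themselves drift), this formula is not sufficient.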

  • 1
We have a post about that somewhere, so the answer is yes. However, why did you train on balanced data when that is not the case? (There are legitimate reasons for doing so, but it is common for people to do it when their reasons are not so good, too.) – Dave Nov 10 '22 at 17:56
  • @Dave I would be grateful if you would dig up that post somehow. We balanced the data during the training because it was extremely imbalanced. – Fallen Apart Nov 10 '22 at 18:03
  • 1
    You might just have to deal with correcting the balancing, but in many cases, no balancing is needed, which could be worth keeping in mind next time. // What problems did the balancing solve? – Dave Nov 10 '22 at 18:07
  • 1
    So you “undone” imbalanced data by balancing it and now you want to un-undo imbalanced data by transforming the predictions to be “imbalanced”? – Tim Nov 10 '22 at 18:16

0 Answers