
From a theoretical perspective, is the Bayes Optimal Classifier (BOC) the best possible classifier one can build? Better than a neural network (NN) or gradient-boosted decision trees (GBDT)?

Say we have two distributions $P(X,Y)$ and $P(X',Y')$, and we try to use the Bayes optimal classifier to distinguish between them.

If the BOC has no discriminative power (e.g., AUC = 0.5), can we say that $P(X,Y)\equiv P(X',Y')$? Or is there a possibility that another learning algorithm achieves better performance?

Note: I am not considering multivariate distribution testing.

1 Answer


As you can learn from this answer:

The Bayes classifier is the one that classifies according to the most likely category given the predictor $x$, i.e.,

$$ \text{arg max}_j P(Y = j \mid X = x) . $$

Since these "true" probabilities are essentially never known, the Bayes classifier is more a theoretical concept and not something that you can actually use in practice.
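To make the definition concrete, here is a minimal sketch for a hypothetical synthetic problem where the "true" distribution is known by construction: $Y \sim \text{Bernoulli}(0.5)$, $X \mid Y=0 \sim N(0,1)$, $X \mid Y=1 \sim N(2,1)$. With equal priors, $\text{arg max}_j P(Y=j \mid X=x)$ reduces to comparing the class-conditional densities.

```python
import math
import random

# Hypothetical setup: the true distribution is known by construction.
# Y ~ Bernoulli(0.5); X | Y=0 ~ N(0,1), X | Y=1 ~ N(2,1).
def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def bayes_classifier(x):
    # arg max_j P(Y=j | X=x); with equal priors this is equivalent
    # to comparing the class-conditional densities p(x | Y=j).
    return 0 if normal_pdf(x, 0.0) >= normal_pdf(x, 2.0) else 1

# Estimate the Bayes classifier's accuracy by simulation.
random.seed(0)
n, correct = 100_000, 0
for _ in range(n):
    y = random.randint(0, 1)
    x = random.gauss(2.0 * y, 1.0)
    correct += (bayes_classifier(x) == y)

accuracy = correct / n
# Theoretical Bayes accuracy here is Phi(1) ~ 0.841; no classifier
# (NN, GBDT, or otherwise) can exceed it in expectation.
print(round(accuracy, 3))
```

The decision boundary is at $x = 1$ (where the two densities cross), and the resulting accuracy, $\Phi(1) \approx 0.841$, is an upper bound for any classifier on this problem.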

So the Bayes classifier predicts the class based on the true probability, while machine learning algorithms make decisions based on their estimate of what that probability is. A machine learning model therefore cannot do better: it has no access to the true distribution and, at best, can learn it. This is not to say, however, that the Bayes optimal classifier is a crystal ball: if you know that a biased coin has a 0.6 chance of landing heads, the best guess for the outcome is heads, but it can still, by chance, land tails.
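The coin example above can be simulated in a few lines: even with perfect knowledge of $P(\text{heads}) = 0.6$, the Bayes-optimal rule "always guess heads" is right only about 60% of the time.

```python
import random

# Simulate a coin with known bias P(heads) = 0.6. The Bayes-optimal
# prediction is always "heads", yet its accuracy is capped at ~0.6:
# the irreducible (Bayes) error comes from the randomness of Y itself.
random.seed(1)
n = 100_000
flips = [random.random() < 0.6 for _ in range(n)]  # True means heads
hit_rate = sum(flips) / n  # accuracy of always guessing heads
print(round(hit_rate, 2))  # ~0.6
```

No learning algorithm can push accuracy above this bound, because the residual error is a property of the distribution, not of the model.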

Tim