
I am going through a Machine Learning tutorial, where the problem statement is expressed using Bayes' theorem. In particular,

$P(y|Data) = \frac{P(Data|y) P(y)}{\int P(Data)}$

The data part is particularly confusing to me, as it is meant to be a normalizing constant, yet it does not depend on $y$ in the above definition. Also, I'm not sure what is being integrated here and over what. For example, if $Data$ comprises several random variables, say $Data = \{x_1,x_2,...,x_d\}$, then the above expression becomes $P(y|x_1,x_2,...,x_d) = \frac{P(x_1,x_2,...,x_d|y)P(y)}{\idotsint P(x_1,x_2...,x_d) dx_1 \dots dx_d}$ if I am not mistaken. However, how does this act as a normalizing constant, i.e. result in a valid probability distribution?

I'm certain that I have something wrong here, so could someone please shed some light on this?

  • What is the $y$? – Zhanxiong Feb 18 '23 at 13:00
  • Basic comment: Bayes' Theorem is a (math) theorem that holds for machine learning as well! – Xi'an Feb 18 '23 at 13:26
  • Typo? The denominator should either be $$P(Data)$$ or $$\int P(Data|y) P(y)\,\text dy$$ – Xi'an Feb 18 '23 at 13:50
  • The aim is to have the left hand side be a conditional probability (or density) for $y$. This requires its sum (or integral) over all possible values of $y$ to be $1$. There is no reason to expect $P(Data)=\sum_y P(Data\mid y)P(y)$ $\big(\text{or } \int P(Data\mid y)P(y)\, dy\big)$ to be $1$, so the denominator takes care of this. – Henry Feb 19 '23 at 00:23
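
The normalization point in the comments can be illustrated with a small discrete sketch (hypothetical numbers, assuming $y$ takes finitely many values): the denominator is just the sum of the numerator over all values of $y$, which forces the posterior to sum to $1$.

```python
import numpy as np

# Hypothetical example: y takes 3 values.
prior = np.array([0.5, 0.3, 0.2])          # P(y), a valid distribution
likelihood = np.array([0.10, 0.40, 0.25])  # P(Data|y); need not sum to 1 over y

# Numerator of Bayes' theorem for each value of y.
unnormalized = likelihood * prior          # P(Data|y) P(y)

# Evidence: P(Data) = sum_y P(Data|y) P(y). This is the normalizing
# constant -- a sum over y, not an integral over the data.
evidence = unnormalized.sum()

posterior = unnormalized / evidence        # P(y|Data)
print(posterior.sum())                     # sums to 1 (up to float rounding)
```

The denominator does not depend on any particular value of $y$ precisely because $y$ has been summed (or integrated) out, which is what makes it a constant with respect to $y$.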

0 Answers