
Please refer to this passage from the literature:

According to the Naive Bayes classification algorithm:

$P(sneezing,builder|flu) = P(sneezing|flu)P(builder|flu) $

where sneezing and builder are assumed to be conditionally independent events, given flu.

How do they arrive at the above conclusion mathematically?

Is it something like:

$$P(sneezing, builder \ | \ flu) = P(sneezing \cap builder \ | \ flu) = \frac{P((sneezing \cap builder) \cap flu)}{P(flu)}$$

Soumee

1 Answer


This is just a basic property of conditional independence. If two events A and B are conditionally independent given event C, then:

$$Pr(A \ and\ B \ |\ C) = Pr(A \ |\ C) * Pr(B \ | \ C)$$

or equivalently,

$$Pr(A \ | \ B \ and\ C) = Pr(A \ |\ C).$$
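As a quick numerical illustration, here is a minimal sketch (made-up probabilities, not tied to any particular dataset) that builds a joint distribution over binary events A, B, C which is conditionally independent given C by construction, and then checks the identity directly from the joint:

```python
import itertools

# Made-up parameters (any values in [0, 1] behave the same way):
p_c = 0.3                              # Pr(C)
p_a_given_c = {True: 0.8, False: 0.1}  # Pr(A | C = c)
p_b_given_c = {True: 0.6, False: 0.2}  # Pr(B | C = c)

def joint(a, b, c):
    """Pr(A=a, B=b, C=c); conditionally independent given C by construction."""
    pc = p_c if c else 1.0 - p_c
    pa = p_a_given_c[c] if a else 1.0 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1.0 - p_b_given_c[c]
    return pc * pa * pb

# Everything below is computed from the joint alone, so the identity is genuinely checked.
pr_c = sum(joint(a, b, True) for a, b in itertools.product([True, False], repeat=2))
pr_ab_given_c = joint(True, True, True) / pr_c                          # Pr(A and B | C)
pr_a_given_c = sum(joint(True, b, True) for b in (True, False)) / pr_c  # Pr(A | C)
pr_b_given_c = sum(joint(a, True, True) for a in (True, False)) / pr_c  # Pr(B | C)

print(pr_ab_given_c)                # 0.48
print(pr_a_given_c * pr_b_given_c)  # 0.48 -- the two sides agree
```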

That is,

$$Pr(A \ and \ B \ | \ C) = \frac{Pr(A \ and \ B \ and \ C)}{Pr(C)}$$

(definition of conditional probability)

$$= \frac{Pr(A \ | \ B \ and \ C) * Pr(B \ and \ C)}{Pr(C)}$$

(by the multiplication rule, i.e. the definition of conditional probability applied to A and the event "B and C")

$$= Pr(A \ | \ C) * Pr(B \ | \ C)$$

(conditional independence, and the definition of conditional probability again)
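To trace the three steps with concrete (made-up) numbers, take $Pr(A \ and \ B \ and \ C) = 0.144$, $Pr(B \ and \ C) = 0.18$, and $Pr(C) = 0.3$. Then

$$Pr(A \ and \ B \ | \ C) = \frac{0.144}{0.3} = 0.48, \quad Pr(A \ | \ B \ and \ C) = \frac{0.144}{0.18} = 0.8, \quad Pr(B \ | \ C) = \frac{0.18}{0.3} = 0.6,$$

and indeed $0.8 * 0.6 = 0.48$, so each step of the chain checks out.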

The independence assumption is what makes Naive Bayes "naive": in general, it is a bold assumption to think that your variables are all conditionally independent given a class. In Naive Bayes, we seek an expression for:

$$Pr(class \ r \ | \ x_1, x_2, \ldots, x_k)$$ where $x_1, x_2, \ldots, x_k$ are the features/variables/predictors that we have. Applying Bayes rule:

$$Pr(class \ r \ | \ x_1, x_2, \ldots, x_k) = \frac{Pr(x_1, x_2, \ldots, x_k \ | \ class \ r) * Pr(class \ r)}{\sum_{i=1}^m Pr(x_1, x_2, \ldots, x_k \ | \ class \ i) Pr(class \ i)}$$

where $m$ is the number of classes.
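As a quick sanity check with made-up numbers: suppose $m = 2$, $Pr(x_1, \ldots, x_k \ | \ class \ 1) = 0.48$, $Pr(class \ 1) = 0.3$, $Pr(x_1, \ldots, x_k \ | \ class \ 2) = 0.08$, and $Pr(class \ 2) = 0.7$. Then

$$Pr(class \ 1 \ | \ x_1, \ldots, x_k) = \frac{0.48 * 0.3}{0.48 * 0.3 + 0.08 * 0.7} = \frac{0.144}{0.2} = 0.72.$$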

The expression $Pr(x_1, x_2, \ldots, x_k \ | \ class \ r)$, the conditional joint distribution of the predictors, is essentially impossible to estimate directly. So we make the assumption that the predictors are conditionally independent given a class (and, typically but not always, that each $x_j \ | \ class \ i$ follows a Gaussian distribution):

$$Pr(x_1, x_2, \ldots, x_k \ | \ class \ r) = \prod_{j = 1}^{k} Pr(x_j \ | \ class \ r)$$
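With $k = 2$ features and the class $flu$, this is exactly the factorization quoted in the question:

$$Pr(sneezing, builder \ | \ flu) = Pr(sneezing \ | \ flu) * Pr(builder \ | \ flu)$$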

Hence:

$$Pr(class \ r \ | \ x_1, x_2, \ldots, x_k) = \frac{\prod_{j = 1}^{k} Pr(x_j \ | \ class \ r) * Pr(class \ r)}{\sum_{i=1}^m \left[\prod_{j = 1}^{k} Pr(x_j \ | \ class \ i)\right] Pr(class \ i)}$$

We typically estimate the prior probabilities $Pr(class \ i)$ with the MLE, which is just the sample proportion: (# of training examples in class $i$) / (total # of training examples).
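To tie everything together, here is a minimal sketch that computes this posterior end to end on a made-up binary toy dataset (the feature names mirror the question; no smoothing, and raw products rather than the log-probabilities a real implementation would use):

```python
# Hypothetical training data: each row is (sneezing, builder) with a class label.
X = [(1, 0), (1, 1), (1, 0), (1, 1), (0, 1), (1, 0), (0, 1), (0, 0)]
y = ["flu", "flu", "flu", "flu", "healthy", "healthy", "healthy", "healthy"]

classes = sorted(set(y))

# Pr(class i): MLE = (# training examples in class i) / (# training examples)
prior = {c: sum(1 for label in y if label == c) / len(y) for c in classes}

def feature_prob(j, c):
    """Pr(x_j = 1 | class c), estimated as the within-class sample proportion."""
    rows = [x for x, label in zip(X, y) if label == c]
    return sum(x[j] for x in rows) / len(rows)

def likelihood(x, c):
    """Pr(x_1, ..., x_k | class c) under the naive conditional-independence assumption."""
    p = 1.0
    for j, xj in enumerate(x):
        pj = feature_prob(j, c)
        p *= pj if xj == 1 else 1.0 - pj
    return p

def posterior(x):
    """Pr(class r | x_1, ..., x_k) via the formula above, normalizing over all classes."""
    scores = {c: likelihood(x, c) * prior[c] for c in classes}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(posterior((1, 1)))  # Pr(class | sneezing, builder) on this toy data
```

On this toy data, `posterior((1, 1))` returns `{'flu': 0.8, 'healthy': 0.2}`; adding Laplace smoothing would keep any single zero-count feature from zeroing out an entire class's product.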

aranglol