In the book "Temporal Data Mining" by T. Mitsa, the problem used to illustrate Bayesian classification consists of the following training set of medical records:
| age          | hypertension  | diabetes     | class |
|--------------|---------------|--------------|-------|
| <70 (cat. A) | yes (cat. D0) | yes (cat. E) | C1    |
| 70-80 (cat. B) | no (cat. D) | yes          | C3    |
| >80 (cat. C) | yes           | yes          | C2    |
| <70          | yes           | no (cat. F)  | C3    |
| 70-80        | yes           | no           | C3    |
| >80          | no            | yes          | C1    |
| 70-80        | no            | yes          | C2    |
| 70-80        | no            | no           | C3    |
| >80          | no            | yes          | C1    |
| >80          | no            | no           | C3    |
| 70-80        | yes           | no           | C3    |
| <70          | no            | yes          | C3    |
The question is:
"What is the likelihood that the following patient do not have a TIA attack (i.e: belongs to class C3)?
S = ( age=72, no hyp., no diab.) = (B,D,F) "
We have:
$p(BDF \mid C3)\,p(C3) = p(B \mid C3)\,p(D \mid C3)\,p(F \mid C3)\,p(C3) = \frac{4}{7}\cdot\frac{4}{7}\cdot\frac{5}{7}\cdot\frac{7}{12} \approx 0.136$
and since:
$p(F \mid C2) = p(B \mid C1) = 0$
then:
$p(BDF \mid C2) = p(BDF \mid C1) = 0$
So from the training set we can classify S in C3, because:
$p(BDF \mid C3)\,p(C3) > p(BDF \mid C1)\,p(C1)$
and:
$p(BDF \mid C3)\,p(C3) > p(BDF \mid C2)\,p(C2)$.
Fine. The problem stops here in the book.
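For concreteness, here is a minimal Python sketch (mine, not from the book) that reproduces these numbers from the training set:

```python
# Encoding: age A (<70), B (70-80), C (>80); hypertension D0 (yes), D (no);
# diabetes E (yes), F (no). Rows follow the training table above.
data = [
    ("A", "D0", "E", "C1"), ("B", "D",  "E", "C3"), ("C", "D0", "E", "C2"),
    ("A", "D0", "F", "C3"), ("B", "D0", "F", "C3"), ("C", "D",  "E", "C1"),
    ("B", "D",  "E", "C2"), ("B", "D",  "F", "C3"), ("C", "D",  "E", "C1"),
    ("C", "D",  "F", "C3"), ("B", "D0", "F", "C3"), ("A", "D",  "E", "C3"),
]

def prior(c):
    # p(class = c), estimated by relative frequency
    return sum(r[3] == c for r in data) / len(data)

def cond(i, v, c):
    # p(feature_i = v | class = c), estimated by relative frequency within the class
    rows = [r for r in data if r[3] == c]
    return sum(r[i] == v for r in rows) / len(rows)

def score(x, c):
    # p(x | c) * p(c) under the class-conditional independence assumption
    return cond(0, x[0], c) * cond(1, x[1], c) * cond(2, x[2], c) * prior(c)

x = ("B", "D", "F")  # age = 72, no hypertension, no diabetes
for c in ("C1", "C2", "C3"):
    print(c, score(x, c))  # C1 0.0, C2 0.0, C3 0.13605...
```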
But what is the answer to the original question, which I believe is the value of the posterior probability $p(C3 \mid BDF)$?
$p(C3 \mid BDF) = \frac{p(BDF \mid C3)\,p(C3)}{p(BDF)}$
We need the value of the marginal probability $p(BDF)$.
A premise is that the features are independent, so I would (naively) write:
$p(BDF) = p(B)\,p(D)\,p(F) = \frac{5}{12}\cdot\frac{7}{12}\cdot\frac{5}{12} \approx 0.101$
$p(C3 \mid BDF) = 0.136/0.101 \approx 1.35$
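The same sketch, reusing `data`, `score`, and `x` from above, reproduces this computation:

```python
def marg(i, v):
    # unconditional p(feature_i = v), estimated by relative frequency over all rows
    return sum(r[i] == v for r in data) / len(data)

p_BDF = marg(0, "B") * marg(1, "D") * marg(2, "F")  # 5/12 * 7/12 * 5/12
print(p_BDF)                   # 0.10127...
print(score(x, "C3") / p_BDF)  # 1.343... -- a "probability" greater than 1
```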
Oops! What's wrong?
Optional question: what would be a good indicator for the certainty of the classification?