
In the book "Temporal Data Mining" by T. Mitsa, the problem given to illustrate Bayesian classification consists of the following training set of medical records:


age              hypertension   diabetes       class

<70   (cat.A)    yes (cat.D0)   yes (cat.E)    C1
70-80 (cat.B)    no  (cat.D)    yes            C3
>80   (cat.C)    yes            yes            C2
<70              yes            no  (cat.F)    C3
70-80            yes            no             C3
>80              no             yes            C1
70-80            no             yes            C2
70-80            no             no             C3
>80              no             yes            C1
>80              no             no             C3
70-80            yes            no             C3
<70              no             yes            C3

The question is:
"What is the likelihood that the following patient does not have a TIA attack (i.e. belongs to class C3)?
S = ( age=72, no hyp., no diab.) = (B,D,F) "

We have:
$p(BDF/C3).p(C3) = p(B/C3).p(D/C3).p(F/C3).p(C3) = 4/7 \cdot 4/7 \cdot 5/7 \cdot 7/12 \approx 0.136$
and since:
$p(F/C2) = p(B/C1) = 0$
then:
$p(BDF/C2) = p(BDF/C1) = 0$
So from the training set we can classify S in C3 because:
$p(BDF/C3).p(C3)>p(BDF/C1).p(C1)$
and:
$p(BDF/C3).p(C3)>p(BDF/C2).p(C2)$.
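The hand computation above can be reproduced with exact fractions. This is a sketch, not from the book: the tuple encoding of the table (A/B/C for the age bins, D0/D for hypertension yes/no, E/F for diabetes yes/no) is my own transcription, and `naive_score` is a hypothetical helper name.

```python
from fractions import Fraction

# The 12 training records, encoded with the question's category labels:
# A/B/C = age <70 / 70-80 / >80; D0/D = hypertension yes/no; E/F = diabetes yes/no.
rows = [
    ("A", "D0", "E", "C1"), ("B", "D",  "E", "C3"), ("C", "D0", "E", "C2"),
    ("A", "D0", "F", "C3"), ("B", "D0", "F", "C3"), ("C", "D",  "E", "C1"),
    ("B", "D",  "E", "C2"), ("B", "D",  "F", "C3"), ("C", "D",  "E", "C1"),
    ("C", "D",  "F", "C3"), ("B", "D0", "F", "C3"), ("A", "D",  "E", "C3"),
]

def naive_score(sample, cls):
    """p(features|cls) * p(cls) under the naive conditional-independence assumption."""
    in_cls = [r for r in rows if r[3] == cls]
    score = Fraction(len(in_cls), len(rows))        # the prior p(cls)
    for i, value in enumerate(sample):
        score *= Fraction(sum(r[i] == value for r in in_cls), len(in_cls))
    return score

S = ("B", "D", "F")  # age 72, no hypertension, no diabetes
scores = {c: naive_score(S, c) for c in ("C1", "C2", "C3")}
print(scores)  # C1 and C2 score 0; C3 scores 20/147 ≈ 0.136
```

The zeros for C1 and C2 come out exactly as in the text: $p(B/C1) = 0$ and $p(F/C2) = 0$ annihilate those products, so S is assigned to C3.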

Fine. The problem stops here in the book.
But what is the answer to the original question, which I believe is the value of the posterior probability $p(C3/BDF)$?

$p(C3/BDF) = p(BDF/C3).p(C3)/p(BDF)$

We need the value of the marginal probability $p(BDF)$:
a premise is that features are independent, so I would (naively) write:

$p(BDF) = p(B).p(D).p(F) = 5/12 \cdot 7/12 \cdot 5/12 \approx 0.101$

$p(C3/BDF) = 0.136/0.101 \approx 1.35$
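The mismatch can be checked numerically. In this sketch (using my own tuple encoding of the table: B = age 70-80, D = no hypertension, F = no diabetes), the product of the marginals is compared against the empirical joint frequency of (B, D, F) in the sample:

```python
from fractions import Fraction

# The 12 training records, feature columns only
# (A/B/C = age bin, D0/D = hypertension yes/no, E/F = diabetes yes/no).
rows = [
    ("A", "D0", "E"), ("B", "D", "E"), ("C", "D0", "E"), ("A", "D0", "F"),
    ("B", "D0", "F"), ("C", "D", "E"), ("B", "D", "E"), ("B", "D", "F"),
    ("C", "D", "E"), ("C", "D", "F"), ("B", "D0", "F"), ("A", "D", "E"),
]
n = len(rows)

p_B = Fraction(sum(r[0] == "B" for r in rows), n)   # 5/12
p_D = Fraction(sum(r[1] == "D" for r in rows), n)   # 7/12
p_F = Fraction(sum(r[2] == "F" for r in rows), n)   # 5/12

product_of_marginals = p_B * p_D * p_F              # 175/1728 ≈ 0.101
empirical_joint = Fraction(sum(r == ("B", "D", "F") for r in rows), n)  # 1/12

print(float(product_of_marginals), float(empirical_joint))
```

The product of marginals (≈0.101) is smaller than the numerator 20/147 ≈ 0.136, which is exactly why dividing by it yields a "probability" above 1; the joint frequency actually observed in the sample is 1/12.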

Ooops! What's wrong?
Optional question: what would be a good indicator for the certainty of the classification?

  • the answer is probably already here: https://stats.stackexchange.com/a/66090/175696. B,D,F are not really independent in that sense, so the denominator is not equal to p(B).p(D).p(F). – Francois Sep 03 '17 at 19:44
  • ... still, is there a way to evaluate p(BDF) and p(C3/BDF) ? – Francois Sep 07 '17 at 15:21
  • 1
    Does it make sense that you get the following? P(BDF&C3) = .136 > .101 = P(BDF) – Sextus Empiricus Sep 12 '17 at 12:13
  • 1
    Your features B, D and F are not independent. They may be (theoretically) in the population. But they are clearly not in your selected sample of 12. – Sextus Empiricus Sep 12 '17 at 12:21
  • So if I understand correctly we cannot estimate p(BDF) from the samples (maybe if we had a large enough number of them?), and indeed p(BDF)<p(BDF&C3) does not make sense. Thanks – Francois Sep 14 '17 at 11:44

2 Answers


In your case you do not have $p(i \vert j) = p(i)$ for $i$ and $j$ among $\lbrace B,D,F \rbrace$ with $i \neq j$. So the features are not independent (at least not in this sample), as

$p(ij) = p(i \vert j)p(j) \neq p(i)p(j) $ .

You should use instead:

$p(C3 \vert B,D,F) = \frac{p(B,D,F\vert C3)}{p(B,D,F)}p(C3) = \frac{1/7}{1/12} 7/12 = 1$

which makes sense. If you look at all cases of BDF then all of them have C3, so $p=1$.
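This direct count is easy to verify. As a sketch (again using my own tuple encoding of the table, with the class as the fourth field):

```python
from fractions import Fraction

# The 12 training records; the fourth field is the class.
rows = [
    ("A", "D0", "E", "C1"), ("B", "D",  "E", "C3"), ("C", "D0", "E", "C2"),
    ("A", "D0", "F", "C3"), ("B", "D0", "F", "C3"), ("C", "D",  "E", "C1"),
    ("B", "D",  "E", "C2"), ("B", "D",  "F", "C3"), ("C", "D",  "E", "C1"),
    ("C", "D",  "F", "C3"), ("B", "D0", "F", "C3"), ("A", "D",  "E", "C3"),
]

bdf = [r for r in rows if r[:3] == ("B", "D", "F")]           # exactly one record
posterior = Fraction(sum(r[3] == "C3" for r in bdf), len(bdf))
print(len(bdf), posterior)  # 1 1
```

There is exactly one (B, D, F) record in the sample and it belongs to C3, so the empirical posterior is 1 — matching $\frac{1/7}{1/12} \cdot 7/12 = 1$ from Bayes' rule.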


Note: it is strange to say "what is the likelihood that the following patient..."; it would be better to replace likelihood with probability.

$\mathcal{L}(C3,BDF) = p(BDF \vert C3) = 1/7$

1

Did you multiply by $p(C3)$ in your first line $p(BDF/C3)$?

I get $p(BDF/C3) = 4/7 * 4/7 * 5/7 * 7/12 = 0.0793$

Therefore $p(C3/BDF) = 0.785$

Dale C
  • nope, there was a mistake in the question but the result is still 4*4*5*7/(7*7*7*12) = 20/147 ≈ 0.136 – Francois Sep 08 '17 at 13:09