In the book "Temporal Data Mining" by T. Mitsa, the problem used to illustrate Bayesian classification consists of the following training set of medical records:
| age          | hypertension  | diabetes     | class |
|--------------|---------------|--------------|-------|
| <70 (cat. A) | yes (cat. D0) | yes (cat. E) | C1    |
| 70-80 (cat. B) | no (cat. D) | yes          | C3    |
| >80 (cat. C) | yes           | yes          | C2    |
| <70          | yes           | no (cat. F)  | C3    |
| 70-80        | yes           | no           | C3    |
| >80          | no            | yes          | C1    |
| 70-80        | no            | yes          | C2    |
| 70-80        | no            | no           | C3    |
| >80          | no            | yes          | C1    |
| >80          | no            | no           | C3    |
| 70-80        | yes           | no           | C3    |
| <70          | no            | yes          | C3    |
The question is:
"What is the likelihood that the following patient do not have a TIA attack (i.e: belongs to class C3)?
S = ( age=72, no hyp., no diab.) = (B,D,F) "
We have:
$p(BDF \mid C3)\,p(C3) = p(B \mid C3)\,p(D \mid C3)\,p(F \mid C3)\,p(C3) = \frac{4}{7}\cdot\frac{4}{7}\cdot\frac{5}{7}\cdot\frac{7}{12} \approx 0.136$
and since:
$p(F \mid C2) = p(B \mid C1) = 0$
then:
$p(BDF \mid C2) = p(BDF \mid C1) = 0$
So from the training set we can classify S in C3, because:
$p(BDF \mid C3)\,p(C3) > p(BDF \mid C1)\,p(C1)$
and:
$p(BDF \mid C3)\,p(C3) > p(BDF \mid C2)\,p(C2)$.
Fine. The problem stops here in the book.
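For concreteness, here is a minimal Python sketch (mine, not from the book) that reproduces these numbers from the training set:

```python
# Encoding: age A (<70), B (70-80), C (>80); hypertension D0 (yes), D (no);
# diabetes E (yes), F (no). Rows follow the training table above.
data = [
    ("A", "D0", "E", "C1"), ("B", "D",  "E", "C3"), ("C", "D0", "E", "C2"),
    ("A", "D0", "F", "C3"), ("B", "D0", "F", "C3"), ("C", "D",  "E", "C1"),
    ("B", "D",  "E", "C2"), ("B", "D",  "F", "C3"), ("C", "D",  "E", "C1"),
    ("C", "D",  "F", "C3"), ("B", "D0", "F", "C3"), ("A", "D",  "E", "C3"),
]

def prior(c):
    # p(class = c), estimated by relative frequency
    return sum(r[3] == c for r in data) / len(data)

def cond(i, v, c):
    # p(feature_i = v | class = c), estimated by relative frequency within the class
    rows = [r for r in data if r[3] == c]
    return sum(r[i] == v for r in rows) / len(rows)

def score(x, c):
    # p(x | c) * p(c) under the class-conditional independence assumption
    return cond(0, x[0], c) * cond(1, x[1], c) * cond(2, x[2], c) * prior(c)

x = ("B", "D", "F")  # age = 72, no hypertension, no diabetes
for c in ("C1", "C2", "C3"):
    print(c, score(x, c))  # C1 0.0, C2 0.0, C3 0.13605...
```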
But what is the answer to the original question, which I believe is the value of the posterior probability $p(C3 \mid BDF)$?
$p(C3 \mid BDF) = \frac{p(BDF \mid C3)\,p(C3)}{p(BDF)}$
We need the value of the marginal probability $p(BDF)$.
A premise is that the features are independent, so I would (naively) write:
$p(BDF) = p(B)\,p(D)\,p(F) = \frac{5}{12}\cdot\frac{7}{12}\cdot\frac{5}{12} \approx 0.101$
$p(C3 \mid BDF) = 0.136/0.101 \approx 1.35$
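The same sketch, reusing `data`, `score`, and `x` from above, reproduces this computation:

```python
def marg(i, v):
    # unconditional p(feature_i = v), estimated by relative frequency over all rows
    return sum(r[i] == v for r in data) / len(data)

p_BDF = marg(0, "B") * marg(1, "D") * marg(2, "F")  # 5/12 * 7/12 * 5/12
print(p_BDF)                   # 0.10127...
print(score(x, "C3") / p_BDF)  # 1.343... -- a "probability" greater than 1
```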
Oops! What's wrong?
Optional question: what would be a good indicator for the certainty of the classification?