Suppose we're given data from three different classes, each normally distributed with the following means and inverse covariance matrices:

$C_1: \mu_1=(1,2)^T, \Sigma_1^{-1}=\left(\begin{array}{cc}2 & 1 \\ 1 & 2\end{array}\right)$
$C_2: \mu_2=(2,-2)^T, \Sigma_2^{-1}=\left(\begin{array}{cc}1 & 0 \\ 0 & 2\end{array}\right)$
$C_3: \mu_3=(1,-1)^T, \Sigma_3^{-1}=\left(\begin{array}{cc}7 & 5 \\ 5 & 6\end{array}\right)$

And the loss function of those three classes is $L=\left (\begin{array}{ccc}0 & 1 &3 \\2 & 0 & 2\\ 4&3&0\end{array}\right)$

How should the decision criterion be modeled to make optimal decisions for such a problem? And to which class should the point $(0,0)^T$ be assigned?

I think I can solve the problem without the loss function (from the prior and posterior probabilities), but I have no idea how to take the loss function into account when making optimal decisions.

Gigili
  • 845

1 Answer

You need the loss function because it tells you how to penalize each misclassification. What you want to do is compute the expected Bayesian risk: $r(a, \pi) = \int_{\theta} L(\theta, a)\pi(\theta|x)d\theta$, where $L$ is the loss function and $\pi(\theta|x)$ is the posterior distribution. The loss function is indexed by the action $a$, which is how you classify the data point, and by $\theta$, the actual class of the object. The risk tells us what our expected penalty is for taking action $a$ when our posterior beliefs are $\pi$. With a finite set of classes, as here, the integral becomes a sum over the classes: $r(a, \pi) = \sum_{i} L(C_i, a)P(C_i|x)$.

For any data point, you want to assign the point to the class for which this risk is smallest. There are a few resources online that I've found pretty useful by searching for "optimal Bayesian classification" or "Bayesian risk" (e.g. these course notes).
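As a rough illustration of the minimum-risk rule for the numbers in the question, here is a minimal Python sketch. It rests on two assumptions that the question does not state: the class priors are taken to be equal, and `LOSS[i][j]` is read as the loss of deciding class $j+1$ when the true class is $i+1$. The function names (`gaussian_density`, `posteriors`, `conditional_risks`, `decide`) are made up for this example.

```python
import math

# Class means and inverse covariance (precision) matrices from the question.
MEANS = [(1.0, 2.0), (2.0, -2.0), (1.0, -1.0)]
PRECISIONS = [
    ((2.0, 1.0), (1.0, 2.0)),
    ((1.0, 0.0), (0.0, 2.0)),
    ((7.0, 5.0), (5.0, 6.0)),
]

# Loss matrix from the question.
# ASSUMPTION: LOSS[i][j] = loss of deciding class j+1 when the true class is i+1.
LOSS = [
    [0.0, 1.0, 3.0],
    [2.0, 0.0, 2.0],
    [4.0, 3.0, 0.0],
]

# ASSUMPTION: equal priors, since the question gives none.
PRIORS = [1.0 / 3.0] * 3


def gaussian_density(x, mean, precision):
    """2-D normal density written in terms of the precision matrix."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = precision
    quad = a * dx * dx + (b + c) * dx * dy + d * dy * dy  # (x-mu)^T P (x-mu)
    det = a * d - b * c                                    # det(P) = 1 / det(Sigma)
    return math.sqrt(det) / (2.0 * math.pi) * math.exp(-0.5 * quad)


def posteriors(x):
    """P(C_i | x) by Bayes' rule."""
    joint = [p * gaussian_density(x, m, prec)
             for p, m, prec in zip(PRIORS, MEANS, PRECISIONS)]
    total = sum(joint)
    return [j / total for j in joint]


def conditional_risks(x):
    """Expected loss of each action: R(a_j | x) = sum_i LOSS[i][j] * P(C_i | x)."""
    post = posteriors(x)
    return [sum(LOSS[i][j] * post[i] for i in range(3)) for j in range(3)]


def decide(x):
    """Return the 1-based index of the class with the smallest expected loss."""
    risks = conditional_risks(x)
    return 1 + min(range(3), key=lambda j: risks[j])


if __name__ == "__main__":
    x = (0.0, 0.0)
    print("posteriors:", posteriors(x))
    print("risks:     ", conditional_risks(x))
    print("decision:  class", decide(x))
```

Under these assumptions the posterior at $(0,0)^T$ is concentrated almost entirely on $C_3$, so the point is assigned to $C_3$. Reading the loss matrix the other way round (rows as actions, columns as true classes) changes the risk values but, for this point, not the decision. Different priors could change both.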

Nick
  • 3,537
  • I don't understand how to use the formula you suggested. I have calculated $P(x|C_i)$, but what's $L(x,a)$? Or more importantly, what's $a$ here? – Gigili Jan 09 '13 at 06:17
  • 'a' is the action you take. In this case it's making a decision to classify a point (say (0,0)) to one of your three classes. – Pat Jan 09 '13 at 09:39