
In Introduction to Statistical Learning, the function given for LDA classification (for more than one predictor) is: $$\delta_k(x) = x^T\Sigma ^{-1}\mu_k-\frac{1}{2}\mu^T_k\Sigma^{-1}\mu_k+\log\pi_k$$ Now, the posterior probability is given by: $$p_k(x) = \frac{\pi_kf_k(x)}{\sum_{l=1}^K\pi_lf_l(x)}$$ where $f_k(x)$ is the density function, equal to $Pr(X=x|Y=k)$, and $\pi_k$ is the fraction of the training observations that belong to the $k$th class. (There are a total of $K$ classes.)
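
Just to fix notation, here is how I read the book's formula as code — a minimal sketch, with made-up values for $x$, $\mu_k$, $\Sigma$ and $\pi_k$:

```python
# A minimal sketch of the book's discriminant
#   delta_k(x) = x^T Sigma^{-1} mu_k - 1/2 mu_k^T Sigma^{-1} mu_k + log(pi_k)
# The values of x, mu_k, Sigma and pi_k below are made up for illustration.
import numpy as np

def delta_k(x, mu_k, Sigma, pi_k):
    Sigma_inv = np.linalg.inv(Sigma)
    return x @ Sigma_inv @ mu_k - 0.5 * mu_k @ Sigma_inv @ mu_k + np.log(pi_k)

x = np.array([1.0, 2.0])
mu_k = np.array([0.5, 1.5])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
pi_k = 0.4

print(delta_k(x, mu_k, Sigma, pi_k))
```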

Now, in LDA, $f_k(x)$ is assumed to be distributed normally. Therefore:
$$f_k(x)=\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)\right)$$ (for more than one predictor)
Therefore $p_k(x)$ is:
$$p_k(x) = \frac{\pi_k\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp(-\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k))}{\sum_{l=1}^K\pi_l\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp(-\frac{1}{2}(x-\mu_l)^T\Sigma^{-1}(x-\mu_l))}$$ Now, the normalizing constant $\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}$ cancels between the numerator and the denominator, and the remaining denominator is the same for every class, so it can be dropped. Then, to find $\delta_k(x)$, we take the log and remove all the constant terms, that is, the terms that don't contain $\mu_k$ or $\pi_k$$^1$. Then we'll have: $$\delta_k(x) = \log \pi_k-\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k) \\=\log\pi_k - \frac{1}{2}x^T\Sigma^{-1}(x-\mu_k)+\frac{1}{2}\mu_k^T\Sigma^{-1}(x-\mu_k) \\= \log\pi_k-\frac{1}{2}x^T\Sigma^{-1}x+\frac{1}{2}x^T\Sigma^{-1}\mu_k+\frac{1}{2}\mu_k^T\Sigma^{-1}x-\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k$$

Here we can remove the terms that don't contain $\mu_k$ and $\pi_k$, thus $$\delta_k(x) = \log\pi_k+\frac{1}{2}x^T\Sigma^{-1}\mu_k+\frac{1}{2}\mu_k^T\Sigma^{-1}x-\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k$$ which is different from the formula given in the book. Basically, the coefficient of $x^T\Sigma^{-1}\mu_k$ is $\frac{1}{2}$ in my formula, but not in the original one. And there's no $\frac{1}{2}\mu_k^T\Sigma^{-1}x$ term in the original formula at all.
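
To make sure the expansion itself is not where I went wrong, here is a quick numerical sanity check (the values of $x$, $\mu_k$, $\Sigma$ and $\pi_k$ are made up):

```python
# Quick numerical sanity check (made-up values) that the expansion above is
# correct, i.e. that log(pi_k) - 1/2 (x - mu_k)^T S (x - mu_k), with
# S = Sigma^{-1}, equals the four-term expression before dropping constants.
import numpy as np

x = np.array([1.0, 2.0])
mu_k = np.array([0.5, 1.5])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
pi_k = 0.4
S = np.linalg.inv(Sigma)

lhs = np.log(pi_k) - 0.5 * (x - mu_k) @ S @ (x - mu_k)
rhs = (np.log(pi_k) - 0.5 * x @ S @ x + 0.5 * x @ S @ mu_k
       + 0.5 * mu_k @ S @ x - 0.5 * mu_k @ S @ mu_k)
print(np.isclose(lhs, rhs))  # True, so the algebra of the expansion holds
```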

So, am I doing this right? If not, what is wrong here?


$^1$ This is the part I'm confused about. How exactly is $\delta_k(x)$ chosen?

Mooncrater
  • It sounds like you are considering just the case with 2 classes, aren't you? If yes, please note it in the text. – ttnphns Jul 31 '17 at 11:13
  • No, @ttnphns. There are a total of $K$ classes, which I have now noted. – Mooncrater Jul 31 '17 at 14:48
  • Hmm. Are you asking why $\frac{1}{2}a^\top S b + \frac{1}{2}b^\top S a = a^\top S b$ for a symmetric matrix $S$? – amoeba Jul 31 '17 at 14:59
  • @amoeba I think yes. Can I get any link to its proof? – Mooncrater Jul 31 '17 at 15:39
  • Sure: $a^\top S b$ is a scalar, hence it's trivially equal to its transpose. Meaning that $a^\top S b = (a^\top S b)^\top = b^\top S^\top a = b^\top S a$ (the last equality holds because $S$ is symmetric). – amoeba Jul 31 '17 at 17:22
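
A quick numerical illustration of the identity discussed in the comments, using made-up vectors $a$, $b$ and a made-up symmetric matrix $S$:

```python
# Numerical illustration (made-up a, b and symmetric S) of the identity in the
# comments: a^T S b = b^T S a when S is symmetric, hence
# 1/2 a^T S b + 1/2 b^T S a = a^T S b.
import numpy as np

a = np.array([1.0, -2.0, 0.5])
b = np.array([3.0, 0.0, 1.0])
S = np.array([[2.0, 0.4, 0.1],
              [0.4, 1.0, 0.3],
              [0.1, 0.3, 1.5]])  # symmetric

print(np.isclose(a @ S @ b, b @ S @ a))                              # True
print(np.isclose(0.5 * (a @ S @ b) + 0.5 * (b @ S @ a), a @ S @ b))  # True
```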

0 Answers