
First of all, I'm studying machine learning with Bishop's Pattern Recognition and Machine Learning, and I'm stuck on the Gaussian parts of chapter two. They require a lot of linear algebra and statistics. Which books do you recommend to build the background for the Gaussian part?

And this is my second question, about chapter 2 (Gaussians).

This equation, from the derivation of the Gaussian conditional distribution, rewrites the quadratic form in the exponent:

$$-\dfrac{1}{2}({\bf x}-{\pmb \mu})^T\Sigma^{-1}({\bf x}-{\pmb \mu})=-\dfrac{1}{2}{\bf x}^T\Sigma^{-1}{\bf x} + {\bf x}^T\Sigma^{-1}{\pmb \mu}+const \qquad{(2.71)}$$

This is called completing the square.

  1. How is the left-hand side expanded into the right-hand side?
  2. What does the equation represent? (I don't understand the format of the equations, or why we use them.)
  3. What does it mean to find the mean and the variance from this equation?

And this is from the conditional distribution of a Gaussian:

$$-\dfrac{1}{2}{\bf x}_a^T\Lambda_{aa}{\bf x}_a \qquad{(2.72)}$$

The book says that by taking the derivative twice (i.e., picking out the second-order terms) this equation is derived. Why?

$$\Sigma_{a|b} = \Lambda_{aa}^{-1} \qquad{(2.73)}$$

Why does a covariance matrix like this characterize the conditional distribution?

If it's too hard to explain, can you list the things I should know in order to understand these equations?

  • If you don't understand the material in Appendix C, Properties of Matrices, or cannot independently verify all the results there, then consult the references Bishop gives at the beginning of that Appendix. – whuber Aug 23 '16 at 17:55
  • In 1. you ask how the left expression is derived from the right expression. $-\frac{1}{2}(X- \mu)^T \Sigma^{-1}(X-\mu) = -\frac{1}{2} \left(X^T\Sigma^{-1}X + X^T\Sigma^{-1}\mu + \mu^T\Sigma^{-1}X + \mu^T\Sigma^{-1}\mu \right) $; use $X^T\Sigma^{-1}\mu = \mu^T\Sigma^{-1}X$ and note that $\mu^T\Sigma^{-1}\mu$ is constant in $X$. – them Aug 25 '16 at 14:07
  • @them When distributing the $-1/2$ the second term should become $-1$ in the $(2.71)$ expression in the OP. Is it a typo? – Antoni Parellada Aug 25 '16 at 15:13
  • @AntoniParellada, you are right it's my mistake, I should have written: $-\frac{1}{2}\left(X^T\Sigma^{-1}X - \mu^T\Sigma^{-1}X - X^T \Sigma^{-1} \mu + \mu^T\Sigma^{-1}\mu \right)$. – them Aug 25 '16 at 15:29
  • @them Why $X^T\Sigma^{-1}\mu = \mu^T\Sigma^{-1}X$ ? – Kamel May 16 '18 at 14:38
  • @Kamel They are transposes of each other, but $x^\top\Sigma^{-1}\mu$ is a scalar, so its transpose is the same scalar. – them May 16 '18 at 15:35

1 Answer


To allow flexibility in the algebra, and circumvent the fact that

$$\small \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}^{-1} \neq \begin{bmatrix} \Sigma_{aa}^{-1} & \Sigma_{ab}^{-1}\\ \Sigma_{ba}^{-1} & \Sigma_{bb}^{-1} \end{bmatrix} $$

we replace $\Sigma^{-1}$ by the precision matrix $\boldsymbol\Lambda$:

\begin{align}\boldsymbol \Sigma^{-1} = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}^{-1} = \begin{bmatrix} \Lambda_{aa} & \Lambda_{ab}\\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix} \end{align}
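A quick numeric illustration of this point (a minimal sketch using numpy; the $4\times4$ covariance and all variable names below are my own arbitrary example, not from the book):

```python
import numpy as np

# Build an arbitrary positive-definite covariance, partitioned into 2x2 blocks
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)          # A A^T + 4I is positive definite
Sigma_aa = Sigma[:2, :2]                 # top-left block of Sigma

Lam = np.linalg.inv(Sigma)               # precision matrix Lambda
Lam_aa = Lam[:2, :2]                     # top-left block of Lambda

# The (a,a) block of Sigma^{-1} is NOT the inverse of Sigma_aa:
print(np.allclose(Lam_aa, np.linalg.inv(Sigma_aa)))   # False
```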


Now we can expand the quadratic exponent of the joint pdf of the partitioned multivariate Gaussian ${\bf x} =\begin{bmatrix}{\bf x}_a&{\bf x}_b\end{bmatrix}^T$:

$$-\frac{1}{2}({\bf x} - {\boldsymbol \mu})^T\, \Sigma^{-1} \, ({\bf x} - \boldsymbol \mu)\\ = -\frac{1}{2} \begin{bmatrix} ({\bf x}_a - \boldsymbol\mu_a)^T & ({\bf x}_b - \boldsymbol\mu_b)^T \end{bmatrix} \begin{bmatrix} \Lambda_{aa} & \Lambda_{ab}\\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix} \begin{bmatrix} {\bf x}_a - \boldsymbol\mu_a \\ {\bf x}_b - \boldsymbol \mu_b \end{bmatrix} \small\\ = -\frac{1}{2}\left[\small ({\bf x}_a-\boldsymbol\mu_a)^T\, \Lambda_{aa}({\bf x}_a-\boldsymbol\mu_a) + 2 \,({\bf x}_a-\boldsymbol\mu_a)^T\, \Lambda_{ab}({\bf x}_b-\boldsymbol\mu_b)+({\bf x}_b-\boldsymbol\mu_b)^T\, \Lambda_{bb}({\bf x}_b-\boldsymbol\mu_b)\right] \\= \color{blue}{-\frac{1}{2}\small ({\bf x}_a-\boldsymbol\mu_a)^T\, \Lambda_{aa}({\bf x}_a-\boldsymbol\mu_a)} - \,({\bf x}_a-\boldsymbol\mu_a)^T\, \Lambda_{ab}({\bf x}_b-\boldsymbol\mu_b) -\frac{1}{2} ({\bf x}_b-\boldsymbol\mu_b)^T \Lambda_{bb}({\bf x}_b-\boldsymbol\mu_b) $$

to show that the exponent remains quadratic in ${\bf x}_a$ when ${\bf x}_b$ is held fixed (the two cross terms combine because $\Lambda_{ba} = \Lambda_{ab}^T$, by the symmetry of $\Sigma$). Hence the conditional distribution will be Gaussian, and its mean and variance will fully characterize it. The blue color marks the only term that is quadratic in ${\bf x}_a$ (see below).
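To convince yourself of this expansion without pushing symbols, here is a small numeric check (a sketch under the same $2+2$ partition; the variables are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)          # arbitrary positive-definite covariance
Lam = np.linalg.inv(Sigma)               # precision matrix
Lam_aa, Lam_ab = Lam[:2, :2], Lam[:2, 2:]
Lam_bb = Lam[2:, 2:]

x, mu = rng.normal(size=4), rng.normal(size=4)
da, db = (x - mu)[:2], (x - mu)[2:]      # x_a - mu_a and x_b - mu_b

full = -0.5 * (x - mu) @ Lam @ (x - mu)
expanded = (-0.5 * da @ Lam_aa @ da
            - da @ Lam_ab @ db           # the two cross terms, combined
            - 0.5 * db @ Lam_bb @ db)
print(np.allclose(full, expanded))       # True
```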


Finding the mean and variance: "completing the square" is the step used here. In the book, though, the operation is simply to expand the quadratic form and then match the result to an $ax^2+bx+c$ polynomial form, as @them commented:

$$\small -\frac{1}{2}({\bf x}- \boldsymbol\mu)^T \Sigma^{-1}({\bf x}-\boldsymbol\mu) =-\frac{1}{2}\left({\bf x}^T\Sigma^{-1}{\bf x} - \boldsymbol\mu^T\Sigma^{-1}{\bf x} - {\bf x}^T \Sigma^{-1} \boldsymbol\mu + \boldsymbol\mu^T\Sigma^{-1}\boldsymbol\mu \right)$$

and noting that ${\bf x}^T\Sigma^{-1}\boldsymbol\mu = \boldsymbol\mu^T\Sigma^{-1}{\bf x}$

$$\small -\frac{1}{2}({\bf x}- \boldsymbol\mu)^T \Sigma^{-1}({\bf x}-\boldsymbol\mu) =-\frac{1}{2}{\bf x}^T\Sigma^{-1}{\bf x} + {\bf x}^T \Sigma^{-1} \boldsymbol\mu -\frac{1}{2} \boldsymbol\mu^T\Sigma^{-1}\boldsymbol\mu $$

and since $-\frac{1}{2} \boldsymbol\mu^T\Sigma^{-1}\boldsymbol\mu$ does not depend on ${\bf x}$ we can just turn it into a constant $C$:

$$\begin{eqnarray} \small -\frac{1}{2}({\bf x}- \boldsymbol\mu)^T \Sigma^{-1}({\bf x}-\boldsymbol\mu) =\color{brown}{-\frac{1}{2}{\bf x}^T\Sigma^{-1}{\bf x}} + {\bf x}^T \Sigma^{-1} \boldsymbol\mu +C \qquad{(2.71)}\\ =-\frac{1}{2}\begin{bmatrix} {\bf x}_a \\ {\bf x}_b\end{bmatrix}^T\Lambda \begin{bmatrix} {\bf x}_a \\ {\bf x}_b\end{bmatrix} + {\bf x}^T \Sigma^{-1} \boldsymbol\mu +C\\ =\color{blue}{-\frac{1}{2}}\begin{bmatrix}\color{blue}{ {\bf x}_a} \\ {\bf x}_b\end{bmatrix}^T \begin{bmatrix} \color{blue}{\Lambda_{aa}} & \Lambda_{ab}\\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix} \begin{bmatrix} \color{blue}{{\bf x}_a} \\ {\bf x}_b\end{bmatrix} + {\bf x}^T \Sigma^{-1} \boldsymbol\mu + C \end{eqnarray}$$
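As a sanity check on $(2.71)$, the constant is $C=-\frac{1}{2}\boldsymbol\mu^T\Sigma^{-1}\boldsymbol\mu$, and the two sides agree numerically (again a minimal numpy sketch with arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)          # arbitrary positive-definite covariance
P = np.linalg.inv(Sigma)                 # Sigma^{-1}
x, mu = rng.normal(size=4), rng.normal(size=4)

lhs = -0.5 * (x - mu) @ P @ (x - mu)
C = -0.5 * mu @ P @ mu                   # the constant absorbed in (2.71)
rhs = -0.5 * x @ P @ x + x @ P @ mu + C
print(np.allclose(lhs, rhs))             # True
```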

When conditioning on ${\bf x}_b$ (which now acts as a constant), the quadratic part of the exponent $\small -\frac{1}{2}({\bf x}- \boldsymbol\mu)^T \Sigma^{-1}({\bf x}-\boldsymbol\mu)$, in which $\Sigma^{-1}$ is the inverse of the covariance, will be given by the elements colored in blue (compare to the brown term in $(2.71)$ above). This explains the mention of

$$\color{blue}{-\dfrac{1}{2}{\bf x}_a^T\Lambda_{aa}{\bf x}_a }$$

in the book and in the OP. In this expression $\boldsymbol \mu$ has been absorbed into $C$; otherwise we would recover the blue-colored expression from the first expansion above.

Hence the covariance of $p({\bf x}_a\vert {\bf x}_b)$ will be:

$$\Sigma_{a\vert b} = \Lambda_{aa}^{-1} \qquad{(2.73)}$$

At this point the book moves on to the mean.
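You can also verify $(2.73)$ numerically: $\Lambda_{aa}^{-1}$ coincides with the Schur complement $\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$, the familiar expression for the conditional covariance (a sketch reusing the same arbitrary partitioned covariance as above):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)          # arbitrary positive-definite covariance
S_aa, S_ab = Sigma[:2, :2], Sigma[:2, 2:]
S_ba, S_bb = Sigma[2:, :2], Sigma[2:, 2:]
Lam_aa = np.linalg.inv(Sigma)[:2, :2]    # Lambda_aa

cond_cov = np.linalg.inv(Lam_aa)         # Sigma_{a|b} per (2.73)
schur = S_aa - S_ab @ np.linalg.inv(S_bb) @ S_ba
print(np.allclose(cond_cov, schur))      # True
```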


This link provides the pertinent three pages in Pattern Recognition and Machine Learning by Christopher Bishop.

And here is a link to very pertinent material on completing the square as a technique to derive the marginal and conditional pdf of a multivariate Gaussian.