
According to Bishop, the author of "Pattern Recognition and Machine Learning", we can optimize the hyperparameters of a Gaussian process by maximizing the likelihood function

$$p(\textbf{t}|\theta),$$ where $\textbf{t}$ denotes the vector of targets $(t_1, \dots, t_N)$ corresponding to the input values $x_1, \dots, x_N$, and $\theta$ denotes the hyperparameters.

He then states that the log likelihood function is given by the standard form for a multivariate Gaussian distribution:

$$\ln p(\textbf{t}|\theta) = -\frac{1}{2}\ln|C_N| - \frac{1}{2}\textbf{t}^T C_N^{-1}\textbf{t} - \frac{N}{2}\ln(2\pi)$$

Now, the multivariate Gaussian density for $X = [X_1, \dots, X_N]^T$ is given by (https://cs229.stanford.edu/section/gaussians.pdf):

$$p(x|\mu,\Sigma) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$

To me, the first equation looks like it was obtained by simply taking the $\log$ of the second equation (keeping in mind that the mean is zero in the Gaussian process described by Bishop), without taking the product over observations into account, which would give $\ln\prod p(\textbf{t}|\theta)$. I am not sure what I am missing here. As far as I know, taking the log of a single Gaussian density is not enough to obtain the likelihood that we want to maximize.
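To make the claim concrete, here is a numerical sketch (the covariance below is a toy squared-exponential Gram matrix plus a noise term, chosen only for illustration) comparing Bishop's closed-form expression with a direct evaluation of the zero-mean multivariate normal log density:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy setup (all concrete values here are assumptions for illustration):
# a squared-exponential Gram matrix plus a noise term plays the role of C_N.
rng = np.random.default_rng(0)
N = 5
x = rng.uniform(-1.0, 1.0, N)
C_N = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 0.1 * np.eye(N)
t = rng.multivariate_normal(np.zeros(N), C_N)

# Bishop's closed form: -1/2 ln|C_N| - 1/2 t^T C_N^{-1} t - N/2 ln(2*pi)
_, logdet = np.linalg.slogdet(C_N)
ll_formula = (-0.5 * logdet
              - 0.5 * t @ np.linalg.solve(C_N, t)
              - 0.5 * N * np.log(2.0 * np.pi))

# Log of the zero-mean multivariate normal density, evaluated directly.
ll_direct = multivariate_normal(mean=np.zeros(N), cov=C_N).logpdf(t)

print(ll_formula, ll_direct)  # the two values should agree
```

So the first equation really is the log of the second with $\mu = 0$ and $\Sigma = C_N$, evaluated on the single vector $\textbf{t}$.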

To be more precise, would the likelihood for $K$ i.i.d. samples of equation 2 not be:

$$\prod_{i=1}^{K} p(x^{(i)}|\mu,\Sigma) = \frac{1}{(2\pi)^{\frac{Kn}{2}}|\Sigma|^{\frac{K}{2}}}\exp\left(-\frac{1}{2}\sum_{i=1}^{K}(x^{(i)}-\mu)^T\Sigma^{-1}(x^{(i)}-\mu)\right)$$

which, after taking the log, is not equal to the first equation. Notice the $K$ in the denominator and the sum inside the exponential. Thus, I would expect something like:

$$\ln p(\textbf{t}|\theta) = -\frac{K}{2}\ln|C_N| - \frac{1}{2}\sum_{k=1}^{K}\textbf{t}_k^T C_N^{-1}\textbf{t}_k - \frac{KN}{2}\ln(2\pi)$$
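This $K$-sample expression can be checked numerically (a sketch with an assumed equicorrelated toy covariance in place of $C_N$), by comparing the closed form against the sum of the $K$ individual log densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy setup (assumed for illustration): K i.i.d. draws from a zero-mean
# N-dimensional Gaussian with an equicorrelated covariance.
rng = np.random.default_rng(1)
N, K = 4, 3
C_N = 0.5 * np.eye(N) + 0.5                            # positive definite
T = rng.multivariate_normal(np.zeros(N), C_N, size=K)  # shape (K, N)

# K-sample closed form:
# -K/2 ln|C_N| - 1/2 sum_k t_k^T C_N^{-1} t_k - KN/2 ln(2*pi)
_, logdet = np.linalg.slogdet(C_N)
quad = sum(t @ np.linalg.solve(C_N, t) for t in T)
ll_K = -0.5 * K * logdet - 0.5 * quad - 0.5 * K * N * np.log(2.0 * np.pi)

# Sum of the K individual log densities.
ll_sum = multivariate_normal(mean=np.zeros(N), cov=C_N).logpdf(T).sum()

print(ll_K, ll_sum)  # should agree; with K = 1 both reduce to equation 1
```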

This expectation is further supported by the accepted answer in this post: Maximum Likelihood Estimators - Multivariate Gaussian

To me, the first equation is just the log of a multivariate Gaussian density with zero mean, not the log-likelihood of the full sample that would be maximized in MLE.

MarianD
kklaw

1 Answer


$$\log_k\left(\prod x_i\right)=\sum\log_k\left(x_i\right)$$

$$\text{AND}$$

$$ \log_k\left(k^x\right)=x $$

$$\implies$$

$$ \log_k\left(\prod k^{x_i}\right) =\sum x_i $$

You're taking the base-$e$ logarithm of a product of $e^{x}$ terms, which turns the log of the product into the sum of the exponents. And in Bishop's setting the entire target vector $\textbf{t}$ is a single draw from an $N$-dimensional Gaussian, so $K = 1$ and the two expressions coincide.
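The identity above can be sketched numerically with arbitrary exponents (the values below are made up for illustration):

```python
import numpy as np

# log of a product of e^{x_i} terms equals the sum of the exponents x_i
xs = np.array([0.3, -1.2, 2.5, 0.7])
log_of_product = np.log(np.prod(np.exp(xs)))
sum_of_exponents = xs.sum()
print(log_of_product, sum_of_exponents)  # both ≈ 2.3
```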

Dave