
Consider the following density, a mixture of two Gaussian distributions, \begin{align*} p(x) = p(k=1)\, N(x|\mu_1,\sigma^2_1) + p(k=0)\, N(x|\mu_0,\sigma^2_0), \end{align*} where $p(k=1)+p(k=0)=\pi_1+\pi_0=1$ and $N(x|\mu,\sigma^2)$ denotes the density of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. The parameters of interest are $\pi_0$, the $\mu_i$'s, and the $\sigma^2_i$'s.

This Q&A shows the MLE for the mixture of two Gaussian distributions when the latent variables $K_i$ are observed. In this question, suppose we observe only the $X_i$'s, while the latent variables $K_i$ are unobserved. Classical methods for estimating these $5$ unknown parameters are the EM algorithm and MCMC sampling; see Hastie et al. (2009) for details.

Why can't MLE be implemented directly for a Gaussian mixture model?


(Some attempt)

If the $k_i$'s were observed, the complete-data log-likelihood would be \begin{align*} \ln P(x,k|\theta) = \sum_{i=1}^n \bigg[ (1-k_i) \big(\ln \pi_0 + \ln N(x_i|\mu_0,\sigma_0^2)\big) + k_i \big(\ln \pi_1 + \ln N(x_i|\mu_1,\sigma_1^2)\big) \bigg]. \end{align*} Since the $k_i$'s are unobserved, however, the quantity to maximize is the observed-data log-likelihood \begin{align*} \ln P(x|\theta) = \sum_{i=1}^n \ln \Big( \pi_0\, N(x_i|\mu_0,\sigma_0^2) + \pi_1\, N(x_i|\mu_1,\sigma_1^2) \Big), \end{align*} in which the logarithm of a sum does not split into per-component terms.
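One can still try to maximize this observed-data log-likelihood numerically; here is a minimal sketch (NumPy/SciPy; the simulated data, unconstrained parameterization, and starting point are all illustrative choices, not canonical ones):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
# Illustrative data from a two-component mixture
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 700)])

def neg_log_lik(theta):
    # theta = (logit(pi_0), mu_0, mu_1, log(sigma_0), log(sigma_1));
    # the transforms keep pi_0 in (0, 1) and both sigmas positive
    pi0 = 1.0 / (1.0 + np.exp(-theta[0]))
    mu0, mu1 = theta[1], theta[2]
    s0, s1 = np.exp(theta[3]), np.exp(theta[4])
    # Observed-data log-likelihood: a log of a sum, with no closed-form maximizer
    mix = pi0 * norm.pdf(x, mu0, s0) + (1.0 - pi0) * norm.pdf(x, mu1, s1)
    return -np.sum(np.log(mix))

start = np.array([0.0, -1.0, 1.0, 0.0, 0.0])  # illustrative starting point
res = minimize(neg_log_lik, start, method="Nelder-Mead")
print(res.x)  # a local maximizer; the result depends on the starting point
```

This typically converges to a sensible local maximum, but not to a unique global one (see the comments below).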

Huihang
  • There is no closed-form solution for the MLE. This means that the log-likelihood needs to be maximised numerically. This is what EM in fact does. One can use alternative numerical approaches, but they are not necessarily better, and don't necessarily deserve the description "directly" either. – Christian Hennig Mar 25 '22 at 15:48
  • For simple problems, maximizing the full log-likelihood you give here numerically works fine. (So the short answer to "Why cannot ..." is "It can.") – Ben Bolker Mar 25 '22 at 16:16
  • The real problem is that the likelihood blows up whenever one of the $\mu_i$ coincides with a data value and the corresponding $\sigma_i$ shrinks to zero. This gives multiple global maxima, all of which are usually considered unrealistic (see the numerical check after these comments). – whuber Mar 25 '22 at 16:48
  • I agree with Ben on this. It is not the end of the world. Especially with a reasonably large sample and some regularisation, it is doable. – usεr11852 Mar 26 '22 at 01:21
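Whuber's point above is easy to verify numerically; a minimal check (NumPy/SciPy, with made-up data) places $\mu_0$ exactly on a data point and shrinks $\sigma_0$, sending the log-likelihood to $+\infty$:

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.0, 1.5, 2.3, 4.1])  # any data set exhibits the same behaviour
for s0 in [1e-1, 1e-3, 1e-6]:
    # mu_0 sits exactly on x[0]; the second component keeps the other terms finite
    mix = 0.5 * norm.pdf(x, x[0], s0) + 0.5 * norm.pdf(x, 2.0, 1.0)
    print(s0, np.sum(np.log(mix)))  # grows without bound as s0 -> 0
```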

1 Answer


As @ChristianHennig has pointed out in the comments, the mixing of components makes the likelihood analytically intractable: the observed-data likelihood is a product of sums over component assignments, and expanding it yields $2^n$ terms for $n$ observations. The same combinatorial explosion makes exact Bayesian analysis intractable; see, e.g., the discussion and references in these notes. Furthermore, the multimodal (and, near degenerate solutions, unbounded) nature of the likelihood makes direct numerical maximization difficult, and it also creates problems for simpler Monte Carlo algorithms, such as Gibbs sampling or Metropolis-Hastings.
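To make the EM route concrete, here is a minimal sketch of the updates for this two-component model (NumPy/SciPy; the initialization, fixed iteration count, and the helper name `em_two_gaussians` are illustrative choices, not canonical ones):

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(x, n_iter=200):
    # Illustrative initialization; EM finds only a local maximum,
    # so in practice one restarts from several starting points.
    pi0, mu0, mu1 = 0.5, x.min(), x.max()
    s0 = s1 = x.std()
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 0 for each x_i
        w0 = pi0 * norm.pdf(x, mu0, s0)
        w1 = (1.0 - pi0) * norm.pdf(x, mu1, s1)
        r0 = w0 / (w0 + w1)
        # M-step: weighted maximum-likelihood updates
        pi0 = r0.mean()
        mu0 = np.sum(r0 * x) / r0.sum()
        mu1 = np.sum((1.0 - r0) * x) / (1.0 - r0).sum()
        s0 = np.sqrt(np.sum(r0 * (x - mu0) ** 2) / r0.sum())
        s1 = np.sqrt(np.sum((1.0 - r0) * (x - mu1) ** 2) / (1.0 - r0).sum())
    return pi0, mu0, mu1, s0, s1
```

Note that the M-step can drive a variance toward zero if a component collapses onto a single data point (the degeneracy described in the comments); practical implementations guard against this with multiple restarts and a variance floor or a prior.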

Roger V.