
In my self-study, I consider a Gaussian mixture distribution:

$$p(x)= p(k=1) N(x|\mu_1,\sigma^2_1) + p(k=0) N(x|\mu_0,\sigma^2_0)$$

where $p(k=1)+p(k=0)=\pi_1+\pi_0=1$. I am now asked to do three things:

  1. Write down the likelihood of the observations as a product over $n$ observations

  2. Write down the likelihood as a product over the likelihoods for $K_0$ and $K_1$, where $K_0$ and $K_1$ are the sets of indices for which $k=0$ and $k=1$, respectively.

  3. Compute the log-likelihood and maximize it with respect to $\mu_0$ and $\sigma_0$.

I am not really sure what I am asked to do. I believe the likelihood is given by:

$$p(x|\pi_0, \pi_1, \mu_0, \mu_1, \sigma_0^2, \sigma_1^2) = \prod_{i=1}^n \bigg[ \pi_1 N(x_i|\mu_1,\sigma^2_1) + \pi_0 N(x_i|\mu_0,\sigma^2_0) \bigg]$$

So the log-likelihood is the sum over observations of the logarithm of the bracketed sum. This seems correct. However, neither the derivative equation nor the maximizer has a closed-form solution.

First, I am not sure which of no. 1 or no. 2 my likelihood expression solves. I think it is no. 1, but then, second, I am not sure what I am asked to do in no. 2. I suppose the solution to no. 2 is easier to maximize in no. 3. Third, it then seems there are two different expressions for the likelihood, but shouldn't there be just one?

Note: I was thinking for no. 2 along the lines of

$$p(x|\pi_0, \pi_1, \mu_0, \mu_1, \sigma_0^2, \sigma_1^2) = \prod_{i=1}^n \bigg[(\pi_1N(x_i|\mu_1,\sigma^2_1))^{k_i} (\pi_0N(x_i|\mu_0,\sigma^2_0))^{1-k_i} \bigg]$$

but got stuck.
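
As an aside on the point above that the log-likelihood of no. 1 has no closed-form maximizer: in practice it would be maximized numerically. Below is a minimal sketch of that idea; the simulated data, starting values, parameterization, and the use of `scipy.optimize.minimize` are my own illustrative assumptions, not part of the exercise.

```python
# Minimal illustration (my own, not part of the exercise): the mixture
# log-likelihood has no closed-form maximizer, so it is maximized numerically.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
# Simulate n = 500 points from a two-component Gaussian mixture (assumed values).
k = rng.binomial(1, 0.3, size=500)                              # latent labels
x = np.where(k == 1, rng.normal(2.0, 1.0, 500), rng.normal(-1.0, 0.5, 500))

def neg_log_lik(theta):
    """Negative log-likelihood; theta = (logit pi_1, mu_0, mu_1, log sigma_0, log sigma_1)."""
    pi1 = 1.0 / (1.0 + np.exp(-theta[0]))                       # keep pi_1 in (0, 1)
    mu0, mu1 = theta[1], theta[2]
    s0, s1 = np.exp(theta[3]), np.exp(theta[4])                 # keep sigmas positive
    mix = (1 - pi1) * norm.pdf(x, mu0, s0) + pi1 * norm.pdf(x, mu1, s1)
    return -np.sum(np.log(mix))

res = minimize(neg_log_lik, x0=np.array([0.0, -0.5, 1.0, 0.0, 0.0]), method="Nelder-Mead")
print(res.x)  # numerical estimates of the transformed parameters
```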

tomka

2 Answers


This is the proper start, but I wonder about the wording of the exercise. I would have asked the following:

  1. Write the likelihood of the sample $(x_1,\ldots,x_n)$ when the $X_i$'s are iid from $$p(x)= \mathbb{P}(K=1) N(x|\mu_1,\sigma^2_1) + \mathbb{P}(K=0) N(x|\mu_0,\sigma^2_0)\qquad\qquad(1)$$ and conclude that there is no closed-form expression for the maximum likelihood estimator.
  2. Introducing the latent variables $K_i$ associated with the component of each $X_i$, namely $$\mathbb{P}(K_i=1)=\pi_1=1-\mathbb{P}(K_i=0)$$ and $$X_i|K_i=k\sim N(x|\mu_k,\sigma^2_k),$$ show that the marginal distribution of $X_i$ is indeed (1) (a small simulation sketch of this construction follows the list).
  3. Give the density of the pair $(X_i,K_i)$ and deduce the density of the completed sample $((x_1,k_1),\ldots,(x_n,k_n))$, acting as if the $k_i$'s were also observed. We will call this density the completed likelihood.
  4. Derive the maximum likelihood estimator of the parameter $(\pi_0,\mu_0,\mu_1,\sigma_0,\sigma_1)$ based on the completed sample $((x_1,k_1),\ldots,(x_n,k_n))$.
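
To illustrate point 2, here is a small simulation sketch (my own illustration, with assumed parameter values): drawing the latent label $K_i$ first and then $X_i\mid K_i=k\sim N(\mu_k,\sigma^2_k)$ produces draws whose marginal distribution is the mixture (1).

```python
# Simulation sketch (my own illustration; parameter values are assumed):
# draw K_i ~ Bernoulli(pi_1), then X_i | K_i = k ~ N(mu_k, sigma_k^2).
# The marginal distribution of the resulting X_i is the mixture (1).
import numpy as np

rng = np.random.default_rng(0)
pi1 = 0.3
mu = np.array([-1.0, 2.0])      # (mu_0, mu_1)
sigma = np.array([0.5, 1.0])    # (sigma_0, sigma_1)

k = rng.binomial(1, pi1, size=100_000)   # latent component labels
x = rng.normal(mu[k], sigma[k])          # X_i | K_i = k ~ N(mu_k, sigma_k^2)

# Sanity check: the empirical mean matches the mean of the mixture (1).
print(x.mean(), (1 - pi1) * mu[0] + pi1 * mu[1])
```
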
Xi'an
  • Your (justified) confusion emerged because the exercise does not state explicitly that $k_i$ are observed. This is implicit in the formulation in no. 2 about the sets. In particular the exercise says here "we know that $n \in {K_0}$ are the indices for the disease free patients and $n \in K_1$ are the indices for the patients with the disease (i.e. $K_0$ and $K_1$ are non-intersecting sets of indices from 1 to N)". I believe this is supposed to tell me that $k$ is observed, but I am still not sure. – tomka Sep 19 '16 at 20:07
  • My confusion is about the wording, not the problem: this is rather standard stuff about mixtures, leading towards the EM algorithm. For instance, this is how we proceed in our book (Chapter 5). In a mixture model, the $K_i$'s are not observed, otherwise it would not be a mixture but an aggregate of two normal samples. – Xi'an Sep 19 '16 at 20:09
  • Please see my suggested solution below. Comments welcome. – tomka Sep 19 '16 at 20:24

I continued working on this exercise and came up with a solution. I'd be glad about comments.

Let $\theta=[\pi_0,\pi_1,\mu_0,\mu_1,\sigma_0^2,\sigma_1^2]$

  1. The likelihood over the $n$ observations is given by: $$ P(x|\theta) = \prod_{i=1}^n \bigg[\pi_0 N(x_i|\mu_0,\sigma_0^2)+\pi_1 N(x_i|\mu_1,\sigma_1^2) \bigg]$$

  2. Writing $k_i=1$ if $i \in K_1$ and $k_i=0$ if $i \in K_0$, the likelihood written as a product over the two sets of indices is given by

$$ P(x|\theta) = \prod_{i=1}^n \bigg[ (\pi_0 N(x_i|\mu_0,\sigma_0^2))^{1-k_i}(\pi_1 N(x_i|\mu_1,\sigma_1^2))^{k_i} \bigg]$$

  3. The log-likelihood is given by

$$ \ln P(x|\theta) = \sum_{i=1}^n \bigg[ (1-k_i) (\ln \pi_0 + \ln N(x_i|\mu_0,\sigma_0^2))+k_i(\ln \pi_1 + \ln N(x_i|\mu_1,\sigma_1^2)) \bigg] $$
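
For completeness, only the terms with $k_i = 0$ involve $\mu_0$ and $\sigma^2_0$; setting the corresponding partial derivatives of the log-likelihood to zero gives the estimating equations

$$\frac{\partial \ln P(x|\theta)}{\partial \mu_0} = \sum_{i=1}^n (1-k_i)\,\frac{x_i-\mu_0}{\sigma_0^2} = 0, \qquad \frac{\partial \ln P(x|\theta)}{\partial \sigma_0^2} = \sum_{i=1}^n (1-k_i)\bigg[\frac{(x_i-\mu_0)^2}{2\sigma_0^4} - \frac{1}{2\sigma_0^2}\bigg] = 0$$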

Consequently we can find the MLEs for $\mu_0$ and $\sigma^2_0$ in a nearly standard way, finding:

$$\hat{\mu}_0 = \frac{1}{\sum_{i=1}^n (1-k_i)} \sum_{i=1}^n (1-k_i) x_i$$

$$\hat{\sigma}^2_0 = \frac{1}{\sum_{i=1}^n (1-k_i)} \sum_{i=1}^n (1-k_i)(x_i - \hat{\mu}_0)^2$$
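
As a quick numerical check (my own sketch; the simulated data and "observed" labels are assumptions), these closed-form estimators are just the sample mean and the maximum-likelihood (biased) sample variance of the observations with $k_i = 0$:

```python
# Numerical check (my own sketch; the simulated data and "observed" labels
# are assumptions): with the k_i known, the MLEs for the k = 0 component are
# the sample mean and (biased) sample variance of the x_i with k_i = 0.
import numpy as np

rng = np.random.default_rng(1)
k = rng.binomial(1, 0.4, size=1000)     # labels, treated as observed
x = np.where(k == 1, rng.normal(3.0, 2.0, 1000), rng.normal(0.0, 1.0, 1000))

w = 1 - k                               # indicator of component 0
mu0_hat = np.sum(w * x) / np.sum(w)
sigma0_sq_hat = np.sum(w * (x - mu0_hat) ** 2) / np.sum(w)
print(mu0_hat, sigma0_sq_hat)           # should be close to 0.0 and 1.0
```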

tomka
  • The answer is correct and constitutes the penultimate step to the derivation of the EM algorithm. – Xi'an Sep 19 '16 at 20:32
  • Out of curiosity, why is the likelihood here taken to be simply the multiplication of the individual Gaussian functions? I can understand the probability density function is given by a Gaussian, and if these variables are normally distributed and independent, then the joint density function, i.e. $p(x_1,x_2)$, is equal to $p(x_1)p(x_2)$. But why is the likelihood equal to the joint density function here? Shouldn't it be multiplied by a prior (which I suppose may simply be uniform here) to be equivalent? – Mathews24 Apr 25 '19 at 02:18