
Suppose that you have a population with $N$ units, each with a random variable $X_i \sim \text{Poisson}(\lambda)$. You observe the $n = N - n_0$ units for which $X_i > 0$; the $n_0$ units with $X_i = 0$ go unrecorded. We want an estimate of $\lambda$.

There are method-of-moments and conditional-maximum-likelihood routes to the answer, but I wanted to try the EM algorithm. I derive the EM objective to be $$ Q\left(\lambda_{-1}, \lambda\right) = -\lambda \left(n + \frac{n}{\exp(\lambda_{-1}) - 1}\right) + \log(\lambda)\sum_{i=1}^n{x_i} + K, $$ where the $-1$ subscript indicates the value from the previous iteration of the algorithm and $K$ is constant with respect to the parameters. (I actually think that the $n$ in the fraction in parentheses should be $n+1$, but that doesn't seem accurate; a question for another time.)
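For reference, here is a sketch of how such a $Q$ arises (my reconstruction, treating the number of unobserved zero-count units $n_0$ as the latent quantity; note the leading minus sign, consistent with the code below). The complete-data log-likelihood for $N = n + n_0$ units is $$ \ell_c(\lambda) = -(n + n_0)\,\lambda + \log(\lambda)\sum_{i=1}^n x_i + K. $$ In the E-step, each observed unit implies odds $p_0/(1-p_0)$ of a companion unobserved zero, so $$ E\left[n_0 \mid \lambda_{-1}\right] = n\,\frac{e^{-\lambda_{-1}}}{1 - e^{-\lambda_{-1}}} = \frac{n}{e^{\lambda_{-1}} - 1}, $$ and substituting this expectation into $\ell_c$ gives $$ Q(\lambda_{-1}, \lambda) = -\lambda\left(n + \frac{n}{e^{\lambda_{-1}} - 1}\right) + \log(\lambda)\sum_{i=1}^n x_i + K. $$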

To make this concrete, suppose that $n=10$, $\sum{x_i} = 20$. Of course, $N$ and $n_0$ are unobserved and $\lambda$ is to be estimated.

When I iterate the following function, plugging in the previous iteration's maximum value, I reach the correct answer (verified by CML, MOM, and a simple simulation):

EmFunc <- function(lambda, lambda0){
  # Q(lambda0, lambda) for n = 10, sum(x) = 20; lambda0 is the previous iterate
  -lambda * (10 + 10 / (exp(lambda0) - 1)) + 20 * log(lambda)
}

lambda0 <- 2
lambda  <- 1

while(abs(lambda - lambda0) > 0.0001){
  lambda0 <- lambda
  iter    <- optimize(EmFunc, c(0, 4), lambda0 = lambda0, maximum = TRUE)
  lambda  <- iter$maximum
}

> iter
$maximum
[1] 1.593573

$objective
[1] -10.68045
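As a standalone cross-check (my own sketch, not part of the original question): the conditional MLE for the zero-truncated Poisson solves $\hat\lambda / (1 - e^{-\hat\lambda}) = \bar{x} = 2$, which `uniroot` can solve directly, and it should agree with the EM fixed point above:

```r
# CML cross-check: the zero-truncated Poisson MLE satisfies
#   lambda / (1 - exp(-lambda)) = mean(x) = 20 / 10 = 2
cml <- uniroot(function(lambda) lambda / (1 - exp(-lambda)) - 2,
               interval = c(0.01, 4), tol = 1e-8)$root
round(cml, 4)  # roughly 1.5936, matching the EM result
```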

But this is a simple problem; let's just maximize without iterating:

MaxFunc <- function(lambda){
  # EmFunc with lambda0 set equal to lambda, i.e. Q(lambda, lambda)
  -lambda * (10 + 10 / (exp(lambda) - 1)) + 20 * log(lambda)
}

optimize(MaxFunc, c(0,4), maximum = TRUE)
$maximum
[1] 2.393027

$objective
[1] -8.884968

The maximized value of the function is higher than the one reached by the iterative procedure, and the result is inconsistent with the other methodologies. Why does the second procedure give a different and (I presume) incorrect answer?

Charlie

1 Answer

When you derived your objective function for the EM algorithm, I assume you treated the number of units with $x_i=0$, which I'll call $y$, as your latent variable. In that case, I'm (again) assuming that $Q$ represents a reduced form of the expectation, over $y$, of the complete-data log-likelihood given $\lambda_{-1}$. This is not the same as the full likelihood, because $\lambda_{-1}$ is treated as given.

Therefore you cannot use $Q$ as the full likelihood, as it does not contain information about how changing $\lambda$ changes the distribution of $y$ (and when you maximize the full likelihood, you want to select the most likely values of $y$ as well). This is why the full maximum likelihood for the zero-truncated Poisson differs from your $Q$ function, and why you get a different (and incorrect) answer when you maximize $f(\lambda)=Q(\lambda,\lambda)$.
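To see this numerically (my sketch, under the setup in the question): maximize the actual zero-truncated log-likelihood, which carries the extra $-n\log(1 - e^{-\lambda})$ term that $Q$ lacks, and compare with the collapsed $f(\lambda) = Q(\lambda, \lambda)$:

```r
# Zero-truncated Poisson log-likelihood for n = 10, sum(x) = 20
# (up to a constant); the -10 * log(1 - exp(-lambda)) term is exactly
# what the collapsed f(lambda) = Q(lambda, lambda) is missing.
TruncLogLik <- function(lambda){
  20 * log(lambda) - 10 * lambda - 10 * log(1 - exp(-lambda))
}
optimize(TruncLogLik, c(0.01, 4), maximum = TRUE)$maximum
# recovers roughly 1.5936, the EM/CML answer, rather than 2.39
```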

Numerically, maximizing $f(\lambda)$ will necessarily produce an objective value at least as large as your EM result, and probably larger, since there is no guarantee that the EM algorithm converges to a maximum of $f$ - it is only supposed to converge to a maximum of the likelihood function!

jayk