Questions tagged [expectation-maximization]

An optimization algorithm often used for maximum-likelihood estimation in the presence of missing data.

The expectation-maximization (EM) algorithm is an iterative optimization algorithm. It is an instance of a broader class of optimization algorithms known as majorization-minimization (MM). It is a cornerstone of statistics because it is particularly well suited to likelihood maximization problems in which the likelihood cannot be maximized directly, as often happens in the presence of missing data: each EM iteration is guaranteed not to decrease the observed-data likelihood, although the algorithm may converge to a local rather than a global maximum.

More specifically, the EM algorithm is an iterative method for finding maximum likelihood estimates when the model involves latent variables or missing data. Given a current estimate $\theta^{(t)}$, a typical iteration (sketched in code after the list below) works as follows:

  • Expectation Step (E-step): Compute the expected complete-data log-likelihood, $Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}}\left[\log L(\theta; X, Z)\right]$, where the expectation over the latent variables $Z$ is taken with respect to their conditional distribution given the observed data $X$ and the current estimate.
  • Maximization Step (M-step): Update the estimate by maximizing the result of the expectation step, $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$.
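
As a concrete illustration of these two steps, here is a minimal sketch of EM for a two-component univariate Gaussian mixture, one of the models that comes up repeatedly in the questions below. It assumes only NumPy; the function name `em_gmm`, the initialization scheme, and the stopping rule are illustrative choices, not a reference implementation.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at each element of x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def em_gmm(x, n_iter=100, tol=1e-8, seed=0):
    """EM for a two-component univariate Gaussian mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Crude initialization: two randomly chosen observations as means,
    # the pooled standard deviation for both components, equal weights.
    w = np.array([0.5, 0.5])
    mu = rng.choice(x, size=2, replace=False)
    sigma = np.array([x.std(), x.std()])

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities, i.e. the posterior probability of each
        # component for each observation under the current parameters.
        dens = np.stack([w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)])
        resp = dens / dens.sum(axis=0)

        # M-step: maximize the expected complete-data log-likelihood,
        # which has a closed form for a Gaussian mixture.
        nk = resp.sum(axis=1)
        w = nk / x.size
        mu = (resp @ x) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)

        # Observed-data log-likelihood (at the pre-update parameters);
        # EM guarantees this sequence is non-decreasing.
        ll = np.log(dens.sum(axis=0)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, sigma, ll

# Usage: recover the parameters of a simulated two-component mixture.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 700)])
print(em_gmm(data))
```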
603 questions
29
votes
3 answers

Why is the expectation maximization algorithm used?

From what little I know, the EM algorithm can be used to find the maximum likelihood estimate when setting the partial derivatives of the likelihood with respect to the parameters to zero gives a set of equations that cannot be solved analytically. But is the…
user782220
  • 1,062
9
votes
4 answers

Termination Condition(s) for Expectation Maximization

What are good criteria for deciding when to terminate the expectation-maximization algorithm? I know that the idea is that you should terminate when the change in the data log-likelihood is "small" or the change in the model parameters is "small"…
Dave
  • 3,259
9
votes
1 answer

Why does the EM algorithm have to be iterative?

Suppose that you have a population with $N$ units, each with a random variable $X_i \sim \text{Poisson}(\lambda)$. You observe $n = N-n_0$ values for any unit for which $X_i > 0$. We want an estimate of $\lambda$. There are method of moments and…
Charlie
  • 14,062
5
votes
2 answers

How to initialize EM-algorithm when trying to fit data to a separable mixture model?

As a part of my studies, I’m trying to cluster co-occurrences of URLs and tags in data from Delicious. I found a promising method for this in a paper called “Emergent Semantics from Folksonomies: A Quantitative Study” (pages 6-13). It used a…
DJ Pirtu
  • 153
5
votes
1 answer

Examples of "Hard" Expectation Maximization other than clustering?

Are there examples of learning algorithms (other than k-means clustering) which fit the paradigm of hard EM? By hard EM, I mean the variant described here.
shyamupa
  • 393
5
votes
1 answer

Explanation of the EM algorithm

I've seen this paper's straightforward description of the EM algorithm cited countless times now to explain EM (figure below). But it's only causing me more confusion because I have trouble seeing how it aligns with EM theory. In theory, the M step…
ezhao15
  • 153
5
votes
2 answers

Initializing EM algorithm

I am using the EM algorithm to model measured data ($y$) as a sum of two weighted Gaussian distributions: $$\text{model} = \sum \limits_{i=1}^{L=2} w_i \phi(\theta_i),$$ where $\theta = (\mu, \sigma)$. The problem is that the algorithm does not converge…
xDazz
  • 51
4
votes
1 answer

Why isn't my EM algorithm increasing the log-likelihood after each iteration?

I used the EM algorithm to estimate the parameters of the following time-invariant process: $$y=Ct+e_y$$ $$x=Pt+e_x$$ where $y \in \mathbb{R}$, $x \in \mathbb{R}^d$, $t\sim N(0,I^{p \times p})$, $e_y \sim N(0,\Sigma_y)$, and $e_x \sim N(0,\Sigma_x)$. Both $\Sigma_x$ and…
4
votes
1 answer

Details in proof for convergence of Expectation Maximization Algorithm

I am going through the paper provided at http://www.cs.cmu.edu/~dgovinda/pdf/recog/EM_algorithm-1.pdf and could not make out how the following was derived: $\sum_z \mathcal P(\mathbf z|X, \theta_n) \ln \big( \frac{\mathcal P(X| \mathbf z,…
Curious
  • 592
3
votes
2 answers

Why is $q(\mathbf{z})$ chosen to be the posterior distribution in the EM algorithm?

In the CS229 Lecture Notes on the EM algorithm by Tengyu Ma and Andrew Ng (2019), the authors write that $$ \log(p(\mathbf{x};\theta)) = \log\left(\mathbb{E}_{q(\mathbf{z})}\left[\frac{p(\mathbf{x},\mathbf{z};\theta)}{q(\mathbf{z})}\right]\right)…
mhdadk
  • 4,940
3
votes
3 answers

Is it possible to see the slow decrease in test negative log-likelihood as overfitting?

We have developed a model for some real data and we use the EM algorithm for optimization of the model (parameters). In the first phase we generate synthetic data according to the model (with some known parameters) to validate that our inference code is bug…
3
votes
1 answer

Does the EM algorithm increase the lower bound as well as the true likelihood?

I am using a variational Bayes method (without an M step, since there are no parameters) to infer my model. My question is: if it is working correctly, will it increase the log likelihood of the data, $\log(p(y))$? If so, does it guarantee that the lower…
sachinruk
  • 1,353
2
votes
1 answer

E-step of the stochastic approximation EM

I am reading the paper "Convergence of a stochastic approximation version of the EM algorithm" to implement this algorithm for a probability model I already have. On p. 3, the paper summarises the algorithm as follows. I am stuck at the E (or S)…
user16776
2
votes
0 answers

Non-increasing Log Likelihood with Expectation Maximization

I am struggling with my implementation of the expectation maximization (EM) algorithm for a certain model. The sequence of log likelihood values is not increasing, which contradicts the theory. The measured outcome variable y is binary and is…
Thomas
  • 51
2
votes
1 answer

Derivation of the E-step in the EM algorithm for pLSA via Lagrangian

I have trouble deriving the EM algorithm for the probabilistic latent semantic analysis (pLSA) model via Lagrange multipliers. I model the missing data $Q_{zij} \in \{0,1\}$ for word $w_j$ in document $d_i$, which gives rise to the variational…
lyinch
  • 201