I am struggling with my implementation of the expectation maximization (EM) algorithm for a certain model. The sequence of log-likelihood values is not increasing, which contradicts the theory.
The measured outcome variable y is binary and is generated by one of two distributions:
- p(y=1|z=0) = xi, a fixed probability
- p(y=1|z=1,x) = 1/(1+exp(theta'*x))
So either there is a fixed probability of obtaining a 1, or the probability depends on the instance's characteristics x, and is modeled with a logistic regression. You can find a more detailed derivation (with the actual LL and the E and M step) in the following document:
https://dl.dropbox.com/u/46839575/MMapproach.pdf
The setting is customer churn prediction, but I guess no background knowledge is needed to check the modeling approach.
I have implemented this in Matlab, and the problem is that the LL does not increase monotonically. I have checked my code several times and have not found any mistakes so far, so I am starting to wonder whether my derivation is correct. Could someone point out any incorrect assumptions?
In particular, I have started to doubt whether it is correct to model latent class membership with the person's characteristics (in most (Gaussian) mixture models this is a fixed probability that does not vary across persons). Furthermore, the same characteristics are used to model both latent class membership and churn probability (although my model also gives non-increasing log-likelihood values when the two models use disjoint sets of characteristics).
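To make the model above concrete, here is a minimal sketch of its observed-data log-likelihood, written in Python rather than Matlab. The gate p(z=1|x) = 1/(1+exp(-gamma'*x)) and the names `gamma`, `observed_loglik` and `is_nondecreasing` are assumptions for illustration only; the linked document may parameterize class membership differently. This marginal quantity, with z summed out, is the one that EM theory guarantees to be non-decreasing.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def observed_loglik(X, y, xi, theta, gamma):
    """Observed-data log-likelihood of the two-component model.

    X     : list of feature vectors x_i
    y     : list of binary outcomes (0 or 1)
    xi    : fixed probability p(y=1 | z=0)
    theta : coefficients of p(y=1 | z=1, x) = 1 / (1 + exp(theta'x)), as in the post
    gamma : coefficients of the ASSUMED gate p(z=1 | x) = 1 / (1 + exp(-gamma'x))
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        pi_i = 1.0 / (1.0 + math.exp(-dot(gamma, x_i)))  # assumed p(z=1 | x)
        p_i = 1.0 / (1.0 + math.exp(dot(theta, x_i)))    # p(y=1 | z=1, x)
        lik0 = xi if y_i == 1 else 1.0 - xi              # p(y | z=0)
        lik1 = p_i if y_i == 1 else 1.0 - p_i            # p(y | z=1, x)
        # Sum over both values of z inside the logarithm.
        total += math.log((1.0 - pi_i) * lik0 + pi_i * lik1)
    return total


def is_nondecreasing(values, tol=1e-8):
    """True if each value is at least the previous one, up to a tolerance."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Toy data, made up purely to show the call signature.
    X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
    y = [1, 0, 1]
    print(observed_loglik(X, y, xi=0.2, theta=[0.1, -0.3], gamma=[0.0, 0.5]))
```

Evaluating `observed_loglik` after every EM iteration and passing the resulting list to `is_nondecreasing` gives a quick check; a decrease larger than the tolerance would point to a problem in the E or M step rather than in the theory.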
I hope my explanation is clear.
Thank you very much for your comment. It seems I am indeed looking at the complete log-likelihood, not at the incomplete LL. Am I correct that in my case, the incomplete log-likelihood would look as follows:
log(L_incomplete) = log(L_complete) - log(mu_i)
where log(L_complete) is equation (7) and mu_i is as defined on page 3 of my document (sorry for referencing the document again, but I don't have enough characters here)?
Also, how should I interpret the complete and incomplete LL? Which one is more relevant?
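For reference, the two quantities can be written in generic mixture notation, which may differ from the notation in the linked document. Here \pi_i = p(z_i = 1 | x_i) and p_i = p(y_i = 1 | z_i = 1, x_i) are assumed symbols, and the complete-data LL treats the z_i as if they were observed. The incomplete (observed-data) LL sums over both values of z_i inside the logarithm, and it is this quantity that EM theory guarantees not to decrease.

```latex
\begin{align*}
\log L_{\text{complete}}
  &= \sum_i \Big[ (1 - z_i) \log\big( (1-\pi_i)\, \xi^{y_i} (1-\xi)^{1-y_i} \big)
     + z_i \log\big( \pi_i\, p_i^{y_i} (1-p_i)^{1-y_i} \big) \Big], \\
\log L_{\text{incomplete}}
  &= \sum_i \log\Big[ (1-\pi_i)\, \xi^{y_i} (1-\xi)^{1-y_i}
     + \pi_i\, p_i^{y_i} (1-p_i)^{1-y_i} \Big].
\end{align*}
```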
Thanks so much for your help!
– Thomas Feb 05 '13 at 06:34