In the E-step of the EM algorithm we maximize $$\max_\theta \sum_Z p(Z\mid X,\theta_\text{old})\log p(X,Z\mid\theta).$$ This expression is called the expectation of the complete-data log-likelihood $\log p(X,Z\mid\theta)$. I do not see any expectation here, which is defined as $E(Y)=\sum_YYp(Y)$. Why is it called that? How can I see that it is an expectation?
Michael Hardy
- 9,818
tomka
- 6,572
- I'd write $\displaystyle \operatorname{E}(Y) = \sum_y y p(y),$ being careful about which $Y\text{s}$ are capital and which $y\text{s}$ are lower-case. – Michael Hardy Apr 28 '17 at 22:39
1 Answer
You are combining both steps. Breaking them out (e.g. see here), you have
E step
$Q(\theta\mid\theta_\text{old})=\sum_Z p(Z\mid X,\theta_\text{old})\log p(X,Z\mid\theta)$
M step
$\theta_\text{new}=\arg\max_\theta Q(\theta\mid\theta_\text{old})$
For the "E step", you are computing the average $\mathbb{E}\big[\log p(X,Z\mid\theta)\big]$, taking $Z\sim p(Z\mid X,\theta_\text{old})$.
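To connect this with the textbook definition $\operatorname{E}(Y)=\sum_y y\,p(y)$: the random quantity being averaged is $Y=\log p(X,Z\mid\theta)$, which is a function of $Z$, and the weights are $p(Z\mid X,\theta_\text{old})$. The following is only a rough sketch under assumptions that are not part of the question: a two-component Gaussian mixture with unit variances and equal mixing weights, a single observation `x`, and hypothetical function names (`log_joint`, `posterior_z`, `q_exact`, `q_monte_carlo`). It computes $Q$ once as the weighted sum and once as a plain sample average over draws of $Z$, which should agree up to sampling noise.

```python
import math
import random

# Assumed model (not from the question): equal mixing weights, unit variances.
LOG_WEIGHTS = (math.log(0.5), math.log(0.5))


def log_joint(x, z, theta):
    """log p(X=x, Z=z | theta), where theta holds the two component means."""
    mean = theta[z]
    return LOG_WEIGHTS[z] - 0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


def posterior_z(x, theta_old):
    """p(Z=z | X=x, theta_old) for z = 0, 1, via Bayes' rule."""
    joint = [math.exp(log_joint(x, z, theta_old)) for z in (0, 1)]
    total = sum(joint)
    return [j / total for j in joint]


def q_exact(x, theta, theta_old):
    """Q(theta | theta_old) = sum_z p(z | x, theta_old) * log p(x, z | theta)."""
    weights = posterior_z(x, theta_old)
    return sum(w * log_joint(x, z, theta) for z, w in enumerate(weights))


def q_monte_carlo(x, theta, theta_old, n_samples=100_000, seed=0):
    """The same quantity as a sample mean of Y = log p(x, Z | theta)."""
    rng = random.Random(seed)
    weights = posterior_z(x, theta_old)
    # Draw Z from the posterior; x stays fixed because it is observed.
    draws = rng.choices((0, 1), weights=weights, k=n_samples)
    return sum(log_joint(x, z, theta) for z in draws) / n_samples


if __name__ == "__main__":
    x, theta_old, theta = 0.5, (0.0, 2.0), (0.3, 1.5)
    print(q_exact(x, theta, theta_old), q_monte_carlo(x, theta, theta_old))
```

Note that `theta_old` only enters through the weights, while `theta` only enters through the quantity being averaged. That is why $Q$ is an expectation over $Z$ alone and is still a function of $\theta$ that the M step can then maximize.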
Michael Hardy
- 9,818
GeoMatt22
- 12,950
- So this is the expectation $\mathbb{E}_Z$, i.e. with respect to $Z$ only? It appears that otherwise we would need to weight by $p(Z,X\mid\theta_\text{old})$. – tomka Apr 28 '17 at 19:20
- Yes, $X$ is the observed data which does not change. The average is over possible values of the hidden data $Z$. – GeoMatt22 Apr 28 '17 at 19:28
- But that means that the "E-step" is not a step at all, in the usual meaning of this word, doesn't it? – Elmar Zander Dec 14 '18 at 14:10