2

I have trouble deriving the EM algorithm for the Probabilistic latent semantic analysis (pLSA) model via Lagrange multipliers.

I model the missing data $Q_{zij} \in \{0,1\}$ for word $w_j$ in document $d_i$, which gives rise to the variational distribution over $z: q_{zij} = P(Q_{zij} = 1), \sum_z q_{zij} = 1, q_{zij} \geq 0$. Then I derive a lower bound via Jensen's inequality and arrive at the optimisation of the log likelihood over $q$ for a fixed $u_{zi}, v_{zj}$ via Lagrange multiplier:

$\cal{L}(q, \lambda) = \sum_{z=1}^K q_{zij}[\log u_{zi} + \log v_{zj} - \log q_{zij}] + \lambda(\sum_{z=1}^K q_{zij} - 1)$

Applying the first order optimality condition, which is taking the partial derivatives with respect to $q_{zij}$ I get:

$\lambda + (\log u_{zi} + \log v_{zj} - \log q_{zij} -1) = 0$

This now leaves me with $K + 1$ equations for $K+1$ unknowns, which are $\lambda$ and the $K$ $q_{zij}$ values. However, I don't know how to actually solve this. I know that the solution should be

$q_{zij} = \frac{v_{zi}u_{zj}}{\sum_{p=1}^K v_{pi}u_{pj}}$ which is just the posterior of $Q_{zij}$ if I expand $v$ and $u$ to their respective pdfs.

How do I solve this to properly derive the E step?

lyinch
  • 201

1 Answers1

0

I found the solution. For brevity, I'll drop the indices $i,j$. First, we isolate $q_z$, then we calculate $\lambda$ and once we have it, we can plug $\lambda$ back into the first equation:

The first step is to isolate $q_z$: $\lambda + \log(u_z v_z) - \log q_z -1 = 0 \iff q_z = \exp(\lambda + \log(u_z v_z) -1 ) = \exp(\lambda -1) u_z v_z $

Now we use the second condition: $\sum_z q_z -1 = 0$, plug $q_z$ in, and isolate $\lambda$:

$\sum_z \exp(\lambda -1) u_z v_z -1 = 0 \iff \exp(\lambda -1) = \frac{1}{\sum_z u_z v_z} \iff \lambda = \log \frac{1}{\sum_z u_z v_z} + 1 $

Now we use this $\lambda$ and plug it back into the first equation where we isolated $q_z$:

$ q_z = \exp(\log \frac{1}{\sum_p u_p v_p} + 1 -1) u_z v_z = \frac{u_z v_z}{\sum_p u_p v_p}$

And that's the solution! (note that I changed the index of the sum to range over $p$ to not conflict with the $z$)

lyinch
  • 201