Let us assume we need to maximize the likelihood $p(x|\theta)$ of an observed variable $x$ given a set of parameters $\theta$. However, this likelihood also depends on a set of hidden variables $z$ that we cannot observe. Updating the parameters $\theta$ optimally would require knowledge of $z$, which we do not have. We can, however, still estimate $z$ and use this estimate to further update the parameters $\theta$. If I am not mistaken, this is the Expectation-Maximization (EM) algorithm.
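To fix notation, the decomposition I have in mind is the standard one: for any distribution $q(z)$ over the hidden variables,

$$\log p(x \mid \theta) \;=\; \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z \mid \theta)}{q(z)}\right]}_{-F(q,\,\theta)} \;+\; \mathrm{KL}\!\left(q(z)\,\big\|\,p(z \mid x, \theta)\right),$$

so the free energy $F(q, \theta)$ is an upper bound on $-\log p(x \mid \theta)$, and the bound is tight when $q(z) = p(z \mid x, \theta)$.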
This algorithm is divided into two steps: the E step and the M step. When the exact posterior over $z$ is intractable, the E step can be performed by gradient descent, minimizing the variational free energy (an upper bound on the negative log-likelihood) with respect to an approximate posterior over $z$. Once this minimization has converged, the parameters $\theta$ of the model are optimized in the M step. I hope my understanding so far is clear; if not, what is wrong?
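To make sure we are talking about the same procedure, here is a minimal sketch of the alternating scheme in Python/NumPy on a toy model (not my actual model, just for illustration): $x_n \sim \mathcal{N}(z_n, \sigma^2)$ with $z_n \sim \mathcal{N}(\mu, \tau^2)$, a Gaussian approximate posterior $q(z_n) = \mathcal{N}(m_n, s_n^2)$, and $\theta = (\mu, \sigma^2)$ with $\tau^2$ held fixed. The E step minimizes the free energy over the variational parameters by gradient descent; the M step then updates $\theta$ in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x_n ~ N(z_n, sigma^2), z_n ~ N(mu, tau^2).
N, tau2 = 200, 1.0
true_mu, true_sigma2 = 2.0, 0.5
z_true = rng.normal(true_mu, np.sqrt(tau2), size=N)
x = rng.normal(z_true, np.sqrt(true_sigma2))

def free_energy(m, s2, mu, sigma2):
    """F(q, theta) = E_q[-log p(x, z | theta)] - H(q), an upper bound on -log p(x | theta)."""
    lik = 0.5 * np.log(2 * np.pi * sigma2) + ((x - m) ** 2 + s2) / (2 * sigma2)
    prior = 0.5 * np.log(2 * np.pi * tau2) + ((m - mu) ** 2 + s2) / (2 * tau2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return np.sum(lik + prior - entropy)

def e_step_grads(m, rho, mu, sigma2):
    """Gradients of F w.r.t. the variational parameters (m_n, rho_n = log s_n)."""
    s2 = np.exp(2 * rho)
    dm = (m - x) / sigma2 + (m - mu) / tau2
    drho = s2 / sigma2 + s2 / tau2 - 1.0      # chain rule through s_n = exp(rho_n)
    return dm, drho

def m_step(m, rho):
    """Closed-form M step: minimize F over theta = (mu, sigma2) for fixed q."""
    s2 = np.exp(2 * rho)
    return m.mean(), np.mean((x - m) ** 2 + s2)

# Standard (alternating) variational EM.
mu, sigma2 = 0.0, 1.0                      # initial parameters
m, rho = np.zeros(N), np.zeros(N)          # initial variational parameters
lr = 0.05
for em_iter in range(50):
    for _ in range(200):                   # E step: run GD until (approximately) converged
        dm, drho = e_step_grads(m, rho, mu, sigma2)
        m, rho = m - lr * dm, rho - lr * drho
    mu, sigma2 = m_step(m, rho)            # M step only once the E step is done
    F = free_energy(m, np.exp(2 * rho), mu, sigma2)
    print(f"iter {em_iter}: F={F:.3f}  mu={mu:.3f}  sigma2={sigma2:.3f}")
```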
Let us assume we perform the E step via gradient descent: in this case, the optimization usually runs until GD has converged, and only then is the M step performed. In a practical case I am working on, I have realised that performing the M step while gradient descent is still minimizing the free energy of the E step leads to better results in practice. In other words, instead of waiting for the E step to be completed, I simultaneously update both the approximate posterior over $z$ and the parameters $\theta$ based on the current estimates, until the whole process converges; a sketch of what I mean is below.
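Concretely, the interleaved variant I am describing looks like this (reusing `e_step_grads`, `m_step` and `free_energy` from the toy sketch above):

```python
# Interleaved variant: a single gradient step on q, then an immediate M-step
# update of theta, repeated until the free energy stops decreasing.
mu, sigma2 = 0.0, 1.0
m, rho = np.zeros(N), np.zeros(N)
prev_F = np.inf
for it in range(10_000):
    dm, drho = e_step_grads(m, rho, mu, sigma2)
    m, rho = m - lr * dm, rho - lr * drho   # partial E step: one gradient step only
    mu, sigma2 = m_step(m, rho)             # M step on the current (non-converged) q
    F = free_energy(m, np.exp(2 * rho), mu, sigma2)
    if prev_F - F < 1e-8:                   # joint process has converged
        break
    prev_F = F
print(f"stopped at step {it}: F={F:.3f}  mu={mu:.3f}  sigma2={sigma2:.3f}")
```

Note that, at least in this toy case, the free energy is still monotonically non-increasing: each gradient step decreases it in $q$ (for a small enough step size) and the M step exactly minimizes it in $\theta$, which makes me think this should be a known variant.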
I don't think I am the first person to try such a procedure, so I was wondering whether there exist similar methods in the literature, or simply articles that explain the advantages of such an algorithm over the standard two-stage EM, where the M step is only performed once the E step has converged.