I am using EM algorithm to estimate measured data ($y$) as a sum of two weighted gaussian distributions:
$$model = \sum \limits_{i=1}^{L=2} w_i \phi(\theta_i)$$
Where $\theta$ = ($\mu$, $\sigma$). The problem is that the algorithm does not converge properly if the initialization is quite dissimilar to the real values (I generate the observed data as a sum of two gaussians + noise). So my question is about a good approach to estimate the initial values before running EM.
For the initialization, I tried to use a model (empty at the beginning) and then in a loop I compute this:
Initialization:
$$r = y - model$$ (r=residual)
$$\hat\theta_i = argmax_\theta |r^T\phi(\theta_i)|^2$$ (with newton optimization)
$$\hat w_i = \frac{\phi(\hat\theta_i) r}{\phi(\hat\theta_i)^T\phi(\hat\theta_i)}$$
$$model = model + \hat w_i \phi(\hat\theta_i)$$
But I am not getting good initial values, they are quite dissimilar to the real ones, so my EM algorithm doesn't converge.
Edit 1: If I try to find my initial values as I mentioned before, by computing the residual and the correlation with the gaussian functions to find the estimated values of $\mu$ and $\sigma$ and then the weights, in the first iteration I find a correct value of $\mu$ but the value of $\sigma$ is really small, so the factor $\frac{1}{\sigma \sqrt{2\pi}}$ is huge and so the correlation. That leads to wrong values for the weights and the rest of parameters for the next iterations. This is the point where I am stuck.
