Effect of redundant training data in HMM-based speech recognizer/synthesizer?

Question

How are redundant training data handled during the training stage?

For example, assume we have one observation for phone $\theta$ in the training set.

Training (of a monophone model) is then done by:

$$\lambda_{\max}^{\theta} = \arg\max_{\lambda} p(\mathbf{O} \mid \lambda)$$

where $\mathbf{O}$ is the observation sequence and $\lambda$ denotes the HMM parameters.
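Evaluating the objective above requires $p(\mathbf{O} \mid \lambda)$ for a given model, which the forward algorithm computes by summing over all hidden state paths. Below is a minimal sketch. It assumes a discrete-emission HMM $\lambda = (\pi, A, B)$ and skips numerical scaling, so it is only adequate for short sequences. Real recognizers use Gaussian-mixture or neural emission models with scaled or log-space arithmetic. The name `forward_likelihood` and the toy parameters are illustrative and do not come from the post.

```python
from typing import Sequence


def forward_likelihood(
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> float:
    """Return p(O | lambda) for a discrete-emission HMM lambda = (pi, A, B).

    pi[i]   -- probability of starting in state i
    A[i][j] -- probability of a transition from state i to state j
    B[i][k] -- probability of emitting symbol k in state i
    obs     -- non-empty observation sequence O as symbol indices
    """
    n = len(pi)
    # alpha[i] = p(o_1..o_t, q_t = i | lambda), starting at t = 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Sum over predecessor states, then emit the next symbol
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    # Summing over the final state marginalises out the hidden state path
    return sum(alpha)


# Hypothetical two-state left-to-right model (a topology commonly used for phones)
pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood(pi, A, B, [0, 1]))
```

For the sequence `[0, 1]`, this model can only follow the state paths 0→0 or 0→1. The printed value should therefore equal $0.9 \cdot 0.5 \cdot 0.1 + 0.9 \cdot 0.5 \cdot 0.8 = 0.405$, up to floating-point rounding.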

This is straightforward when only a single observation is available from the training set. But what happens when there are multiple observations for phone $\theta$ in the training set? How is $\bf{O}$ adjusted?

Answer · score 1 · edited Apr 13 '17 at 12:47

I think you are confusing GMM training with HMM training. Although both use the EM algorithm, HMMs are trained with Baum-Welch (the EM algorithm specialized to HMMs).
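Baum-Welch handles several observation sequences of the same model as follows. It runs the forward-backward pass on each sequence separately and adds the expected counts into shared accumulators. It then normalizes once over the pooled totals. Below is a minimal pure-Python sketch of one re-estimation step. It assumes a discrete-emission HMM, whereas speech systems normally use Gaussian-mixture emissions. It also assumes unscaled forward/backward variables, which are adequate only for short sequences, and that every sequence has nonzero likelihood. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> Matrix:
    """alpha[t][i] = p(o_1..o_t, q_t = i | lambda), unscaled."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)])
    return alpha


def backward(A: Matrix, B: Matrix, obs: Sequence[int]) -> Matrix:
    """beta[t][i] = p(o_{t+1}..o_T | q_t = i, lambda), unscaled."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) for i in range(n)])
    return beta


def baum_welch_step(
    pi: Sequence[float], A: Matrix, B: Matrix, sequences: Sequence[Sequence[int]]
) -> Tuple[List[float], Matrix, Matrix, float]:
    """One EM update with expected counts pooled over all sequences.

    Returns the re-estimated (pi, A, B) and the total log-likelihood of
    the sequences under the *input* parameters.
    """
    n, m = len(pi), len(B[0])
    pi_acc = [0.0] * n
    a_num = [[0.0] * n for _ in range(n)]
    b_num = [[0.0] * m for _ in range(n)]
    occ_trans = [0.0] * n  # expected occupancy of i at t < T (transition denominator)
    occ_all = [0.0] * n  # expected occupancy of i at any t (emission denominator)
    loglik = 0.0
    for obs in sequences:
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        p_obs = sum(alpha[-1])  # p(O_r | lambda); assumed > 0
        loglik += math.log(p_obs)
        for t, o in enumerate(obs):
            for i in range(n):
                gamma = alpha[t][i] * beta[t][i] / p_obs  # p(q_t = i | O_r, lambda)
                if t == 0:
                    pi_acc[i] += gamma
                b_num[i][o] += gamma
                occ_all[i] += gamma
                if t + 1 < len(obs):
                    occ_trans[i] += gamma
                    for j in range(n):
                        # xi = p(q_t = i, q_{t+1} = j | O_r, lambda)
                        a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                        * beta[t + 1][j] / p_obs)
    # Normalise the pooled counts; keep the old row if a state has zero occupancy
    new_pi = [x / len(sequences) for x in pi_acc]
    new_A = [[a_num[i][j] / occ_trans[i] if occ_trans[i] > 0 else A[i][j]
              for j in range(n)] for i in range(n)]
    new_B = [[b_num[i][k] / occ_all[i] if occ_all[i] > 0 else B[i][k]
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, loglik


if __name__ == "__main__":
    # Three toy symbol sequences standing in for three tokens of the same phone
    seqs = [[0, 1, 1, 0], [1, 1, 0], [0, 0, 1, 1, 1]]
    pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.3, 0.7]]
    for it in range(5):
        pi, A, B, ll = baum_welch_step(pi, A, B, seqs)
        print(f"iteration {it}: log-likelihood before update = {ll:.4f}")
```

Every sequence adds into the same accumulators. As a result, appending an exact copy of the entire training set doubles every count and leaves the re-estimated parameters unchanged in this sketch. Duplicating only some tokens, by contrast, gives those tokens more weight.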

answered Dec 28 '15 at 17:18 by Alexander Solovets

I'm referring to the case where the training set contains multiple observations of the phone being trained. For example, given recordings of 'dad', 'cat', and 'mat', the phone 'a' ($\theta$) is represented by $\mathbf{O} = [O_1, O_2, O_3]$, where each $O_i$ is a separate observation of $\theta$ from the training set. How is $\mathbf{O}$ adjusted? – stock username Dec 29 '15 at 00:40
Phone probability estimation has nothing to do with HMMs. – Alexander Solovets Dec 29 '15 at 01:44
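For the 'dad'/'cat'/'mat' case raised in the comments, the standard multiple-sequence form of Baum-Welch can be written out as follows. It assumes the $R$ tokens $O_1, \dots, O_R$ of phone $\theta$ are statistically independent.

The notation is introduced here and does not come from the original thread:

- Each $O_r$ has $T_r$ frames $o_1^{(r)}, \dots, o_{T_r}^{(r)}$.
- $\alpha_t^{(r)}(i)$ and $\beta_t^{(r)}(i)$ are the forward and backward variables computed on $O_r$ alone.
- $a_{ij}$ and $b_j(\cdot)$ are the transition probabilities and emission densities of $\lambda$.

Under this assumption the single-sequence objective becomes a product of per-token likelihoods:

$$\lambda_{\max}^{\theta} = \arg\max_{\lambda} \prod_{r=1}^{R} p(O_r \mid \lambda)$$

Each re-estimation formula then pools expected counts over all tokens before normalizing. For the transitions, for example:

$$\hat{a}_{ij} = \frac{\displaystyle\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r - 1} \alpha_t^{(r)}(i)\, a_{ij}\, b_j\big(o_{t+1}^{(r)}\big)\, \beta_{t+1}^{(r)}(j)}{\displaystyle\sum_{r=1}^{R} \frac{1}{P_r} \sum_{t=1}^{T_r - 1} \alpha_t^{(r)}(i)\, \beta_t^{(r)}(i)}, \qquad P_r = p(O_r \mid \lambda)$$

In this formulation $\mathbf{O}$ is not concatenated or otherwise modified. Each token keeps its own boundaries and adds its expected counts, weighted by $1/P_r$, to the shared totals. With $R = 1$ the objective reduces to the single-sequence case in the question.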
