How is redundant training data handled during HMM training?
For example, assume we have one observation for phone $\theta$ in the training set.
Then training (for a monophone model) is done with:
$$\lambda_{\max}^\theta = \arg\max_\lambda \, p(\mathbf{O} \mid \lambda)$$
where $\mathbf{O}$ is the observation sequence and $\lambda$ denotes the HMM parameters.
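
To make the quantity being maximised concrete, here is a minimal sketch of how $p(\mathbf{O} \mid \lambda)$ can be evaluated for a single observation sequence with the forward algorithm. It assumes a toy discrete-output HMM rather than a real acoustic model; the function name `forward_likelihood` and the parameters `pi`, `A`, `B` are illustrative choices of mine, not taken from any toolkit.

```python
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """Return p(O | lambda) for a discrete-output HMM (forward algorithm).

    obs : sequence of observation symbol indices
    pi  : initial state probabilities, shape (N,)
    A   : state transition matrix, A[i, j] = p(state j | state i), shape (N, N)
    B   : emission matrix, B[i, k] = p(symbol k | state i), shape (N, M)
    """
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = pi * B[:, obs[0]]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    # Termination: sum over the final states
    return float(alpha.sum())

# Toy two-state, two-symbol model (all values are made up for illustration)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

O = [0, 1, 0]  # a single observation sequence
likelihood = forward_likelihood(O, pi, A, B)
print(likelihood)
```

Training then searches for the $\lambda$ that makes this value as large as possible for the given $\mathbf{O}$.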
This is straightforward when the training set contains only a single observation of the phone. But what happens when it contains multiple observations of phone $\theta$? How is $\mathbf{O}$ adjusted?