6

Why are hidden Markov models (HMM) called mixture models? What does it mix?

Franck Dernoncourt

2 Answers

6

A mixture model is a probability density function formed as a weighted sum of component densities, with weights summing to 1 so that the total density still integrates to 1, as every probability density function must. Consider, for example, two people cutting pencils on an assembly line. The first cuts a fraction $0<p<1$ of the pencils with an average pencil length of $\mu_1$ and a standard deviation of $\sigma_1$. The second cuts the remaining fraction $1-p$ of the pencils with an average pencil length of $\mu_2$ and a standard deviation of $\sigma_2$. Then the mixture distribution (under a normality assumption) of the pencils coming off the assembly line is $MD(p,\mu_1,\sigma_1,\mu_2,\sigma_2)=pN(\mu_1,\sigma_1)+(1-p)N(\mu_2,\sigma_2)$.
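
As a concrete illustration, here is a minimal Python sketch of the pencil example; the weight, means, and standard deviations below are made-up values chosen only for illustration, not anything implied by the question:

```python
# Two-component Gaussian mixture for the pencil example.
# p, mu1, sigma1, mu2, sigma2 are assumed illustrative values.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p, mu1, sigma1, mu2, sigma2 = 0.3, 17.0, 0.5, 19.0, 0.8

# Sample pencil lengths: first pick the cutter, then draw that cutter's length.
n = 10_000
first_cutter = rng.random(n) < p
lengths = np.where(first_cutter,
                   rng.normal(mu1, sigma1, n),
                   rng.normal(mu2, sigma2, n))

# Mixture density MD(x) = p*N(mu1, sigma1) + (1-p)*N(mu2, sigma2);
# it still integrates to 1 because the weights p and 1-p sum to 1.
x = np.linspace(14, 22, 400)
mixture_pdf = p * norm.pdf(x, mu1, sigma1) + (1 - p) * norm.pdf(x, mu2, sigma2)
```

A histogram of `lengths` would closely follow `mixture_pdf`.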

In a hidden Markov model, the state (which pencil cutter is at work) is not directly visible, but the output (the pencils coming off the assembly line), which depends on the state, is visible. Each state has its own probability distribution over the possible outputs ($N(\mu_1,\sigma_1)$ and $N(\mu_2,\sigma_2)$ in our case), and the states themselves occur with probabilities $p$ and $1-p$. Now, a hidden Markov model does not have to be a mixture model; its output distribution can, for example, be unimodal, but the mixture-model type of hidden Markov model is simple to solve.
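
To connect this to the pencil picture, here is a minimal sketch of a two-state HMM with Gaussian emissions; the transition matrix and emission parameters are assumed values for illustration. Marginally, each observation is drawn from a mixture of the two emission densities, weighted by the probability of being in each state:

```python
import numpy as np

rng = np.random.default_rng(1)

# Transition matrix between the two hidden states (assumed values).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# Emission parameters for each state (assumed values).
means = np.array([17.0, 19.0])
sds = np.array([0.5, 0.8])

T = 1000
states = np.empty(T, dtype=int)
obs = np.empty(T)
states[0] = 0
obs[0] = rng.normal(means[0], sds[0])
for t in range(1, T):
    states[t] = rng.choice(2, p=A[states[t - 1]])           # hidden Markov step
    obs[t] = rng.normal(means[states[t]], sds[states[t]])   # visible emission

# The stationary distribution of A plays the role of the mixture weights
# (p, 1-p): marginally, an observation looks like a two-component mixture.
evals, evecs = np.linalg.eig(A.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
```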

To better explore whether, as claimed on Wikipedia, a hidden Markov model can be considered a generalization of a mixture model, or whether that is too narrow a view, I posed this as a separate question: Are there any examples of hidden Markov models that are not mixture models? As it turns out, convolutions can arise from HMMs as well, and most people would consider convolution to be a different operation from mixture addition.

It would seem that HMMs are useful not only for mixture models, but also for convolution models and possibly others.
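
For anyone wanting to see the difference concretely, here is a small sketch (all numbers assumed): a mixture is the scaled ordinary addition of two densities, while a convolution is the density of the sum of two independent random variables, which for two normals is again a single normal:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(0, 40, 4001)   # grid for the densities
dx = x[1] - x[0]
f1 = norm.pdf(x, 17.0, 0.5)
f2 = norm.pdf(x, 19.0, 0.8)

# Mixture: scaled ordinary addition of the densities (typically bimodal).
mixture = 0.3 * f1 + 0.7 * f2

# Convolution: density of the SUM of two independent variables; for normals
# it is N(mu1 + mu2, sqrt(sigma1^2 + sigma2^2)), always a single peak here.
convolution_numeric = np.convolve(f1, f2) * dx
convolution_closed = norm.pdf(np.arange(len(convolution_numeric)) * dx,
                              17.0 + 19.0, np.sqrt(0.5**2 + 0.8**2))
```

The mixture here is bimodal, while the convolution is unimodal and centered at $\mu_1+\mu_2$, which makes plain that the two operations are different.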

Carl
  • And to avoid any confusion: "Why are hidden Markov models (HMM) also called mixture models?" HMMs are NOT called mixture models. Mixture models and HMMs are 2 different models, but as explained above, HMMs can embed mixture models as emission distributions. – Eskapp Dec 08 '16 at 16:04
  • @Eskapp, thanks. As I read on the wiki ([link](https://en.wikipedia.org/wiki/Mixture_model)), mixture models are exactly what Carl explained, and yes, it's only a part of HMM, as you rightly said. – Arpit Sisodia Jan 28 '17 at 04:40
  • Convolution is exactly the sum law of probability distributions. https://en.m.wikipedia.org/wiki/Convolution_of_probability_distributions#:~:text=The%20convolution%2Fsum%20of%20probability,linear%20combinations%20of%20random%20variables.

    Does it mean it's exactly the same as the mixture addition?

    – Brian Cannard Sep 11 '23 at 03:23
  • @BrianCannard Convolution of probability distributions, i.e., probability density functions, is equal to their convolution integral, e.g., see infinite support; Fourier transforms for one type of convolution integral. There are two others, real space convolution and Laplace transforms, the latter for semi-infinite support densities. – Carl Sep 11 '23 at 06:10
  • @BrianCannard More generally, density functions do not have to be probability density functions (pdf). In general, density functions add by convolution, and a mixture model is the scaled ordinary addition of density functions, which is not convolution. Convolution of density functions is not ordinary addition; mixture models are ordinary addition. It may be hard to grasp what convolution is, but trust me on this, it is not mixture modeling. – Carl Sep 11 '23 at 06:21
  • Excellent intuition! Convolutions feel more natural and general, actually. What is the justification for addition of scaled densities in mixture models? – Brian Cannard Sep 12 '23 at 20:59
  • @BrianCannard The physics. Suppose for two machines one cuts pipes slightly shorter than the other but both dump pipes onto the same conveyor belt. The conveyor belt then goes to a sorting room. In the sorting room they want to know how many pipes are more than or less than a specific length. They plot a histogram and discover that the pipe lengths are distributed as a mixture model with two distinct peaks for pipe lengths. – Carl Sep 14 '23 at 00:59
-2

(This answer would be better as a comment building on @Eskapp's comment.)

I think it is important to give the general and simple formula $$p(Y) = \sum_{X} p(X,Y) = \sum_{X} p(X)p(Y|X)$$ (also appearing on Wikipedia). This clearly shows that in an HMM, it is the observation process ($Y$) that is modeled as a mixture.
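
For instance, with two hidden states whose emission densities are $N(\mu_1,\sigma_1)$ and $N(\mu_2,\sigma_2)$, and with $p(X=1)=p$, the sum reduces to the two-component mixture from the accepted answer: $$p(Y) = p\,N(\mu_1,\sigma_1) + (1-p)\,N(\mu_2,\sigma_2).$$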

However, as already noted, HMMs are not called mixture models.

TheCG