
Concretely, let $f_{\mu, \sigma}$ denote the pdf of a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$. Is it possible that

$$\sum_{i = 0}^n p_i f_{\mu_{1i}, \sigma_{1i}}(x) = \sum_{j = 0}^k q_j f_{\mu_{2j}, \sigma_{2j}}(x), \qquad \sum_i p_i = \sum_j q_j = 1;\quad p_i, q_j > 0$$

for $n \neq k$? Intuitively, I'd say yes, because if two normals have the same mean, their weighted sum looks like a bell curve with the same mean and a different standard deviation, but I have not been able to prove it.
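A minimal numeric sketch of that intuition (the weights and standard deviations below are arbitrary illustrations): it compares an equal-weight mixture of two zero-mean Normals with the single Normal having the same mean and variance. If the intuition were exactly right, the two densities would coincide.

```python
# A minimal sketch (parameters chosen arbitrarily): compare an equal-weight
# mixture of two zero-mean Normals against the single Normal with the same
# mean and variance. If the mixture were itself Normal, these would match.
import numpy as np
from scipy.stats import norm

p = 0.5
s1, s2 = 1.0, 3.0                              # two components, both with mean 0
mix_sd = np.sqrt(p * s1**2 + (1 - p) * s2**2)  # sd of the matching single Normal

x = np.linspace(-8.0, 8.0, 401)
mixture = p * norm.pdf(x, 0.0, s1) + (1 - p) * norm.pdf(x, 0.0, s2)
single = norm.pdf(x, 0.0, mix_sd)

print(np.max(np.abs(mixture - single)))  # about 0.09: the densities differ
```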

Pel de Pinda
  • It's unclear what you are asking, for several reasons. First, your argument applies to sums of random variables, not to mixture distributions, so are you sure you are asking about mixtures? (This is a crucial ambiguity to resolve.) Second, you use identical notations for the distributions: is that intended, or do you posit one set of components for one sum and another potentially different set for the second? Third, are you supposing all the components differ from each other or not? Fourth, are you assuming all the $p_i$ and $q_i$ are nonzero? – whuber Feb 28 '20 at 17:44
  • @whuber Mixtures is what I meant. Why does it not hold for mixtures? Second, that was not intended, the distributions left and right can be different, I fixed the notation. Third, yes. Fourth, yes. Thanks for taking the time to think about my question. – Pel de Pinda Feb 28 '20 at 17:57
  • @whuber or well, not all components different, but I am not interested in the case $p_1 f_{\mu, \sigma} + p_2 f_{\mu, \sigma} = (p_1 + p_2)f_{\mu, \sigma}$. – Pel de Pinda Feb 28 '20 at 18:01
  • Thank you for the clarifications! Using the techniques at https://stats.stackexchange.com/questions/429868 (and very nearly the same argument), you can show the answer is in the negative: when the two sums are equal, they must both be expressible in identical ways as two Normal mixtures up to the inconsequential differences created by allowing some coefficients to be zero, allowing some of the components to be equal, or re-ordering the indexes. – whuber Feb 28 '20 at 18:07

1 Answer


The answer is no: the mixture determines its components uniquely.

There's an elementary way to show it. The idea is that each component of the mixture determines how the density behaves at extreme values, and we can use this asymptotic behavior to identify, and then remove, the components one at a time.

To carry out this program, consider a Normal mixture

$$f(x) = \sum_{i=1}^n \frac{p_i}{\sigma_i} \phi\left(\frac{x-\mu_i}{\sigma_i}\right)$$

where

$$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$$

is the standard Normal density, the $p_i\ne 0$ and $\mu_i$ are any real numbers, the $\sigma_i$ are positive real numbers, and the ordered pairs $(\mu_i,\sigma_i)$ are all distinct. Re-order the indexes $i$ if necessary so that

$$\sigma_1 \le \sigma_2 \le \ldots \le \sigma_n$$

and, whenever $\sigma_i = \sigma_j$ for $i \lt j,$ then $\mu_i \le \mu_j.$ This ordering (the lexicographic ordering) is unique. We will examine the last component first (to reduce the length of the sum), and to that end the ordering has us visit the most spread-out components first and, among components with equal spreads, those positioned at higher values.
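In code, the mixture and its ordering might look like the following minimal sketch (the component parameters are arbitrary illustrations, not part of the argument):

```python
# A minimal sketch of the mixture f and the lexicographic ordering:
# each component is a (p, mu, sigma) triple, sorted by (sigma, mu) so the
# last component has the largest sigma and, among those, the largest mu.
from scipy.stats import norm

components = [(0.2, 1.0, 2.0), (0.5, -1.0, 1.0), (0.3, 3.0, 2.0)]
components.sort(key=lambda c: (c[2], c[1]))  # lexicographic in (sigma, mu)

def f(x):
    """Mixture density: sum_i (p_i / sigma_i) * phi((x - mu_i) / sigma_i)."""
    return sum(p * norm.pdf(x, mu, sigma) for p, mu, sigma in components)
```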

To study what happens to $f$ as $x$ grows large, consider

$$\begin{aligned} &\frac{\sigma_n}{\phi\left(\frac{x-\mu_n}{\sigma_n}\right)}\, f(x) \\ &\quad= p_n + \sum_{i=1}^{n-1} p_i \frac{\sigma_n}{\sigma_i} \exp\left(\frac{(\sigma_i^2-\sigma_n^2)x^2 + 2x\left(\sigma_n^2\mu_i - \sigma_i^2\mu_n\right) - \sigma_n^2\mu_i^2 + \sigma_i^2\mu_n^2}{2\sigma_i^2\sigma_n^2}\right). \end{aligned}$$

Because every $\sigma_i^2-\sigma_n^2 \le 0,$ the limiting values of the exponentials in the sum are all $0$ unless the coefficient of the $x^2$ term is zero, which happens only when $\sigma_i = \sigma_n.$ In those cases the coefficient of $x$ reduces to $2\sigma_n^2(\mu_i - \mu_n),$ which the lexicographic ordering guarantees is non-positive, so the limiting values are still zero unless this coefficient is zero; and that occurs only when $\mu_i=\mu_n.$ But we have arranged at the outset that this never happens: there is no $i\ne n$ for which $(\mu_i,\sigma_i)=(\mu_n,\sigma_n).$ Thus,

$$\lim_{x\to\infty} \frac{\sigma_n}{\phi\left(\frac{x-\mu_n}{\sigma_n}\right)} f(x) = p_n.$$

Had we used any value other than $\sigma_n$ in this analysis, the limit would have been either $0$ or would have diverged to $\pm \infty;$ and, having used $\sigma_n,$ using any value other than $\mu_n$ would likewise have produced a limit of $0$ or $\pm\infty.$ In other words, $(\mu_n,\sigma_n)$ is the only parameter pair for which we can achieve a finite nonzero limit, and that limit determines $p_n.$
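A quick numeric check of this limit, using the same arbitrary three-component toy mixture as in the sketch above (an illustration, not part of the proof):

```python
# Numeric sanity check: sigma_n * f(x) / phi((x - mu_n) / sigma_n) -> p_n.
# The components below are arbitrary and already in lexicographic order.
from scipy.stats import norm

components = [(0.5, -1.0, 1.0), (0.2, 1.0, 2.0), (0.3, 3.0, 2.0)]
p_n, mu_n, sigma_n = components[-1]  # the last component

def f(x):
    return sum(p * norm.pdf(x, mu, sigma) for p, mu, sigma in components)

for x in [10.0, 20.0, 30.0]:
    print(x, sigma_n * f(x) / norm.pdf((x - mu_n) / sigma_n))
# the printed ratios approach p_n = 0.3
```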

This shows that any mixture of $n$ distinct Normal densities determines its last component (in the lexicographic ordering of components). Subtracting off this component yields a Normal mixture with one less component, instantly giving an inductive proof of the result:

Let $p_i\ne 0,$ $q_j\ne 0,$ $\sigma_i\gt 0,$ $\tau_j\gt 0,$ $\mu_i,$ and $\nu_j$ be any real numbers for $1\le i \le n$ and $1\le j \le k.$ If every number $x$ in a set with no upper bound satisfies $$ \sum_{i=1}^n \frac{p_i}{\sigma_i} \phi\left(\frac{x-\mu_i}{\sigma_i}\right) = \sum_{j=1}^k \frac{q_j}{\tau_j} \phi\left(\frac{x-\nu_j}{\tau_j}\right)$$ and each sum involves distinct Normal components and is ordered lexicographically, then $n=k$ and, for each $1\le i\le n,$ $p_i=q_i,$ $\mu_i=\nu_i,$ and $\sigma_i=\tau_i.$

(I phrased this in a way that indicates how this analysis can be generalized to some other families of distributions, including discrete distributions.)
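To see the peel-off induction at work numerically, here is a minimal sketch (same arbitrary toy mixture as above): the tail ratio identifies the last component's weight, the identified component is removed, and the process repeats.

```python
# Peel-off sketch: at a point x deep in the upper tail, the ratio identifies
# the last component's weight; removing that component and repeating
# recovers every weight in reverse lexicographic order.
from scipy.stats import norm

def density(comps, x):
    return sum(p * norm.pdf(x, mu, sigma) for p, mu, sigma in comps)

work = [(0.5, -1.0, 1.0), (0.2, 1.0, 2.0), (0.3, 3.0, 2.0)]  # sorted
x = 30.0  # far in the upper tail
while work:
    p_last, mu_last, sigma_last = work[-1]
    est = sigma_last * density(work, x) / norm.pdf((x - mu_last) / sigma_last)
    print(f"recovered weight {est:.6f} (true value {p_last})")
    work.pop()  # 'subtract' the identified component and recurse
```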

Notice we never had to assume the $p_i$ were positive or that they sum to unity. In fact, we didn't even have to assume $n$ and $k$ are finite! In the countably infinite case, all that's needed to carry the demonstration through is that the sets of $\sigma_i$ and $\tau_j$ are bounded above and have only one accumulation point each, and that each set of $\mu_i$ and $\nu_j$ has at most one accumulation point in the extended reals.
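Since the weights need not be positive, the theorem says that finitely many distinct Normal densities are linearly independent functions. A numeric hint of this (an illustration with arbitrarily chosen parameters) is that their values on a grid form a matrix of full column rank:

```python
# Linear-independence sketch: densities of distinct Normal components,
# evaluated on a grid, give a matrix of full column rank, so no nontrivial
# linear combination of them can vanish (equal mixtures => equal components).
import numpy as np
from scipy.stats import norm

params = [(-1.0, 1.0), (0.0, 1.0), (0.0, 2.0), (3.0, 2.0)]  # distinct (mu, sigma)
x = np.linspace(-10.0, 10.0, 201)
A = np.column_stack([norm.pdf(x, mu, sigma) for mu, sigma in params])
print(np.linalg.matrix_rank(A))  # prints 4: the four densities are independent
```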

whuber
  • is there a textbook citation of this anywhere? – Jimmy TwoCents Dec 09 '22 at 22:00
  • @JimmyTwoCents I haven't researched that. – whuber Dec 09 '22 at 22:06
  • @Jimmy A couple of years later, Christian Hennig identified a reference in a post at https://stats.stackexchange.com/a/594181/919. – whuber Sep 05 '23 at 14:16
  • Nice proof! Cheney & Light (Lemma 1 in Chap. 13) state and prove a similar result for the simplest exponential case. I guess that we can similarly claim that a finite family of Gaussian densities with distinct parameters is linearly independent as functions on any domain that has a point of accumulation. – Yves Nov 08 '23 at 07:32
  • Is the above true when the Gaussian components are two-dimensional? There we will have covariance matrices, so how would we order the covariances? – Andyale Dec 09 '23 at 16:10
  • @Andyale As I wrote in a comment to your related question, because the multidimensional Normal distribution is one in which all linear combinations are univariate Normal, there's nothing more to be shown in the multidimensional case. – whuber Dec 09 '23 at 17:48
  • @whuber One more thing: is the statement "the multidimensional Normal distribution is one in which all linear combinations are univariate Normal" always true? Or do we need something like the dataset from which we are getting the GMM being uncorrelated? – Andyale Dec 14 '23 at 10:53
  • @Andyale I can't follow the second part of your question because the two things you mention do not appear related: one is about a theoretical distribution and another has something to do with datasets. As far as the first part goes, see https://stats.stackexchange.com/questions/4364 for this and other characterizations of multivariate Normal (aka Gaussian) distributions. – whuber Dec 14 '23 at 15:06
  • @whuber The second part was: suppose we have the dataset $\{X_1,\ldots, X_n\}$ and we are making the GMM out of it; if there is correlation within these data, can we still say "the multidimensional Normal distribution is one in which all linear combinations are univariate Normal"? – Andyale Dec 18 '23 at 04:50
  • @Andyale I don't understand why any mathematical statement about multivariate Normal distributions would be contingent on data at all. – whuber Dec 18 '23 at 14:04
  • @whuber I was just concerned about the practical implementation; these types of cases arise in computation, where we fit the GMM to the entire dataset. – Andyale Dec 20 '23 at 06:44
  • https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-39/issue-1/On-the-Identifiability-of-Finite-Mixtures/10.1214/aoms/1177698520.full (1968) and https://www.jstor.org/stable/2238337 (1963) look to be early resources – Jimmy TwoCents Dec 28 '23 at 16:36
  • The second one is a follow-up, by the same author, to the paper @whuber mentions in their comment. – Jimmy TwoCents Dec 28 '23 at 16:50