Mixture probability depends on the sample

Question

A mixture of two distributions has a density that is the weighted sum of the component densities: $$f_{mix}(x) = p f_{1}(x) + (1-p) f_{2}(x).$$ What if the mixture weight is allowed to vary with the sample point? $$f_{mix}(x) = p(x) f_{1}(x) + (1-p(x)) f_{2}(x).$$ That is, you need to know the sample $x$ before you can determine what mixture it came from. It's unclear how to draw from this distribution, and in fact this is not a distribution, as $f_{mix}$ will not in general integrate to 1.

Nonetheless it seems interesting. For example, I can create a skewed distribution by "switching" from one normal distribution to the other, both centered on 0: $$f_{mix}(x) = \frac{1}{Z} \left[\text{sigmoid}(x) N(x; \sigma_{1}) + (1-\text{sigmoid}(x))N(x; \sigma_{2})\right].$$ Is there a name for this idea? Can this be realized as a physical model, perhaps under some conditions?
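A minimal Python sketch (standard library only) of the skewed example may help make it concrete. It is an illustration under assumptions, not part of the original question: the values $\sigma_1 = 3$ and $\sigma_2 = 1$, the helper names, and the use of rejection sampling are all hypothetical choices. It relies on two observations. First, because both normals are centered on 0 and $\text{sigmoid}(x) + \text{sigmoid}(-x) = 1$, symmetry gives $\int \text{sigmoid}(x) N(x; \sigma)\,dx = 1/2$ for any $\sigma$, so in this particular example $Z = 1$. Second, because the weights lie in $[0, 1]$, the bracketed function never exceeds $N(x; \sigma_1) + N(x; \sigma_2)$, so one can propose from the equal-weight mixture and accept or reject.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def normal_pdf(x, sigma):
    # Density of a zero-mean normal with standard deviation sigma.
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def unnormalized(x, s1, s2):
    # Bracketed term of the skewed density, before dividing by Z.
    w = sigmoid(x)
    return w * normal_pdf(x, s1) + (1.0 - w) * normal_pdf(x, s2)

def integrate(f, lo, hi, n=20000):
    # Midpoint rule on [lo, hi].
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def draw(s1, s2, rng):
    # Rejection sampling. Proposal: q(x) = 0.5 * N(x; s1) + 0.5 * N(x; s2).
    # Since unnormalized(x) <= N(x; s1) + N(x; s2) = 2 * q(x),
    # accept x with probability unnormalized(x) / (2 * q(x)).
    while True:
        sigma = s1 if rng.random() < 0.5 else s2
        x = rng.gauss(0.0, sigma)
        q = 0.5 * normal_pdf(x, s1) + 0.5 * normal_pdf(x, s2)
        if rng.random() * 2.0 * q <= unnormalized(x, s1, s2):
            return x

# Assumed illustrative values: wide component on the right, narrow on the left.
s1, s2 = 3.0, 1.0
Z = integrate(lambda x: unnormalized(x, s1, s2), -40.0, 40.0)

rng = random.Random(0)
xs = [draw(s1, s2, rng) for _ in range(20000)]
mean = sum(xs) / len(xs)
print("Z =", Z, "sample mean =", mean)
```

With these assumed values the sample mean comes out positive, reflecting the right skew. The acceptance rate of this sampler is $Z/2$, so roughly half of the proposals are kept.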

Try mixture of experts for a connected concept. – Xi'an Mar 16 '22 at 14:09

score 1 · Answer 1 · answered Mar 07 '22 at 22:36

That is not a distribution at all --- the purported density function does not integrate to one. In general, when you have a proper mixture distribution (i.e., one that uses fixed weights) you can rest assured that:

$$\begin{align} \int \limits_\mathbb{R} f_\text{mix}(x) \ dx &= \int \limits_\mathbb{R} [ p f_1(x) + (1-p) f_2(x) ] \ dx \\[6pt] &= p \int \limits_\mathbb{R} f_1(x) \ dx + (1-p) \int \limits_\mathbb{R} f_2(x) \ dx \\[6pt] &= p + (1-p) = 1. \end{align}$$

However, using your proposed change you get:

$$\begin{align} \int \limits_\mathbb{R} f_\text{improper mix}(x) \ dx &= \int \limits_\mathbb{R} [ p(x) f_1(x) + (1-p(x)) f_2(x) ] \ dx \\[6pt] &= \int \limits_\mathbb{R} p(x) f_1(x) \ dx + \int \limits_\mathbb{R} (1-p(x)) f_2(x) \ dx, \\[6pt] \end{align}$$

which does not equal one in general. What you are proposing here would require you to add a scaling constant to your density to make it integrate to one. That just leads you back to the same level of generality you have when you start by saying that you can form a density function by scaling any non-negative function with a finite integral. In view of this, it is not really clear what utility your proposed method has.
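The integral above can be checked numerically. The sketch below (standard-library Python) is only an illustration under assumed choices that do not appear in the thread: $p(x) = \text{sigmoid}(x)$, $f_1 = N(1, 1)$, $f_2 = N(-1, 1)$, and a midpoint rule on $[-30, 30]$. It compares a fixed weight with a weight that depends on $x$.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def normal_pdf(x, mu, sigma):
    # Density of a normal with mean mu and standard deviation sigma.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def total_mass(weight, f1, f2, lo=-30.0, hi=30.0, n=20000):
    # Midpoint-rule integral of weight(x) * f1(x) + (1 - weight(x)) * f2(x).
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        w = weight(x)
        total += w * f1(x) + (1.0 - w) * f2(x)
    return h * total

# Assumed components: unit-variance normals centered at +1 and -1.
f1 = lambda x: normal_pdf(x, 1.0, 1.0)
f2 = lambda x: normal_pdf(x, -1.0, 1.0)

fixed = total_mass(lambda x: 0.3, f1, f2)                 # proper mixture
varying = total_mass(sigmoid, f1, f2)                     # weight favors f1 where f1 is large
flipped = total_mass(lambda x: 1.0 - sigmoid(x), f1, f2)  # weight favors f1 where f1 is small
print(fixed, varying, flipped)
```

Under these assumptions the fixed-weight integral is 1, the `varying` integral exceeds 1, and the `flipped` integral falls below 1. The last two always sum to 2, since together they integrate $f_1 + f_2$.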

Ben

124,856

I think you're responding to typos in the question rather than substance. Suppose we have, e.g., a one parameter family of response distributions $F_\theta$ and suitable link functions $h$ and $g.$ We might posit a model for the response $Y_x$ in the form of a mixture where the conditional distribution of $Y_x$ is of the form $$Y_x\sim g(\alpha_0+\alpha_1x)F_{h(\beta_0+\beta_1x)}+(1-g(\alpha_0+\alpha_1x))F_{h(\gamma_0+\gamma_1x)}$$ with six parameters $\alpha_0, \alpha_1,$ $\beta_0, \beta_1,$ and $\gamma_0, \gamma_1$ to be estimated. (Although this works, it looks like math in search of a problem rather than stats.) – whuber Mar 07 '22 at 23:48
That's quite a generous interpretation of "typos". Even in that model, it's not clear that the function you build will be a valid CDF (i.e., non-decreasing). – Ben Mar 08 '22 at 03:25
@whuber you're right, this is math in search of a problem; I was thinking of this as a model that's "easy to infer, hard to draw from," as opposed to the usual other way around. I'll look into the example you suggest.
@Ben you are correct that $f_{mix}$ does not sum to 1, something which I acknowledge in my question. This feels more motivated than scaling any non-negative integrable function, but maybe you're right that it isn't. I'll amend the question to add some "motivation."
– Mark Perlman Mar 08 '22 at 16:01
For any fixed $x$ the right hand side is obviously a CDF because it's a mixture of two CDFs, provided only that the range of $g$ is a subset of $[0,1],$ which is part of what I meant by "suitable link." This is an interpretation of the text of the question, as suggested (somewhat inaccurately) by its last formula: it's a mixture where the "mixture probability" $g$ depends on $x$ (the "sample"). – whuber Mar 08 '22 at 18:50
The whole point of the question is that it's not a fixed $x$ --- that is the argument variable. Glossing over that difficulty with the magical words "suitable link" does not make this setup even slightly helpful. What you are putting forward is also not a valid interpretation of the question --- it is a complete rewriting of it to a completely different setup. – Ben Mar 08 '22 at 21:16
I am using the phrase "fixed $x$" to indicate (i) $x$ could be anything but (ii) we're not discussing a conditional probability distribution. Thus, in the usual, familiar, sense $x$ is an explanatory variable. This is a standard textbook regression problem. The twist is that the OP proposes a family of mixture distributions as the conditional response. MLE will work, but because it's difficult to get it to work well even when $x$ is constant, this situation would be of interest only with large amounts of data. – whuber Mar 09 '22 at 15:56
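The regression-style reading in whuber's comments can be sketched in code. Everything specific below is an assumption made for illustration and is not stated in the thread: $F_\theta$ is taken to be the exponential distribution with rate $\theta$, $h = \exp$, $g$ is the logistic function, and the parameter values are arbitrary. The sketch draws $Y_x$ for given $x$ and evaluates the log-likelihood that MLE would maximize. It does not perform the fit itself.

```python
import math
import random

def logistic(t):
    # Assumed link g: maps the real line into (0, 1).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def mixture_params(x, alpha, beta, gamma):
    # Weight and the two component rates at covariate value x (assumed h = exp).
    w = logistic(alpha[0] + alpha[1] * x)
    r1 = math.exp(beta[0] + beta[1] * x)
    r2 = math.exp(gamma[0] + gamma[1] * x)
    return w, r1, r2

def sample_y(x, alpha, beta, gamma, rng):
    # For fixed x this is an ordinary two-component mixture:
    # pick a component with probability w, then draw from it.
    w, r1, r2 = mixture_params(x, alpha, beta, gamma)
    rate = r1 if rng.random() < w else r2
    return rng.expovariate(rate)

def log_likelihood(data, alpha, beta, gamma):
    # Sum of log mixture densities over (x, y) pairs.
    total = 0.0
    for x, y in data:
        w, r1, r2 = mixture_params(x, alpha, beta, gamma)
        dens = w * r1 * math.exp(-r1 * y) + (1.0 - w) * r2 * math.exp(-r2 * y)
        total += math.log(dens)
    return total

# Hypothetical "true" parameters, used only to simulate data.
alpha, beta, gamma = (0.0, 1.0), (1.0, 0.5), (-1.0, 0.0)
rng = random.Random(1)
data = []
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, sample_y(x, alpha, beta, gamma, rng)))

ll_true = log_likelihood(data, alpha, beta, gamma)
ll_null = log_likelihood(data, (0.0, 0.0), (0.0, 0.0), (0.0, 0.0))
print(ll_true, ll_null)
```

Here the weight depends on the explanatory variable $x$ rather than on the response $y$, so for each $x$ the conditional density integrates to one with no extra normalizing constant. That is the distinction the two commenters are debating.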
