
Let $s_1,s_2,x_1,x_2$ be 4 random variables, where $s_1,s_2\in\{-1,1\}$ are binary, while $x_1,x_2 \in \mathbb R$ are continuous. I want to find the most general form of the distribution $p(s_1,s_2,x_1,x_2)$, subject to the following constraints:

  1. $p(s_1,s_2,x_1,x_2)$ belongs to the exponential family
  2. The marginal $p(x_1,x_2)$ is a bivariate Gaussian.
a06e

1 Answer


Well, write the joint density as
$$
\begin{aligned}
p(s_1,s_2,x_1,x_2) &= p(x_1,x_2 \mid s_1,s_2) \cdot p(s_1,s_2) \\
&= p(x_1,x_2 \mid s_1,s_2) \cdot p(s_2 \mid s_1) \cdot p(s_1).
\end{aligned}
$$
Then the first factor is a bivariate normal conditional on the two discrete variables, and it is multiplied by the point probabilities of the discrete variables. For the discrete variables you could use two Bernoulli distributions (binomial with a single trial), as indicated in the last line above, or a multinomial distribution over the four possible combinations. Taking logarithms of the equation above, you can put this into exponential family form.
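The step "taking logarithms" can be made concrete under one assumed parameterization, which the answer does not fix: write $s=(s_1,s_2)$, $x=(x_1,x_2)^\top$, let $w_{ab}=p(s_1=a,s_2=b)$ be the multinomial point probabilities, and assume $x \mid s=(a,b) \sim \mathcal N(\mu_{ab},\Sigma_{ab})$. Expanding the Gaussian log-density then gives, as a sketch,

$$
\log p(s,x) = \sum_{a,b\in\{-1,1\}} \mathbb 1[s=(a,b)]\Big(\log w_{ab} - \tfrac12\log\det(2\pi\Sigma_{ab}) - \tfrac12\,\mu_{ab}^\top\Sigma_{ab}^{-1}\mu_{ab} + \mu_{ab}^\top\Sigma_{ab}^{-1}x - \tfrac12\,x^\top\Sigma_{ab}^{-1}x\Big),
$$

which is linear in the statistics $\mathbb 1[s=(a,b)]$, $\mathbb 1[s=(a,b)]\,x$ and $\mathbb 1[s=(a,b)]\,xx^\top$ (using $x^\top\Sigma^{-1}x=\operatorname{tr}(\Sigma^{-1}xx^\top)$), hence of exponential family form. Summing over $s$ under this assumption gives $p(x)=\sum_{a,b} w_{ab}\,\mathcal N(x;\mu_{ab},\Sigma_{ab})$, a mixture of four Gaussians.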

Depending on your aims, other factorizations are possible, but the one above seems the most natural for most purposes.

  • Why is the first factor a bivariate normal? Note that I want $p(x_1,x_2)$ to be normal; I did not require $p(x_1,x_2 \mid s_1,s_2)$ to be normal. In fact, with the form you give, $\sum_{s_1,s_2} p(s_1,s_2,x_1,x_2)$ is not normal in general. – a06e Jul 31 '17 at 11:55
  • Then maybe you want to factorize the other way round, conditioning the discrete variables on the continuous ones? Perhaps you should state your actual modeling problem more precisely. – kjetil b halvorsen Jul 31 '17 at 11:57
  • @becko - the fact that it's not normal in general, while the factorization is mathematically correct no matter what form $p(\cdot)$ takes, should tell you something about the answer. I strongly suspect that in every solution $(x_1, x_2)$ is independent of $(s_1, s_2)$. A little work with characteristic functions might confirm this... – jbowman Jul 31 '17 at 19:51
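The objection in these comments can be checked numerically for $x_1$ alone. If, as in the answer, $x \mid (s_1,s_2)$ is taken to be Gaussian, then the marginal of $x_1$ is a mixture of four univariate normals, one per configuration $(s_1,s_2)$. The minimal sketch below (the function name `mixture_excess_kurtosis` and all example parameters are hypothetical, chosen only for illustration) computes the exact excess kurtosis of such a mixture from its component moments. A normal distribution has excess kurtosis 0, so any nonzero value shows the marginal cannot be normal.

```python
from typing import Sequence


def mixture_excess_kurtosis(
    weights: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Exact excess kurtosis of a univariate Gaussian mixture.

    Component k is N(means[k], variances[k]) with probability weights[k],
    e.g. one component per configuration (s1, s2) of the binary variables.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    # Mean of the mixture.
    mu = sum(w * m for w, m in zip(weights, means))

    # Central moments of the mixture. For a component N(m, v) with d = m - mu:
    #   E[(x - mu)^2] = d^2 + v
    #   E[(x - mu)^4] = d^4 + 6 d^2 v + 3 v^2
    m2 = 0.0
    m4 = 0.0
    for w, m, v in zip(weights, means, variances):
        d = m - mu
        m2 += w * (d * d + v)
        m4 += w * (d ** 4 + 6 * d * d * v + 3 * v * v)

    # A normal distribution has m4 / m2^2 = 3, i.e. excess kurtosis 0.
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    w4 = [0.25, 0.25, 0.25, 0.25]  # equal probability for each (s1, s2)
    # All components identical: x1 is independent of (s1, s2) and normal.
    print(mixture_excess_kurtosis(w4, [0.0] * 4, [1.0] * 4))  # -> 0.0
    # Means depend on s1: the marginal is bimodal-ish, not normal.
    print(mixture_excess_kurtosis(w4, [-1.0, -1.0, 1.0, 1.0], [1.0] * 4))  # -> -0.5
    # Same means, variances depend on s: a scale mixture, also not normal.
    print(mixture_excess_kurtosis([0.5, 0.5], [0.0, 0.0], [1.0, 3.0]))  # -> 0.75
```

Kurtosis is only one diagnostic: a zero value would not by itself prove normality, but a nonzero value rules it out, which is enough to illustrate that letting either the conditional means or the conditional variances depend on $(s_1,s_2)$ breaks the Gaussian marginal in these examples.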