
For simplicity, assume that $X, Y$ are discrete, finite random variables with joint distribution $P_{XY}(x,y) = \mathbb{P}(X=x\wedge Y=y)$.

Now suppose that we do not know $P_{XY}(x,y)$, but are given the values of the marginal $P_X(x)=\sum_y P_{XY}(x,y)$ and the conditional $P_{X|Y}(x|y)=P_{XY}(x,y)/P_Y(y)$.

Is the knowledge of $P_X(x)$ and $P_{X|Y}(x|y)$ enough to recover the full joint distribution $P_{XY}(x,y)$?

Please note that this is different from *"Is the joint distribution $P_{XY}(x,y)$ determined from the conditionals $P_{X|Y}(x|y)$ and $P_{Y|X}(y|x)$?"*, because there I know the two conditionals, whereas here I know a conditional and a marginal.


1 Answer


The marginal $P_X(x)$ can be found by summing (or, for continuous variables, integrating) the conditional $P_{X|Y}(x|y)$ weighted by $P_Y(y)$. In other words: the marginal probability $P_X(x)$ is a mixture of the conditional probabilities $P_{X|Y}(x|y)$ (at the different values of $Y$), with the weights given by the probability $P_Y(y)$:

$$P_X(x) = \sum_{y} P_{X|Y}(x|y)\,P_Y(y)$$

Since multiple different $P_Y(y)$ may lead to the same $P_X(x)$ for a given $P_{X|Y}(x|y)$, the knowledge of $P_X(x)$ and $P_{X|Y}(x|y)$ cannot, in general, be used to calculate $P_Y(y)$ backwards, and hence cannot recover the joint distribution.

The simplest case is when $P_{X|Y}(x|y) = P_X(x)$ for all $y$ (i.e. when $X$ and $Y$ are independent), in which case any $P_Y(y)$ will be compatible.
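
To see this concretely, here is a minimal numpy check (the numbers are invented for illustration): with an independence conditional, two different $P_Y$ produce exactly the same $P_X$.

```python
import numpy as np

# Conditional matrix with identical columns: P(X=x|Y=y) = P_X(x) for every y,
# i.e. X and Y are independent, so the columns are linearly dependent.
P_X_given_Y = np.array([[0.2, 0.2, 0.2],
                        [0.3, 0.3, 0.3],
                        [0.5, 0.5, 0.5]])

# Two different marginals for Y ...
P_Y_1 = np.array([0.1, 0.3, 0.6])
P_Y_2 = np.array([1/3, 1/3, 1/3])

# ... produce exactly the same marginal for X, so P_Y is not recoverable.
print(P_X_given_Y @ P_Y_1)  # [0.2 0.3 0.5]
print(P_X_given_Y @ P_Y_2)  # [0.2 0.3 0.5]
```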


For discrete variables (and you could extend the logic to continuous variables), you can consider $P_X(x) = \sum_{y} P_{X|Y}(x|y)P_Y(y)$ as a matrix equation:

$$\begin{bmatrix} P_X(a_1) \\ P_X(a_2) \\ \vdots \\ P_X(a_n) \end{bmatrix} = \begin{bmatrix} P_{X|Y}(a_1|b_1) & P_{X|Y}(a_1|b_2) & \dots & P_{X|Y}(a_1|b_n) \\ P_{X|Y}(a_2|b_1) & P_{X|Y}(a_2|b_2) & \dots & P_{X|Y}(a_2|b_n) \\ \vdots & \vdots & & \vdots\\ P_{X|Y}(a_n|b_1) & P_{X|Y}(a_n|b_2) & \dots & P_{X|Y}(a_n|b_n) \\ \end{bmatrix} \cdot \begin{bmatrix} P_Y(b_1) \\ P_Y(b_2) \\ \vdots \\ P_Y(b_n) \end{bmatrix} $$

So when the conditional distributions $P_{X|Y}(\cdot|y)$, considered as vectors (one for each value of $y$), are linearly independent, you can obtain $P_Y(y)$ from $P_X(x)$ and $P_{X|Y}(x|y)$, and with it the joint distribution $P_{XY}(x,y) = P_{X|Y}(x|y)P_Y(y)$.
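
As an illustrative sketch (the numbers below are made up), when the matrix of conditionals is invertible you can solve the system directly and then rebuild the joint:

```python
import numpy as np

# A conditional matrix whose columns (one per value of y) are linearly
# independent; each column sums to 1, as a conditional distribution must.
P_X_given_Y = np.array([[0.7, 0.1, 0.2],
                        [0.2, 0.6, 0.3],
                        [0.1, 0.3, 0.5]])

# The "true" P_Y, used here only to generate the observed marginal P_X.
P_Y_true = np.array([0.5, 0.3, 0.2])
P_X = P_X_given_Y @ P_Y_true

# Invert the linear system P_X = M @ P_Y to recover P_Y ...
P_Y_recovered = np.linalg.solve(P_X_given_Y, P_X)
print(P_Y_recovered)  # [0.5 0.3 0.2]

# ... and with it the joint: P_XY(x, y) = P_{X|Y}(x|y) P_Y(y).
P_XY = P_X_given_Y * P_Y_recovered
print(P_XY.sum())  # 1.0
```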

Linear independence is a sufficient condition but not a necessary one. The additional restrictions that $P_Y(y) \geq 0$ and $\sum_y P_Y(y) = 1$ may single out a unique solution even when the vectors are linearly dependent.


Example: when $P_{X|Y}(x|y) \sim N(\mu = y, \sigma = 1)$, none of the conditional densities can be written as a linear combination of the others, so you should be able to recover $P_Y(y)$ when you know $P_X(x)$ (by some form of deconvolution).
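
A rough numpy sketch of that deconvolution, assuming we discretize $Y$ on a grid (the grids and the "true" $P_Y$ below are invented for illustration):

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import norm

# Discretize both variables on grids (a crude stand-in for the continuous case).
x = np.linspace(-6, 6, 121)
y = np.linspace(-3, 3, 7)

# Column j holds the conditional density N(mu=y_j, sigma=1) on the x grid.
M = norm.pdf(x[:, None], loc=y[None, :], scale=1.0)

# A "true" discretized P_Y, used here only to generate the observed P_X.
p_y_true = np.array([0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1])
p_x = M @ p_y_true

# Recover P_Y by non-negative least squares (a simple form of deconvolution);
# the columns of M are linearly independent, so the solution is unique.
p_y_hat, _ = nnls(M, p_x)
print(np.round(p_y_hat, 6))  # matches p_y_true in this noise-free setting
```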