
For simplicity, assume that $X, Y$ are discrete, finite random variables with joint distribution $P_{XY}(x,y) = \mathbb{P}(X=x\wedge Y=y)$.

Now suppose that we do not know $P_{XY}(x,y)$, but are given the values of the marginal $P_X(x)=\sum_y P_{XY}(x,y)$ and the conditional $P_{X|Y}(x|y)=P_{XY}(x,y)/P_Y(y)$.

Is the knowledge of $P_X(x)$ and $P_{X|Y}(x|y)$ enough to recover the full joint distribution $P_{XY}(x,y)$?

Please note that this is different from *"Is the joint distribution $P_{XY}(x,y)$ determined from the conditionals $P_{X|Y}(x|y)$ and $P_{Y|X}(y|x)$?"*, because there I know the two conditionals, whereas here I know a conditional and a marginal.


1 Answer


The marginal $P_X(x)$ can be found by summing (or, for continuous variables, integrating) the conditional $P_{X|Y}(x|y)$ weighted by $P_Y(y)$. In other words: the marginal probability $P_X(x)$ is a mixture of the conditional probabilities $P_{X|Y}(x|y)$ (at the different values of $Y$), with the weights given by the probability $P_Y(y)$:

$$P_X(x) = \sum_{y} P_{X|Y}(x|y)\,P_Y(y)$$

Since multiple different $P_Y(y)$ may lead to the same $P_X(x)$ for a given $P_{X|Y}(x|y)$, the knowledge of $P_X(x)$ and $P_{X|Y}(x|y)$ cannot, in general, be used to calculate $P_Y(y)$ backwards, and hence cannot recover the joint distribution.

The simplest case is when $P_{X|Y}(x|y) = P_X(x)$ for all $y$ (i.e. when $X$ and $Y$ are independent), in which case any $P_Y(y)$ will be compatible.
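
To see this concretely, here is a minimal numpy check (the numbers are invented for illustration): with an independence conditional, two different $P_Y$ produce exactly the same $P_X$.

```python
import numpy as np

# Conditional matrix with identical columns: P(X=x|Y=y) = P_X(x) for every y,
# i.e. X and Y are independent, so the columns are linearly dependent.
P_X_given_Y = np.array([[0.2, 0.2, 0.2],
                        [0.3, 0.3, 0.3],
                        [0.5, 0.5, 0.5]])

# Two different marginals for Y ...
P_Y_1 = np.array([0.1, 0.3, 0.6])
P_Y_2 = np.array([1/3, 1/3, 1/3])

# ... produce exactly the same marginal for X, so P_Y is not recoverable.
print(P_X_given_Y @ P_Y_1)  # [0.2 0.3 0.5]
print(P_X_given_Y @ P_Y_2)  # [0.2 0.3 0.5]
```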


For discrete variables (and you could extend the logic to continuous variables), you can consider $P_X(x) = \sum_{y} P_{X|Y}(x|y)P_Y(y)$ as a matrix equation:

$$\begin{bmatrix} P_X(a_1) \\ P_X(a_2) \\ \vdots \\ P_X(a_n) \end{bmatrix} = \begin{bmatrix} P_{X|Y}(a_1|b_1) & P_{X|Y}(a_1|b_2) & \dots & P_{X|Y}(a_1|b_n) \\ P_{X|Y}(a_2|b_1) & P_{X|Y}(a_2|b_2) & \dots & P_{X|Y}(a_2|b_n) \\ \vdots & \vdots & & \vdots\\ P_{X|Y}(a_n|b_1) & P_{X|Y}(a_n|b_2) & \dots & P_{X|Y}(a_n|b_n) \\ \end{bmatrix} \cdot \begin{bmatrix} P_Y(b_1) \\ P_Y(b_2) \\ \vdots \\ P_Y(b_n) \end{bmatrix} $$

So when the conditional distributions $P_{X|Y}(\cdot|y)$, considered as vectors (one for each value of $y$), are linearly independent, you can obtain $P_Y(y)$ from $P_X(x)$ and $P_{X|Y}(x|y)$, and with it the joint distribution $P_{XY}(x,y) = P_{X|Y}(x|y)P_Y(y)$.
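
As an illustrative sketch (the numbers below are made up), when the matrix of conditionals is invertible you can solve the system directly and then rebuild the joint:

```python
import numpy as np

# A conditional matrix whose columns (one per value of y) are linearly
# independent; each column sums to 1, as a conditional distribution must.
P_X_given_Y = np.array([[0.7, 0.1, 0.2],
                        [0.2, 0.6, 0.3],
                        [0.1, 0.3, 0.5]])

# The "true" P_Y, used here only to generate the observed marginal P_X.
P_Y_true = np.array([0.5, 0.3, 0.2])
P_X = P_X_given_Y @ P_Y_true

# Invert the linear system P_X = M @ P_Y to recover P_Y ...
P_Y_recovered = np.linalg.solve(P_X_given_Y, P_X)
print(P_Y_recovered)  # [0.5 0.3 0.2]

# ... and with it the joint: P_XY(x, y) = P_{X|Y}(x|y) P_Y(y).
P_XY = P_X_given_Y * P_Y_recovered
print(P_XY.sum())  # 1.0
```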

Linear independence is a sufficient condition but not a necessary one. The additional restrictions that $P_Y(y) \geq 0$ and $\sum_y P_Y(y) = 1$ may single out a unique solution even when the vectors are linearly dependent.


Example: when $P_{X|Y}(x|y) \sim N(\mu = y, \sigma = 1)$, none of the conditional densities can be written as a linear combination of the others, so you should be able to recover $P_Y(y)$ when you know $P_X(x)$ (by some form of deconvolution).
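
A rough numpy sketch of that deconvolution, assuming we discretize $Y$ on a grid (the grids and the "true" $P_Y$ below are invented for illustration):

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import norm

# Discretize both variables on grids (a crude stand-in for the continuous case).
x = np.linspace(-6, 6, 121)
y = np.linspace(-3, 3, 7)

# Column j holds the conditional density N(mu=y_j, sigma=1) on the x grid.
M = norm.pdf(x[:, None], loc=y[None, :], scale=1.0)

# A "true" discretized P_Y, used here only to generate the observed P_X.
p_y_true = np.array([0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1])
p_x = M @ p_y_true

# Recover P_Y by non-negative least squares (a simple form of deconvolution);
# the columns of M are linearly independent, so the solution is unique.
p_y_hat, _ = nnls(M, p_x)
print(np.round(p_y_hat, 6))  # matches p_y_true in this noise-free setting
```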