
We have $X=Y+Z$ where $Z$ is a standard normal and $Y$ is a random variable with $p(y)$ as its density. $Y$ and $Z$ are independent.

The conditional distribution $p(x|y)$ is clearly $\mathcal N_x(y, 1)$, a normal with mean $y$ and standard deviation one. My question is: what is $p(y|x)$?

  1. According to $Y=X-Z$, $p(y|x)$ should be $\mathcal N_y(x,1)$.

  2. According to Bayes' theorem, $p(y|x)=p(x|y)\times p(y)/p(x)$, so $p(y|x)$ is proportional to the product of $p(x|y)$ and $p(y)$, and not necessarily a normal.

Which one is correct?


I tend to believe Bayes' theorem but cannot formally show why the argument via $Y=X-Z$ fails.
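One way to probe the two claims numerically is a quick Monte Carlo sketch; the exponential prior for $Y$ below is an arbitrary non-normal choice, not part of the original setup:

```python
import numpy as np

# Monte Carlo sketch of the two claims. The exponential prior for Y is
# an arbitrary non-normal choice, made purely for illustration.
rng = np.random.default_rng(0)
n = 2_000_000
y = rng.exponential(scale=1.0, size=n)  # Y ~ Exp(1), independent of Z
z = rng.standard_normal(n)              # Z ~ N(0, 1)
x = y + z                               # X = Y + Z

# Condition on X close to a fixed value x0.
x0 = 1.0
y_given_x = y[np.abs(x - x0) < 0.05]

# Claim 1 predicts mean x0 = 1 and variance 1. The Bayesian posterior
# (claim 2) is proportional to exp(-y) * phi(1 - y) on y > 0, a truncated
# normal with mean ~0.8 and variance ~0.36 -- and that is what comes out.
print(y_given_x.mean(), y_given_x.var())
```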

  • Are you assuming $Y$ and $Z$ are independent? – Henry Aug 03 '23 at 12:44
  • @Henry Given the equation $X=Y+Z$, I assume they are dependent – ryushinn Aug 03 '23 at 12:54
  • @ryushinn "Given the equation $X = Y + Z$, I assume they are dependent" makes no sense -- $Y$ and $Z$ can either be dependent or independent. And your very first claim "the conditional probability $p(x|y)$ is obviously $N(y, 1)$" is obviously wrong when $Y$ and $Z$ are dependent. For an extreme example, just consider $Y = -Z$. – Zhanxiong Aug 03 '23 at 12:58
  • Ah, I thought you started with $Y$ and $Z$ independent. But do you define the dependence between $Y$ and $Z$ via $Y = X-Z$? – Ute Aug 03 '23 at 12:58
  • @Zhanxiong, You are right. I thought Henry was asking "Are $X$ and $Y$ independent?". Let me correct my statement: given $X=Y+Z$, $X$ and $Y$ are dependent, but we don't know the dependency between $Y$ and $Z$. For my question, I just assume $Y$ and $Z$ are independent. In other words, $Z$ is independent Gaussian noise added to $Y$. – ryushinn Aug 03 '23 at 13:03
  • Your 1 is wrong, while your 2 is the better way to think. If $X=Y+Z$ is relatively large then I would intuitively think it more likely that both $Y$ and $Z$ are relatively large than that $Y$ is relatively small and $Z$ even larger (or that $Z$ is relatively small and $Y$ even larger), though this would depend on the individual distributions. – Henry Aug 03 '23 at 13:26
  • ryushinn, you are constantly editing your question - this is OK, because you want to get hold of a difficult concept and had to make your question more precise. You need a dialogue rather than an answer (and that is also what you get here), but please use the self-study tag next time you ask such questions. This prevents other users such as me from writing long answers, and signals to others with questions that they should read the comments :-) I adjusted my answer to your last version, where you did not want to decide whether $Y$ and $Z$ are independent, to show how important that decision is. – Ute Aug 03 '23 at 14:06

4 Answers


[Edit: I assumed that $Y$ and $Z$ are independent, but this was not clear from the first version of this question]

If $X$ and $Z$ were independent, then the conditional distribution of $Y = X-Z$ given $X=x$ would indeed be $N(x,1)$: conditioning on $X$ would leave $Z$ standard normal. But $X$ and $Z$ are not independent here: with $Y$ and $Z$ independent, $\operatorname{Cov}(X,Z)=\operatorname{Var}(Z)=1$, so learning $X$ changes the distribution of $Z$.

Think of a very extreme example: $Y$ takes only one possible value, say $P(Y=5)=1$. Then $X\sim N(5, 1)$ but $P(Z = X-5)=1$, so there is no room for $Y = X-Z$ to vary: $Y=5$ with probability 1 given any $X=x$, not $N(x,1)$.


Edits in response to comments and question edits:

In summary: Your question cannot be decided without assuming a dependence structure.

[For clarification: there was a discussion about whether $Z$ and $Y$ are independent. The initial version did not state that $Z$ and $Y$ actually are independent, and there were doubts about that in the comments. I keep this part of the answer because it highlights the importance of specifying the dependence structure.]

Since you specified the distributions of $Z$ (standard normal) and $Y$ (some density $p(y)$) at the start, I assumed that $Z$ and $Y$ are independent, and that $X$ is defined by $Z$ and $Y$. Following @Henry's comments, you stated that you don't assume anything about the dependence. However, the conditional distribution of $Y$ given $X$ really does depend on how the variables relate to each other - on their joint distribution.

  • In my example where $Y$ only takes one possible value, $Y$ would automatically be independent of both $Z$ and $X$.

  • On the contrary, you can also determine $Y$ by specifying $X$. For example, you can require that $X$ is constant, say $P(X=10)=1$. Then $Y = 10-Z \sim N(10, 1)$ given $X$ (because $-Z\sim Z$), which is normal but not standard normal.

There is not just one distribution of $Y$ given $X$ in general, without knowing the dependence. The dependence enters the Bayes formula when you calculate joint expressions like $P(Y=y, X=x)$ in the numerator, because $$ \begin{aligned} P(Y=y \mid X=x) &= \frac{P(Y=y \wedge X=x)}{P(X=x)} \quad \small\text{(joint distribution of $X$ and $Y$ needed)} \\&= \frac{P(Y=y \wedge Z=x-y)}{P(X=x)}\quad \small\text{(joint distribution of $Y$ and $Z$ needed)} \\&= \frac{P(X=x \wedge Z=x-y)}{P(X=x)}\quad \small\text{(joint distribution of $X$ and $Z$ needed)} \end{aligned} $$ [Caveat: this is not a rigorous formula for continuous random variables; it just illustrates the principle.]

In the numerator of Bayes' formula, you always have a joint event, saying something about [$Y$ and $X$] or [$Y$ and $Z$] or [$X$ and $Z$]. If you want to get anywhere solving the equation, you need to know the joint distribution of two of the variables. This is easiest if you assume two of them to be independent (as I did when I read the first version of this question).
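To see the role of the joint distribution concretely, here is a minimal simulation sketch: both joints below give $Y$ and $Z$ the same $N(0,1)$ marginals (these particular joints are chosen purely for illustration), yet the conditional of $Y$ given $X \approx 0$ differs:

```python
import numpy as np

# Same N(0,1) marginals for Y and Z, two different joint distributions,
# two different conditionals of Y given X. Both joints are chosen purely
# for illustration.
rng = np.random.default_rng(1)
n = 2_000_000

# Joint 1: Y and Z independent.
y1 = rng.standard_normal(n)
z1 = rng.standard_normal(n)
x1 = y1 + z1

# Joint 2: Y = -Z (perfect negative dependence), so X = Y + Z = 0 always.
z2 = rng.standard_normal(n)
y2 = -z2
x2 = y2 + z2

# Condition both on X near 0 and compare the conditional variance of Y.
print(y1[np.abs(x1) < 0.05].var())  # about 1/2: Y | X=0 ~ N(0, 1/2)
print(y2[np.abs(x2) < 0.05].var())  # about 1: Y keeps its full N(0,1) marginal
```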

Ute

If $p(y)$ is your prior probability for $y$, then your posterior probability should be

$$p(y|x)=\frac{p(y)\phi(y-x)}{\int p(u)\phi(u-x)du}$$

where $\phi$ is the probability density function of a standard normal variable.

If you don’t have a prior probability for $y$, then using the above formula with a constant (improper) $p(y)$ gives just $$p(y|x)=\phi(y-x),$$ which is the equality you gave between $p(y|x)$ and $N(x,1)$.
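A minimal numerical sketch of this formula, using grid integration (the Laplace prior below is an arbitrary non-normal example, not part of the answer):

```python
import numpy as np
from scipy.stats import norm

# Grid version of the posterior formula above. The Laplace prior is an
# arbitrary non-normal example.
grid = np.linspace(-10.0, 10.0, 4001)
dy = grid[1] - grid[0]

def posterior(x, prior_pdf):
    """p(y|x) = p(y) * phi(y - x) / normalizing integral, on a grid."""
    unnorm = prior_pdf(grid) * norm.pdf(grid - x)
    return unnorm / (unnorm.sum() * dy)

prior = lambda y: 0.5 * np.exp(-np.abs(y))  # Laplace(0, 1) density
post = posterior(x=2.0, prior_pdf=prior)

# The posterior mean sits between the prior mode 0 and the observation x = 2,
# and the posterior is visibly non-normal (kinked at 0 by the Laplace prior).
print((grid * post).sum() * dy)
```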

Matt F.

Let's say $Y$ is the number of apples in a basket, $Z$ is the number of pears, and $X$ is the total number of fruit. $p(Y|X)$ then describes the probability distribution over the number of apples, given that you know the total number of fruit. And we know that the number of pears follows a unit-variance Normal distribution (let's imagine it has mean 100 though instead of 0, so that negative numbers of pears are vanishingly unlikely).

Now someone tells us that there are 150 fruit in the basket. Based only on our knowledge about the distribution of pears, our belief about the number of apples would be Normal (with mean 50 and variance 1), since the number of apples follows from subtracting the number of pears (which is unknown and Normally distributed) from the fixed sum total.

However, this way of thinking does not incorporate our prior knowledge about the number of apples in the basket. This is where $p(Y)$ comes in. If we assume $p(Y)\propto \text{Constant}$, i.e. a Uniform prior saying all numbers of apples are equally likely a priori, then our posterior $p(Y|X)$ will remain Normal. If we assume that $p(Y)$ is itself Normal, then the resulting posterior will also be Normal, but with variance smaller than 1. For any other choice of $p(Y)$, the posterior will not be Normal.

For instance, if we know that the number of apples must always be between 0 and 10 (i.e. $Y\sim \mathcal{U}(0,10)$), then our posterior will also be non-zero only over that interval, and will look like the clipped left tail of a Normal distribution (with maximum density at $Y=10$). Of course, finding a total of 150 fruit in your basket when you know there can be at most 10 apples and the number of pears is close to 100 is very unlikely, but given that unlikely outcome, that's what the inference looks like.

Note that this in no way violates $X=Y+Z$. In the above example, even if the number of apples $Y$ were only 5, we could still find a value of $Z$ (145 pears) to make up the difference and get to 150 fruit. That value happens to be very unlikely under $Z$'s prior distribution, but that doesn't matter - given that $X$ is what it is, that simply necessitates that $Z$ took an unlikely value.
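For the curious, here is a small log-space sketch of that clipped-tail posterior, using the story's numbers (apples $Y \sim \mathcal{U}(0,10)$, pears $Z \sim N(100,1)$, total $x = 150$); working with log densities avoids the underflow that values like $\phi(40)$ would cause:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

# Clipped-tail posterior from the story: apples Y ~ U(0, 10), pears
# Z ~ N(100, 1), observed total x = 150. Linear-scale densities like
# phi(40) underflow to zero, so everything is kept in log space.
x = 150.0
grid = np.linspace(0.0, 10.0, 1001)          # support of the uniform prior
dy = grid[1] - grid[0]
log_post = norm.logpdf(x - grid, loc=100.0)  # log prior is constant, drops out
log_post -= logsumexp(log_post) + np.log(dy) # normalize in log space

post = np.exp(log_post)
# The density rises monotonically over [0, 10]: the clipped left tail of a
# normal, with its maximum at y = 10, exactly as described above.
print(grid[np.argmax(post)])
```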


The simple point is that $X$ is not independent of $Y$ or of $Z$, so conditioning on $X$ changes the distribution of each.

What you see is what I would expect: if $X=Y+Z$ is relatively large, then I would intuitively think it more likely that both $Y$ and $Z$ are relatively large than that $Y$ is relatively small and $Z$ even larger (or that $Z$ is relatively small and $Y$ even larger), though this would depend on the individual distributions.

Two examples:

  • Suppose $Y \sim N(0,1)$ independently and identically to $Z$. Then $X =Y+Z \sim N(0,2)$. Conditioned on $X=x$ you will have $Y \sim N\left(\frac{x}2, \frac12\right)$ and $Z$ with the same conditional distribution. The shift of the conditional mean from $0$ to $\frac{x}2$ seems obvious by the exchangeability of $Y$ and $Z$ and the need to divide their sum between them.

  • Suppose $Y=\pm k$ with equal probability, independently of $Z$. Conditioned on $X=x$ you will have $\mathbb P_{Y\mid X=x}(Y=+k) = \frac{\phi(x-k)}{\phi(x-k)+\phi(x+k)}$ which is an increasing function of $x$, close to $0$ for large negative $x$ and close to $1$ for large positive $x$.
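Both examples are easy to check with a short simulation; the values $x = 2$ and $k = 1$ below are arbitrary choices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 4_000_000

# Example 1: Y, Z iid N(0,1). Theory: Y | X = x ~ N(x/2, 1/2).
y = rng.standard_normal(n)
x = y + rng.standard_normal(n)
sel = np.abs(x - 2.0) < 0.02
print(y[sel].mean(), y[sel].var())  # roughly 1.0 and 0.5 for x = 2

# Example 2: Y = +-k with equal probability, independent of Z ~ N(0,1).
# Theory: P(Y = +k | X = x) = phi(x - k) / (phi(x - k) + phi(x + k)).
k = 1.0
y2 = rng.choice([-k, k], size=n)
x2 = y2 + rng.standard_normal(n)
sel2 = np.abs(x2 - 2.0) < 0.02
emp = (y2[sel2] == k).mean()
theo = norm.pdf(2.0 - k) / (norm.pdf(2.0 - k) + norm.pdf(2.0 + k))
print(emp, theo)  # both close to 0.98 for x = 2, k = 1
```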

Henry
  • You made me realize that I was ignoring the dependence between $X$ and $Z$ when looking at $Y=X-Z$: given $X=x$, $Z$ is no longer a standard normal but follows $p(z|x)$. I appreciate your example 1 especially, which is good to think about. – ryushinn Aug 03 '23 at 14:44