6

Let $a \sim N(\mu_a,1/\tau)$ and $s = a + \epsilon$, where $\epsilon \sim N(0,1/\eta)$. I know that because both $a$ and $\epsilon$ are normally distributed, $s$ must also be normally distributed, with $s \sim N\left(\mu_a,\dfrac{\tau +\eta}{\tau\eta}\right)$. Here $s$ is interpreted as a noisy signal about $a$, which is itself unobserved. Then the conditional expectation of $a$ given $s$ is given by:

\begin{align*} \mathbb{E}[a \mid s] & = \mu_a + \dfrac{cov(a,s)}{var(s)}(s-\mu_a)\\ & = \mu_a + \dfrac{\dfrac{1}{\tau}}{\dfrac{\tau + \eta}{\tau \eta}}(s-\mu_a) \\ & = \dfrac{\tau \mu_a + \eta s}{\tau + \eta} \end{align*}
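
As a quick sanity check of this formula, here is a minimal Monte Carlo sketch in Python (numpy assumed; the parameter values and the observed $s$ are arbitrary):

```python
# A minimal Monte Carlo sanity check of E[a | s] = (tau*mu_a + eta*s)/(tau + eta).
# All parameter values and the observed s0 are illustrative, not from the problem.
import numpy as np

rng = np.random.default_rng(0)
mu_a, tau, eta = 1.0, 2.0, 3.0    # prior mean, prior precision, noise precision
s0 = 1.5                          # a hypothetical observed value of s

n = 2_000_000
a = rng.normal(mu_a, 1 / np.sqrt(tau), n)    # a ~ N(mu_a, 1/tau)
s = a + rng.normal(0, 1 / np.sqrt(eta), n)   # s = a + eps, eps ~ N(0, 1/eta)

# Approximate conditioning on s = s0 by keeping draws with s in a narrow window.
keep = np.abs(s - s0) < 0.01
print("simulated E[a | s ~= s0]:", a[keep].mean())
print("formula (tau*mu_a + eta*s0)/(tau + eta):", (tau * mu_a + eta * s0) / (tau + eta))
```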

Now consider another signal $\tilde{s} = a + \tilde{\epsilon}$, where $\tilde{\epsilon} \sim N(0,1/\tilde{\eta})$ and $\tilde{\epsilon}$ is independent of $\epsilon$. We observe $s$ first and update the belief, and then observe $\tilde{s}$. I would like to compute the expected value of $a$ given $s$, conditional on $\tilde{s}$.

That is, let $z = a \mid s$ denote the conditional distribution of $a$ given $s$. I would like to compute $\mathbb{E}[z \mid \tilde{s}]$ using the same formula as above, but I am unsure what $cov(z,\tilde{s})$ is.

I know $cov(z,\tilde{s}) = cov(z,a + \tilde{\epsilon}) = cov(z,a)$. How can I move forward from here?

EDIT: I have learned that the order of the signals does not matter for Bayesian updating. So what I am really after is:

\begin{align*} \mathbb{E}[z \mid \tilde{s}] = \mathbb{E}[a \mid s, \tilde{s}] & = \mu_a + \dfrac{cov(a,s)}{var(s)}(s-\mu_a) + \dfrac{cov(a,\tilde{s})}{var(\tilde{s})}(\tilde{s}-\mu_{a})\\ \end{align*}

Is this the correct approach? I don't feel confident, because $s$ and $\tilde{s}$ are correlated and the expression above does not include any information about that.
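
One way to probe this numerically is a brute-force Monte Carlo sketch that conditions on both signals directly (numpy assumed; every numeric value below is made up):

```python
# Brute-force Monte Carlo comparison of the two-term expression above with a
# simulated E[a | s, s~]; all parameter and signal values are made up.
import numpy as np

rng = np.random.default_rng(1)
mu_a, tau, eta, eta_t = 0.0, 1.0, 2.0, 3.0
s0, st0 = 0.8, 1.2                # hypothetical observed values of s and s~

n = 4_000_000
a = rng.normal(mu_a, 1 / np.sqrt(tau), n)
s = a + rng.normal(0, 1 / np.sqrt(eta), n)
st = a + rng.normal(0, 1 / np.sqrt(eta_t), n)

# Approximate conditioning on (s, s~) = (s0, st0) with a small window.
keep = (np.abs(s - s0) < 0.05) & (np.abs(st - st0) < 0.05)
print("simulated E[a | s, s~]:", a[keep].mean())

two_term = (mu_a + eta / (tau + eta) * (s0 - mu_a)
            + eta_t / (tau + eta_t) * (st0 - mu_a))
print("two-term expression:", two_term)
```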

EDIT2: Based on ChrisL's solution, this is what I understand so far:

\begin{align*} \mathbb{E}[z \mid \tilde{s}] & = \mathbb{E}[a \mid s, \tilde{s}] \\ & = \mathbb{E}[a \mid s'] \text{ where $s' = s + \tilde{s}$} \\ & = \mu_a + \dfrac{cov(a,s')}{var(s')}(s'-\mu_{s'}) \\ & = \mu_a + \dfrac{cov(a,s+\tilde{s})}{var(s+\tilde{s})}(s+\tilde{s}-2 \mu_{a}) \\ & = \mu_a + \dfrac{2var(a)}{var(s) + var(\tilde{s}) + 2cov(s,\tilde{s})}(s+\tilde{s}-2 \mu_{a}) \\ & = \mu_a + \dfrac{2\eta\tilde{\eta}}{\eta \tau + \tilde{\eta} \tau + 4 \eta \tilde{\eta}}(s+\tilde{s}-2 \mu_{a}) \\ \end{align*}

(The second equality treats $s + \tilde{s}$ as a sufficient statistic, which, per the discussion under ChrisL's answer, requires $\eta = \tilde{\eta}$.)

Hosea
  • How did you get $E[a|s]=\frac{\tau\mu+\eta s}{\tau+\eta}$, and where is this formula from? I'm getting $\frac{\eta^2s+\tau^2\mu}{\eta^2+\tau^2}$ – Spätzle Nov 01 '23 at 07:51
  • All three r.v. $a, s, \tilde{s}$ have the same mean. You can simplify the problem by subtracting this mean. Other than that, I think the correlation of $s$ and $\tilde{s}$ becomes irrelevant once you condition on both. The uncertainty of $a$ then is caused only by $\epsilon, \tilde{\epsilon}$. – ChrisL Nov 01 '23 at 11:57
  • @Spätzle I have included one more intermediate step. Regarding the formula, I have seen it in various sources, but here is a similar one (https://stats.stackexchange.com/questions/30588/deriving-the-conditional-distributions-of-a-multivariate-normal-distribution), though that one is multidimensional. – Hosea Nov 01 '23 at 19:51
  • The random variable $z = a \mid s$ does not exist. – Xi'an Nov 02 '23 at 08:23
  • @Xi'an I'm not sure I follow your comment. While I'm not confident the notation is right, there exists a conditional distribution of $a$ given $s$, which is itself normal with the mean stated in the question, and its variance is also easy to compute. So $z$ is a random variable that follows a normal distribution. – Hosea Nov 02 '23 at 13:00
  • There is a random variable $a$ whose conditional distribution can be derived but this does not turn it into another random variable. – Xi'an Nov 02 '23 at 13:14

3 Answers

7

Is this the correct approach? I don't feel confident, because $s$ and $\tilde{s}$ are correlated and the expression above does not include any information about that.

The $s$ and $\tilde{s}$ are not correlated when you condition on $a$. They are independently distributed according to

$$s|a \sim N(a,1/\eta) \\ \tilde{s}|a \sim N(a,1/\tilde\eta)$$

or, if you take both together with inverse-variance weighting,

$$\frac{\eta s+ \tilde{\eta}\tilde{s}}{\eta+\tilde{\eta}}|a \sim N\left(a,\frac{1}{\eta+\tilde{\eta}}\right)$$

In these three equations, you can regard the parameter $a$ as following a prior distribution

$$a \sim N(\mu_a,1/\tau)$$

and you are finding the posterior distribution after observing $\tilde{s}$ and/or $s$.

$$\begin{array}{lcrcl} a|s &\sim & N(\mu_{a|s},&\sigma_{a|s})\\ a|\tilde{s} &\sim & N(\mu_{a|\tilde{s}},&\sigma_{a|\tilde{s}})\\ a|s,\tilde{s} &\sim & N(\mu_{a|s,\tilde{s}},&\sigma_{a|s,\tilde{s}}) \end{array}$$

That posterior can be found with the updating rules that are derived here:

Bayesian updating with new data

Also very useful is this section about Bayesian inference on the Wikipedia page about the normal distribution.

It's a bit of work to write it down, but two update steps with the independent $s$ and $\tilde{s}$ should give the same result as a single update step with the weighted mean.

You don't need to worry here about correlations between $s$ and $\tilde{s}$. You just have the process of updating the distribution for $a$ based on the distributions in the first three equations. What changes with the sequential updating is that the posterior of the first step is the prior for the second step.
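
As a minimal illustration of this equivalence, here is a Python sketch (the `update` helper and all numeric values are my own, purely illustrative):

```python
# Sketch: two sequential normal-normal updates versus one update with the
# inverse-variance-weighted signal; `update` is a hypothetical helper and
# all numeric values are illustrative.
mu_a, tau = 0.0, 1.0      # prior mean and precision of a
eta, eta_t = 2.0, 3.0     # noise precisions of s and s~
s, st = 0.8, 1.2          # hypothetical observed signals

def update(mu, prec, obs, obs_prec):
    """Posterior (mean, precision) of a after observing obs ~ N(a, 1/obs_prec)."""
    new_prec = prec + obs_prec
    return (prec * mu + obs_prec * obs) / new_prec, new_prec

# Sequential: the posterior after s becomes the prior for s~.
mu1, prec1 = update(mu_a, tau, s, eta)
mu2, prec2 = update(mu1, prec1, st, eta_t)

# Single step with the weighted mean, which has precision eta + eta_t.
w = (eta * s + eta_t * st) / (eta + eta_t)
mu_w, prec_w = update(mu_a, tau, w, eta + eta_t)

print(mu2, prec2)    # both print the same posterior:
print(mu_w, prec_w)  # mean (tau*mu_a + eta*s + eta_t*st)/(tau + eta + eta_t)
```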

  • Thank you for your answer! I wasn't aware of the idea of a conjugate prior, and I learned something new today. I would like a closed-form solution if possible for my application, and I think I have one using the approach given by Chris. – Hosea Nov 01 '23 at 20:03
4

REMARK/EDIT: This answer does not contain a solution to the problem and was provided as a stepping stone towards one. It aimed to exploit the symmetry of the problem, and it led to this question, which finally helped to solve the problem.

Assume $ \eta = \tilde{\eta} $. Then, since $ \epsilon, \tilde{\epsilon} $ have mean zero, are symmetric, and have the same variance, you have $$a = s - \epsilon = \tilde{s} - \tilde{\epsilon}$$ So you can define $s' = \frac{1}{2}(s +\tilde{s})$ and $\epsilon' = -\frac{1}{2}(\epsilon +\tilde{\epsilon})$ such that $$a = \frac{1}{2}(s +\tilde{s}) - \frac{1}{2}(\epsilon +\tilde{\epsilon}) = s' + \epsilon' $$

Now you can calculate $$ \mathbb{E}[a \mid s'] = \mu + \frac{cov(a, s) + cov(a, \tilde{s})}{var(s) + var(\tilde{s}) + 2cov(s, \tilde{s})} (s + \tilde{s} - 2\mu) $$

Because of this post, if $\sigma_{\epsilon} = \sigma_a = \sigma_{\tilde{\epsilon}} $, then $$ \mathbb{E}[a \mid s'] = \mathbb{E}[a \mid s, \tilde{s}]$$

See the accepted answer for the general case where variances differ.
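
A small numerical sketch of the role of this assumption, comparing the sum-based formula above with the exact posterior mean $\frac{\tau\mu + \eta s + \tilde{\eta}\tilde{s}}{\tau + \eta + \tilde{\eta}}$ derived in the other answers (the helper names and numeric values are illustrative only):

```python
# Sketch comparing the sum-based formula E[a | s'] above with the exact
# E[a | s, s~] = (tau*mu + eta*s + eta_t*st)/(tau + eta + eta_t).
def via_sum(mu, tau, eta, eta_t, s, st):
    # E[a | s'] from the formula above: cov(a,s) = cov(a,s~) = cov(s,s~) = 1/tau.
    num = 2 / tau
    den = (1/tau + 1/eta) + (1/tau + 1/eta_t) + 2/tau
    return mu + num / den * (s + st - 2 * mu)

def exact(mu, tau, eta, eta_t, s, st):
    return (tau * mu + eta * s + eta_t * st) / (tau + eta + eta_t)

eq  = (0.0, 1.0, 2.0, 2.0, 0.8, 1.2)   # eta == eta~: the two agree
neq = (0.0, 1.0, 2.0, 3.0, 0.8, 1.2)   # eta != eta~: they generally differ
print(via_sum(*eq), exact(*eq))
print(via_sum(*neq), exact(*neq))
```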

ChrisL
  • The sum of $\tilde{s}$ and $s$ is a sufficient statistic. Knowing $\tilde{s}$ and $s$ versus knowing just the sum gives the same information about $a$. – Sextus Empiricus Nov 01 '23 at 13:46
  • @SextusEmpiricus Do you mean that this approach is valid? I have written an edit in the question to incorporate what I have learned from this answer. – Hosea Nov 01 '23 at 19:53
  • I simply applied the formula for $E(a|s)$ given in the question to $E(a|s')$. Note that $a$ is not being defined here; I was only defining $s'$ and $\epsilon'$. – ChrisL Nov 01 '23 at 20:46
  • @SextusEmpiricus If the sum of $\tilde{s}$ and $s$ is a sufficient statistic, isn't $E[a|s,\tilde{s}] = E[a|s']$? Could you comment on why the formula is not valid? – Hosea Nov 01 '23 at 23:45
  • @Hosea I think the formula is right. It looked a bit alien because of the many terms and because $\mu$ occurs twice. We can write $cov(a,s') = var(a)$ and $var(s') = var(a)+var(\epsilon)/2$, which can be used to simplify the formulas. – Sextus Empiricus Nov 02 '23 at 07:17
  • I am not convinced that knowing the sum gives as much information as knowing the parts. If we know only the sum $s + \tilde{s} = z$ then $a$ must be close to $z$ with uncertainty $var(\epsilon) + var(\tilde{\epsilon})$. But if we know $s = x$ and $\tilde{s} = y$ with uncertainty $var(\epsilon)$ and $var(\tilde{\epsilon})$ then we can look at the intersection of the intervals around $x$ and $y$ and get a better estimate of $a$ than from the sum. The estimate improves over the sum when $s$ and $\tilde{s}$ are less correlated. – ChrisL Nov 02 '23 at 08:22
  • If $var(\epsilon)$ and $var(\tilde{\epsilon})$ are different, then the mean of $s$ and $\tilde{s}$ is indeed not a sufficient statistic. But some other mean, a weighted mean, will instead be the sufficient statistic. I see now that $s$ and $\tilde{s}$ can have different variances, but that doesn't change the principle of the problem (it just adds more algebraic work). – Sextus Empiricus Nov 02 '23 at 10:45
  • That sounds plausible, and I think it deserves to appear as an answer. I think it is interesting enough to be stated as a separate question that can be used as a reference. – ChrisL Nov 02 '23 at 11:22
  • https://stats.stackexchange.com/questions/630269/let-s-x-u-t-x-v-for-normal-r-v-x-u-v-is-it-possible-to-find-a-b-re – ChrisL Nov 02 '23 at 11:45
  • You seem to ignore the assumptions about precision and implicitly assume $\eta = \tilde\eta.$ The result contradicts the other two answers and is likely to confuse readers. (-1) – whuber Nov 22 '23 at 15:54
  • @whuber Thank you for pointing that out. I have adapted my answer. However, I gave the answer in order to help before the accepted answer was deduced from my answer here and an answer to this question: stats.stackexchange.com/questions/630269/… . It is incomplete and rather confusing. Should I delete it? – ChrisL Nov 22 '23 at 16:12
  • Options include editing it to note the history; correcting it to reflect your current understanding; deleting it; or letting it stand, hoping readers will go through this comment thread for more information. (I have employed all these strategies with my own answers!) – whuber Nov 22 '23 at 16:15
  • Thank you for the advice. I decided to add a remark and keep the answer. – ChrisL Nov 22 '23 at 16:40
2

This answer is adapted from another question. To compute $E[a \mid s, \tilde{s}]$, I first need to figure out the joint distribution of $(a,s,\tilde{s})$. Note that, by definition, these are linear combinations of $(a,\varepsilon,\tilde{\varepsilon})$:

\begin{align*} \begin{bmatrix} a \\ s \\ \tilde{s} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ \varepsilon \\ \tilde{\varepsilon} \end{bmatrix} \end{align*}

First note that $s \sim N(\mu,\dfrac{\tau + \eta}{\tau \eta})$ and $\tilde{s} \sim N(\mu,\dfrac{\tau + \tilde{\eta}}{\tau \tilde{\eta}})$. Because $a$, $\varepsilon$, and $\tilde{\varepsilon}$ are mutually independent, $(a,s,\tilde{s})$ follows a multivariate normal distribution:

\begin{align*} \begin{bmatrix} a \\ s \\ \tilde{s} \end{bmatrix} \sim N \Bigg( \begin{bmatrix} \mu \\ \mu \\ \mu \end{bmatrix}, \begin{bmatrix} \frac{1}{\tau} & \frac{1}{\tau} & \frac{1}{\tau} \\ \frac{1}{\tau} & \frac{\tau + \eta}{\tau \eta} & \frac{1}{\tau} \\ \frac{1}{\tau} & \frac{1}{\tau} & \frac{\tau + \tilde{\eta}}{\tau \tilde{\eta}} \end{bmatrix} \Bigg) \end{align*}

Then the expectation of $a$ conditional on $s,\tilde{s}$ is given by:

\begin{align*} E[a \mid s, \tilde{s}] & = \mu + \begin{bmatrix} \frac{1}{\tau} & \frac{1}{\tau} \end{bmatrix} \begin{bmatrix} \frac{\tau + \eta}{\tau \eta} & \frac{1}{\tau} \\ \frac{1}{\tau} & \frac{\tau + \tilde{\eta}}{\tau \tilde{\eta}} \end{bmatrix}^{-1} \begin{bmatrix} s - \mu_{s} \\ \tilde{s} - \mu_{\tilde{s}} \end{bmatrix} \\ & = \mu + \dfrac{\eta(s-\mu) + \tilde{\eta}(\tilde{s}-\mu)}{\tau + \eta + \tilde{\eta}} \\ & = \dfrac{\tau \mu + \eta s + \tilde{\eta} \tilde{s}}{\tau + \eta + \tilde{\eta}} \end{align*}

This is the precision-weighted average of the prior mean and the two signals; setting $\tilde{\eta} = 0$ (an infinitely noisy second signal) recovers the single-signal formula from the question.
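
As a sanity check, here is a short numpy sketch (illustrative values) that evaluates this multivariate-normal conditional mean numerically and compares it with the closed form:

```python
# Evaluate the conditional mean mu + Sigma_12 Sigma_22^{-1}(x - mu) directly
# and compare it with the closed form; all numeric values are illustrative.
import numpy as np

mu, tau, eta, eta_t = 1.0, 2.0, 3.0, 4.0
s, st = 1.5, 0.5

Sigma12 = np.array([1/tau, 1/tau])               # cov(a, (s, s~))
Sigma22 = np.array([[1/tau + 1/eta, 1/tau],
                    [1/tau, 1/tau + 1/eta_t]])   # covariance of (s, s~)
cond = mu + Sigma12 @ np.linalg.solve(Sigma22, np.array([s - mu, st - mu]))

closed = (tau * mu + eta * s + eta_t * st) / (tau + eta + eta_t)
print(cond, closed)  # the two agree
```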

Hosea