2

I am interested in calculating the distribution of a conditional expectation, and I would like some help figuring out whether I am on the right track.

Consider a variable $x\sim f(x)$ and a noisy signal $s$ of $x$ drawn from $g(s;x)$. Consider $E(x|s)$ computed using Bayes' rule. I am interested in the distribution of all possible values of $E(x|s)$ which I will call $q(z)$, which I think I can define as $$ q(z) = \int 1(E(x|s)=z) g(s;x)ds $$ where 1 is the indicator function. My question is whether my definition of $q(z)$ makes sense or if there is a better/conventional way to define it. I am somewhat confused because the law of iterated expectations states that $E_sE(x|s)=E(x)$ but this should only imply $E(q(z))=E(f(x))$, correct?

To clarify what I am after, consider for example $x\sim U[0,1]$ and $s$ a single Bernoulli draw with parameter $p=x$. Then, using Bayes' rule, $E(x|s=1)=2/3$ and $E(x|s=0)=1/3$ (the prior is a Beta(1,1); the two posteriors are Beta(2,1) and Beta(1,2), with means 2/3 and 1/3 respectively). The two values of the conditional expectation are equally likely because of the symmetry of $f(x)$, therefore
$$q(z) = \begin{cases} 1/2 & \text{if } z=1/3 \\ 1/2 & \text{if } z=2/3 \\ 0 & \text{otherwise} \end{cases}$$
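
As a quick numerical sanity check of this claim, here is a minimal simulation sketch; it uses nothing beyond the posterior means 1/3 and 2/3 computed above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.uniform(0.0, 1.0, size=n)   # x ~ U[0, 1]
s = rng.binomial(1, x)              # s | x ~ Bernoulli(x)

# Posterior means from Bayes' rule above: 1/3 when s = 0, 2/3 when s = 1
post_mean = np.where(s == 1, 2 / 3, 1 / 3)

for z in (1 / 3, 2 / 3):
    print(z, np.mean(post_mean == z))   # both frequencies come out close to 1/2
```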

andrea m.
  • 275
  • "a noisy signal of drawn from (;)" what is this 's of x'? What is $g(x;s)$, the distribution of $x$ given $s$? How do you use it to draw the noisy signal $s$? What is $x^\prime$? – Sextus Empiricus Jun 05 '23 at 15:14
  • Are you looking for the distribution of the mean posterior estimate and why this does not equal the prior? – Sextus Empiricus Jun 05 '23 at 15:51
  • @SextusEmpiricus yes I am looking for the distribution of E(x|s), it does not equal the prior, just look at my example. The posterior E(x|s) can take only two values when the signal takes two values, but the prior is continuous. – andrea m. Jun 05 '23 at 16:02
  • "Why does the mean posterior estimate not equal the prior?" So that is your question? – Sextus Empiricus Jun 05 '23 at 16:04
  • No, my question is whether my expression for q makes any sense (I edited it as q(z) for better clarity). My hunch now is that there should be a double integral, integrating over both x and s – andrea m. Jun 05 '23 at 16:07
  • Your $g(s;x)$ is not so clear to me, but the second integral might be contained in the $E$ operator, the expectation. Or it can be in the expression $\text{d}s$, when $s$ is measured according to the distribution of $s$ implied by the prior on $x$. – Sextus Empiricus Jun 05 '23 at 16:13
  • Again, in my example, g(s|x) ~ Bernoulli(x), therefore s = 1 with probability x and 0 otherwise. Higher values of x are more likely to generate s=1, and the resulting posterior means are 1/3 when s=0 and 2/3 when s=1. Given the distribution of x and s, how frequently will I observe posterior means 1/3 and 2/3? In this example it's all symmetric, so it will be 1/2 - 1/2, but what would be the analytic expression for computing such frequencies in general? – andrea m. Jun 05 '23 at 16:19
  • The equation $$q(z) = \int \mathbb I_{\mathbb E(x|s)=z}\, g(s;x)\,\text ds$$ is incorrect for defining the marginal density of $\mathbb E(x|s)$. – Xi'an Jun 05 '23 at 19:14
  • Thanks @Xi'an, what do you propose? It's really the marginal p(s), but its argument should be z=E(x|s) – andrea m. Jun 05 '23 at 19:55

3 Answers

5

Possibly this question may help: Bayes' Theorem Intuition

In my answer there I used the following figure:

[Figure: marginal and conditional distribution]

Bayes' rule can be viewed as taking a slice out of the joint distribution and looking at the resulting conditional distribution.

Obviously the means of $P(B)$ and $P(B|A)$ are not the same. But we can express $E(B)$ by summing/integrating over all the conditional expectations $E(B|A)$ (the expectation of $B$ given $A$).


You could relate the two as follows

$$E(B) = \int_{\forall a} E(B|A) f_A(a) \text{d}a \tag{1}$$

And with

$$E(B|A) = \frac{\int_{\forall b} b f(b,a) \text{d}b}{\int_{\forall b} f(b,a) \text{d}b}$$

and

$$f_A(a) = \int_{\forall b} f(a,b) \text{d}b$$

if you substitute them into (1) you get

$$\begin{array}{}E(B)& = &\int_{\forall a} \frac{\int_{\forall b} b f(a,b) \text{d}b}{\left[\int_{\forall b} f(a,b) \text{d}b\right]} \left[\int_{\forall b} f(a,b) \text{d}b \right]\text{d}a\\& = &\int_{\forall a} \int_{\forall b} b f(a,b) \text{d}b\text{d}a \end{array} $$
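
For instance, in the Bernoulli–uniform example from the question (taking $B=x$ and $A=s$, with $P(s=0)=P(s=1)=1/2$ by symmetry), equation (1) reduces to a two-term sum,

$$E(x) = E(x|s=0)\,P(s=0) + E(x|s=1)\,P(s=1) = \tfrac{1}{3}\cdot\tfrac{1}{2} + \tfrac{2}{3}\cdot\tfrac{1}{2} = \tfrac{1}{2},$$

which is the prior mean of $U[0,1]$: the law of iterated expectations holds even though the distribution of $E(x|s)$ is not the prior.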


If you want to define something like the distribution of $E(B|A)$, then you could consider the transformation rules for probability distributions, which scale the density with the derivative of the transformation (and sum over potentially multiple values of $a$ that map to the same mean):

$$f_{E(B|A)}(e) = \sum_{\forall a:E(B|A)=e} f_{A}(a) \left|\frac{\text{d}a}{\text{d}e}\right|$$

This might be what you were trying to do with the integral including the indicator function.
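
As a sketch of how that transformation rule plays out in a continuous setting, take a hypothetical normal example (not the one from the question): with prior $x \sim N(0,1)$ and signal $s \mid x \sim N(x,1)$, the posterior mean is $E(x|s) = s/2$ and marginally $s \sim N(0,2)$, so the rule predicts the density $2\,f_S(2e)$ for $E(x|s)$, i.e. $N(0,1/2)$. A short simulation agrees:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 500_000

# Hypothetical setup (not from the thread): x ~ N(0,1), s | x ~ N(x,1)
x = rng.normal(0.0, 1.0, size=n)
s = rng.normal(x, 1.0)
post_mean = s / 2.0                          # conjugate normal result: E(x|s) = s/2

# Change-of-variables prediction: a = s = 2e, |da/de| = 2, f_S is the N(0, sqrt(2)) density
hist, edges = np.histogram(post_mean, bins=60, range=(-3, 3), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = 2 * norm.pdf(2 * centers, loc=0.0, scale=np.sqrt(2.0))

print(np.max(np.abs(hist - predicted)))      # small: the simulation matches the formula
```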

4

I would work through this problem as follows:

$$\langle x| s \rangle = \int x \cdot p(x|s) dx$$

I think this is different from what you've done: you seem to be integrating over $s$, but any expression for $\langle x| s \rangle$ must have explicit dependence on $s$, so you can't have the $s$-dependence integrated out.

Then you can use Bayes to write this as

$$\langle x| s \rangle = \int x \cdot \frac{p(s|x)p(x)}{p(s)} dx$$

where I just used $p$ to generically denote "the pdf of", as per the usual Bayes' theorem notation. By your definitions, $p(x)=f(x)$ and $p(s|x)=g(x;s)$

Also, by the chain rule of probability,

$p(s) = \int p(s|x)p(x)dx$ so putting this all together

$$\langle x| s \rangle = \frac{\int x \cdot g(x;s)f(x)dx}{\int g(x;s)f(x)dx}$$

In your example, $f(x)$ is a uniform between $[0,1]$ and $g(x;s)=x^{s}(1-x)^{1-s}$, and thus

$$\langle x|s\rangle = \frac{\int _{0}^{1}x^{s+1}(1-x)^{1-s}dx}{\int _{0}^{1}x^{s}(1-x)^{1-s}dx}=\frac{\beta(s+2, 2-s)}{\beta(s+1, 2-s)}$$

(where $\beta$ denotes the beta function)

This simplifies to $\frac{s+1}{3}$.
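
A quick numerical check of this derivation (a sketch using scipy; it evaluates only the integrals and the beta-function ratio written above):

```python
from scipy.special import beta
from scipy.integrate import quad

# Check the posterior mean three ways: beta-function ratio, direct integration, (s+1)/3
for s in (0, 1):
    ratio = beta(s + 2, 2 - s) / beta(s + 1, 2 - s)
    num, _ = quad(lambda x: x ** (s + 1) * (1 - x) ** (1 - s), 0, 1)
    den, _ = quad(lambda x: x ** s * (1 - x) ** (1 - s), 0, 1)
    print(s, ratio, num / den, (s + 1) / 3)   # 1/3 for s=0, 2/3 for s=1 in all three columns
```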

Note that this gives you the same values you got, but I think you're somehow confusing x and s. s can take the values 0 or 1. If the coin flip comes up 0, chances are its probability of coming up heads (the result of your prior draw from the uniform) was lower than 1/2, and indeed the expectation is 1/3; conversely it's 2/3 if the coin flip comes up 1. Note that $\langle x|s \rangle$ is not a function of x, as it's an expectation value of the random variable x: all x-dependence has been integrated out. It explicitly depends on s, and in this example there happen to be two possible values of s whose conditional expectations sum to 1, but this is a coincidence. This is not a distribution over s; it isn't (generally speaking, albeit in this case it is) normalised wrt s.

So I think you're broadly on the right track, but just need to make sure you're not confusing x and s. One thing that I don't think helped is that you've used the notation $g(x;s)$ to denote $p(s|x)$, which really should be $g(s;x)$, or ideally $p(s|x)$. The notation $g(s;x)$ to me implies a pdf/pmf over s, with parameters x (and thus normalised wrt s). In your simple example this made sense, as $x$ was literally the parameter value, but more generally it might be the case that $s$ is distributed according to parameters $\theta$, and $\theta$ is actually a function of x, in which case the distribution of s still depends on x but isn't directly parametrised through it. Hence my suggestion that $p(s|x)$ is probably the more helpful notation in terms of keeping track of what's going on.

User1865345
  • 8,202
gazza89
  • 2,412
  • 1
  • 15
  • 18
  • agree on g(s|x), it was a typo in my question (note I typed it correctly in the displayed equation for q). Just to make sure, with the < > notation do you mean a distribution? – andrea m. Jun 05 '23 at 14:36
  • 1
    No, my understanding is that $E(x|s)$ is an expectation value over x, and thus not a distribution over x; all x-dependence should be integrated out. So for any given value of s, it's a scalar... while it has explicit s-dependence as I showed above, it's not a distribution in s (and actually I made a mistake here in my original post which I will edit now) – gazza89 Jun 05 '23 at 14:59
  • Ok I agree with everything you say, but doesn't your expression for $<x|s>$, with support $s\in\{0,1\}$, induce a probability distribution over "values in the support of $x$"? This distribution has support $\{1/3, 2/3\}$. I wanted an expression for the probability distribution over such values, the one I call $q(x)$ – andrea m. Jun 05 '23 at 15:03
  • 1
    @gazza89, you can use the display mode for your equations: $$ $$ for better view. – User1865345 Jun 05 '23 at 15:03
  • As per my edit, it doesn't induce a distribution, it's just a function over the permitted values of s. it's not a distribution in the sense that it's not normalised wrt s. Although unfortunately, in the example you chose, it is "normalised" in that if you sum $<x|s>$ over the support in s, you get one, but I don't think this would be true if you'd chosen a non-uniform prior, for example.

    And the support isn't $\{1/3, 2/3\}$; those are the two possible values the function can take, due to the support of s being $\{0,1\}$

    – gazza89 Jun 05 '23 at 15:07
  • This is interesting. In my mind I have a well-defined data generating process. The data it generates are the conditional expectations E(x|s). Can I not compute the frequencies of such values and call that a distribution? Even analytically, I don't see how it could not sum up to 1. – andrea m. Jun 05 '23 at 15:12
  • I was under the impression that your data generating process was the other way around, that first you sample x and then you sample s from x. But that you don't have access to x, it's unobservable, so you can only observe s and then you wanted to build up a posterior about your belief of the true value of x, given the value of s you have observed. And that posterior value then has an expected value, which is what I've calculated. But a very similar calculation would lead you to the posterior distribution $p(x|s)$ – gazza89 Jun 05 '23 at 15:22
  • Thanks for bearing with me! You are correct that I first sample x, then s from x. I don't have access to x, and I can compute a posterior E(x|s). My question is: given the processes that generate x and s, how frequently will I observe E(x|s)= z? I want the probability distribution over z.

    Empirically, I could draw many x from $f$, then many $s$ from $g$ (this is behind my idea of integrating over $s$), and then observe the frequencies of the resulting $E(x|s)$. What's an acceptable analytic expression for such an operation?

    – andrea m. Jun 05 '23 at 15:56
  • I'm thinking the answer boils down to computing $h(s) = \int g(s|x)f(x)dx$, and then $q(E(x|s))=h(s)$. The one doubt I have is how to handle cases where $E(x|s)$ takes the same value for two different values of $s$. How about the following: $$ q(z) = \int \int 1(E(x|s)=z)\,g(s|x)ds\, f(x)dx $$ which I believe simplifies to $$ q(z) = \int 1(E(x|s)=z)\,p(s)ds $$ with $p(s) = \int g(s|x)f(x) dx$ – andrea m. Jun 05 '23 at 16:41
  • Take a step back and think about what a posterior is, you're ultimately saying there is a parameter, whose true value is not observable, so you want to get some distribution which reflects your beliefs about its probable values. Here that hidden variable is x, so you can get a posterior on x (that distribution is a beta distribution in this case btw), so this curve denotes how probable you feel various values of this hidden variable are. Your posterior is a function of s (but not a distribution in s). E(x|s) is the mean value of that posterior, it's no longer a distribution, it's a number – gazza89 Jun 05 '23 at 17:55
  • 1
    I finally understand. I agree E(x|s) is a number, a function of s, but how frequently would it be observed given the distribution of s? It will be observed with probability p(s), the denominator of Bayes' rule. Thanks for putting me on the right track. I answered my own question below. – andrea m. Jun 05 '23 at 20:50
1

I am answering my own question, thanking @gazza89 for putting me on the right track. Given $p(s)=\int g(s|x)f(x)\,dx$, and $E(x|s)$ defined by Bayes' rule, the distribution of the conditional expectations $q(E(x|s))$ can be computed as follows.

If $E(x|s)$ is an invertible function of $s$, calling $E^{-1}(z)$ its inverse, then $q(z)$ is obtained by a change of variables: for the discrete $s$ case, $q(z) = p(E^{-1}(z))$; for the continuous case, $q(z) = p(E^{-1}(z))\left|\frac{dE^{-1}(z)}{dz}\right|$.

If $E(x|s)$ is not invertible, $$ q(z) = \int_{E(x|s)=z} p(s)\,ds $$

i.e. we need to sum up (or integrate) $p(s)$ over all values of $s$ such that $E(x|s)=z$.
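
For the example in the question, a minimal numerical sketch of this recipe (discrete $s$, so the discrete change-of-variables case applies and no Jacobian is needed):

```python
from scipy.integrate import quad

# p(s) = ∫ g(s|x) f(x) dx with f(x) = 1 on [0,1] and g(s|x) = x^s (1-x)^(1-s)
def p(s):
    val, _ = quad(lambda x: x ** s * (1 - x) ** (1 - s), 0, 1)
    return val

# E(x|s) = (s+1)/3 maps s=0 -> z=1/3 and s=1 -> z=2/3
q = {(s + 1) / 3: p(s) for s in (0, 1)}
print(q)   # {0.333...: 0.5, 0.666...: 0.5}, the distribution claimed in the question
```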

andrea m.
  • 275
  • 1
    You need to use the absolute value of the derivative. $$q(z) = p(E^{-1}(z))\left|\frac{dE^{-1}(z)}{dz}\right|$$ – Sextus Empiricus Jun 05 '23 at 21:10
  • "sum up $p(s)$ for all values of s such that $E(x|s)=z$" if you have multiple values of $s$ then $E(x|s)$ is not invertible and your expression should need to be a sum of the multiple values of a multivalued function. – Sextus Empiricus Jun 05 '23 at 21:12
  • @SextusEmpiricus thanks I corrected the absolute value. "sum up the multiple values"... that is what I think the integral over the indicator function is doing – andrea m. Jun 05 '23 at 21:53
  • the integral is not summing up much $$\int \mathbb{I}_{x=0} \text{d}x = 0$$ – Sextus Empiricus Jun 06 '23 at 05:02
  • it's summing up all $p(s)$ such that $E(x|s)=z$ – andrea m. Jun 06 '23 at 08:30