
Suppose $\tilde{u}$ is a normally distributed random variable with mean $q$ and variance $\sigma$, and let $s=\tilde{u}+\tilde{e}$, where $\tilde{e}$ is also normally distributed, with mean 0 and variance $\sigma_s$, and is independent of $\tilde{u}$. Why is it that $E[\tilde{u}\mid s\leq a]=q-\sigma\frac{f(a)}{F(a)}$, where $f(a)$ and $F(a)$ are the PDF and CDF of $s$ evaluated at $a$? This related question is the closest explanation I could find, but the difference is that in that post $a$ is standardized.

  • Welcome to CV, r-learning-machine! Could you double check a few things: 1) what is $\tilde u$? 2) is $e$ independent of $u$? 3) is $F$ the cdf of $s$, or $G$? 4) what is the variance of $e$? – Ute Aug 20 '23 at 19:54
  • Thank you for the comment. I have updated the post now so that these questions should be clear. – r-learning-machine Aug 20 '23 at 21:49
  • I have calculated a bit and think it should be $E[\tilde{u}|s\leq a]=q-\sigma^2\frac{f(a)}{F(a)}$ (replace $\sigma$ by $\sigma^2$). I might post an answer with more usual (general) symbols $X$ and $Y$ for the random variables, just because I make fewer typos that way than when using $\tilde u$ and $s$. – Ute Aug 21 '23 at 14:46
  • That would be very helpful. If you do that, I can edit the post to update the notation so that it matches. – r-learning-machine Aug 21 '23 at 14:49
  • Because $(\tilde u, s)$ has a Bivariate Normal distribution, you can find the answer worked out in detail at https://stats.stackexchange.com/questions/166273. It is not just in terms of a "standardized" $a:$ it is fully general. – whuber Aug 21 '23 at 17:44
  • @whuber, that referenced question deals with the univariate normal - is there a clever shortcut to general multivariate normals? – Ute Aug 21 '23 at 18:57
  • @Ute Good point: there's a missing step. After regressing the second variable on the first (which is answered in a few hundred posts here on CV ;-)), the two questions become the same. – whuber Aug 21 '23 at 19:32
  • Well, @whuber, I thought about taking a shortcut first, but realized that $\sigma^2$ in the original post is not the variance of $s$, but of $\tilde u$. So now there is answer $100x+1$ - without the right key words at hand, the $100x$ are not trivial to find ;-) – Ute Aug 21 '23 at 20:05
  • @Ute I don't follow that, especially not the references to "100." However, it seems to me all you need is to compute the slope of the regression of $\tilde u$ against $s,$ and that's simply a matter of computing the covariances. – whuber Aug 21 '23 at 21:55
  • @whuber, would be interesting to see that if you have time. Sorry for the confusing 100x, a pun referring to "few hundred posts". – Ute Aug 21 '23 at 22:32

2 Answers


Conditional mean of a normal variable, given the convolution

To answer the question in a bit more general terms, I use different notation here. $\newcommand{\E}{\mathbb{E}}\newcommand{\1}{\mathbb{1}}$ Let $X\sim N(\mu,\sigma^2_X)$ and $E\sim N(0,\sigma^2_E)$ be independent. We observe the convolution $Y=X+E$ and are interested in the conditional mean $\E (X\mid Y\leq a)$.

Let $f_X, f_Y, f_E$ denote the pdfs of $X, Y, E$, and $F_X, F_Y, F_E$ the corresponding cdfs. The joint distribution of $X$ and $Y$ has pdf $$f_{X,Y}(x,y) = f_X(x)f_E(y-x),$$ and we can express the required conditional mean as
$$ \begin{aligned} \E(X\mid Y\leq a)&=\frac{\int\int x f_{X,Y}(x,y)\1(y\leq a)\,dy\, dx}{\int\int f_{X,Y}(x,y)\1(y\leq a)\,dy\, dx} \\&= \frac{1}{F_Y(a)}\int\int x f_X(x)f_E(y-x)\1(y-x\leq a-x)\, dy\, dx \\&= \frac{1}{F_Y(a)}\int\int x f_X(x)\1(e\leq a-x) f_E(e)\, de\, dx \\&= \frac{1}{F_Y(a)}\int\underbrace{\left[\int x f_X(x)\1(x\leq a-e)\, dx\right]}_{\E(X\mid X\leq a-e)\cdot P(X\leq a-e)} f_E(e)\, de. \end{aligned} $$

Now we can apply the result in @whuber's answer from the thread linked in the question, replacing the standard normal pdf by $f_X(x)=\frac{1}{\sigma_X}\phi((x-\mu)/\sigma_X)$:
$$ \E(X\mid X\leq a-e) = \mu-\sigma_X\frac{\sigma_X f_X(a-e)}{F_X(a-e)} = \mu-\sigma_X^2\frac{f_X(a-e)}{P(X\leq a-e)}. $$
This step uses the fact that $X$ is normally distributed. We get
$$ \begin{aligned} \E(X\mid Y\leq a)&= \frac{1}{F_Y(a)}\int \E(X\mid X\leq a-e)\cdot P(X\leq a-e)\, f_E(e)\, de \\&= \frac{1}{F_Y(a)}\int \left[\mu P(X\leq a-e) - \sigma_X^2 f_X(a-e)\right] f_E(e)\, de \\&= \frac{1}{F_Y(a)}\left[\mu \underbrace{\int P(X\leq a-e)f_E(e)\, de}_{P(Y \leq a)} - \sigma_X^2 \underbrace{\int f_X(a-e) f_E(e)\, de}_{f_Y(a)}\right] \\&= \mu - \sigma_X^2 \frac{f_Y(a)}{F_Y(a)}. \end{aligned} $$

This looks familiar, doesn't it? It is tempting to interpret this result as $\E (Y\mid Y\leq a)$. However, note that $Y\sim N(\mu, \sigma_X^2+\sigma_E^2)$, so $\sigma_Y^2= \sigma_X^2+\sigma_E^2$, and
$$ \E (Y\mid Y\leq a) = \mu - \sigma_Y^2 \frac{f_Y(a)}{F_Y(a)} = \E(X\mid Y\leq a) -\sigma^2_E\frac{f_Y(a)}{F_Y(a)} . $$
The larger the noise variance $\sigma^2_E$, the larger the discrepancy between $\E (Y\mid Y\leq a)$ and $\E (X\mid Y\leq a)$ - which makes sense: for $\sigma^2_E = 0$, we have $Y=X$.
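A minimal Monte Carlo sketch of this result (not part of the original derivation; the parameter values are illustrative) compares the empirical conditional mean with the closed form $\mu - \sigma_X^2\, f_Y(a)/F_Y(a)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Illustrative parameters (assumed for the demo, not from the answer)
mu, sigma_X, sigma_E, a = 1.0, 2.0, 1.5, 0.5
n = 2_000_000

X = rng.normal(mu, sigma_X, n)   # X ~ N(mu, sigma_X^2)
E = rng.normal(0.0, sigma_E, n)  # E ~ N(0, sigma_E^2), independent of X
Y = X + E                        # the observed convolution

# Empirical E(X | Y <= a)
empirical = X[Y <= a].mean()

# Closed form mu - sigma_X^2 * f_Y(a) / F_Y(a), with Y ~ N(mu, sigma_X^2 + sigma_E^2)
sigma_Y = np.sqrt(sigma_X**2 + sigma_E**2)
closed_form = mu - sigma_X**2 * norm.pdf(a, mu, sigma_Y) / norm.cdf(a, mu, sigma_Y)

print(f"empirical:   {empirical:.4f}")
print(f"closed form: {closed_form:.4f}")
```

The two printed values agree up to Monte Carlo error.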

Ute

Given the univariate formula provided at Expected value of x in a normal distribution, GIVEN that it is below a certain value, you can solve this mentally by choosing appropriate units of measurement and understanding the basics of regression.

Write $\sigma^2=\operatorname{Var}(\tilde u).$ Temporarily employ a unit of measurement for $s$ in which $E[s]=0$ and $\operatorname{Var}(s)=1.$ That is, $s$ has a standard Normal distribution with density $\phi$ and distribution $\Phi.$

We know from the basic theory of least squares regression that the conditional expectation of $\tilde u$ is a linear function of $s,$ $$E[\tilde u \mid s] = \beta s,$$

where $\beta = \operatorname{Cov}(\tilde u, s)/\operatorname{Var}(s)$ is the regression slope. Because $\operatorname{Cov}(\tilde u, s) = \operatorname{Cov}(\tilde u, \tilde u + \tilde e) = \operatorname{Var}(\tilde u),$ this gives $\beta\operatorname{Var}(s) = \operatorname{Var}(\tilde u)$ (in the temporary units).

Consequently

$$E[\tilde u \mid s \le a] = E[E[\tilde u\mid s] \mid s \le a] = \beta E[s \mid s \le a].$$

The problem has been reduced from a bivariate one to a univariate one involving only $s.$ The formula in the referenced thread states

$$E[s \mid s \le a] = - \frac{\phi(a)}{\Phi(a)}.$$

Converting back to the original units changes the right hand side to $-f(a)/F(a)$ where $f$ and $F$ are the density and distribution of $s,$ but the units calculus requires it to be further multiplied by $\operatorname{Var}(s)$ (because the left hand side is in units of $s$ while the right hand side is in reciprocal units: $f$ is a density and $F$ is unitless). Thus the net multiplier is $\beta\operatorname{Var}(s) = \operatorname{Var}(\tilde u) = \sigma^2$ and, finally, we can return to the original units of measurement by adding the origin $q$ back in to produce

$$E[\tilde u \mid s \le a] = q - \sigma^2 \frac{f(a)}{F(a)}.$$
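For readers who want to see the units calculus numerically, here is a small sketch (the parameter values and variable names are illustrative assumptions, not from the answer) that computes the conditional mean two ways: via the regression slope in standardized units, and via the closed form above.

```python
import numpy as np
from scipy.stats import norm

# Illustrative values (assumptions for the demo): q = E[u], var_u = Var(u)
# (the question's sigma, this answer's sigma^2), var_e = Var(e), cutoff a.
q, var_u, var_e, a = 1.0, 4.0, 2.25, 0.5

var_s = var_u + var_e              # Var(s) = Var(u) + Var(e)
sd_s = np.sqrt(var_s)
alpha = (a - q) / sd_s             # a expressed in standardized units of s

# Route 1: regression slope + univariate truncated-normal formula.
beta = var_u / var_s               # beta = Cov(u, s)/Var(s) = Var(u)/Var(s)
E_s_trunc = q - sd_s * norm.pdf(alpha) / norm.cdf(alpha)  # E[s | s <= a]
route1 = q + beta * (E_s_trunc - q)

# Route 2: closed form q - Var(u) * f(a)/F(a), with f, F the pdf/cdf of s.
# The Jacobian relation f(a) = phi(alpha)/sd(s) is what makes the two agree.
route2 = q - var_u * norm.pdf(a, q, sd_s) / norm.cdf(a, q, sd_s)

print(route1, route2)              # identical up to floating-point error
```

Both routes print the same number: the Jacobian factor $1/\operatorname{sd}(s)$ hidden in $f(a)$ combines with $\beta$ to produce the net multiplier $\sigma^2 = \operatorname{Var}(\tilde u)$.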

whuber
  • Thank you for your explanations @Ute and whuber. I am just trying to understand it fully. In my view, the conditional expectation should be $E[\tilde{u}|s]=q+\beta s$, which will then yield $E[\tilde{u} |s \leq a]= q+\beta E[s|s \leq a]$. Keeping the referenced post in mind, and assuming $-\frac{f(a)}{F(a)}=-\sigma \frac{\phi(a)}{\Phi(a)}$, it seems that to get the final answer we would need $\beta=\sigma$, but it is not. – r-learning-machine Aug 22 '23 at 17:53
  • It comes down to correctly relating $f(a)$ to $\phi(a).$ A common mistake is to forget the Jacobian. The use of the units calculus helps us remember to include it. For further explanation of using the units calculus in this situation, please see https://stats.stackexchange.com/a/624633/919. – whuber Aug 22 '23 at 18:30
  • This is eye opening. I had never heard of 'units calculus' until today, but after reading your post I think the idea is that once you revert the temporarily employed unit variance back to its original variance (i.e., var(s) no longer equal to one), it cancels out the Jacobian portion (i.e., the var of s) of the beta coefficient, which results in $\sigma^2$. However, I am not sure that $\beta Var(\tilde{u})$ is also equal to $\sigma^2$. – r-learning-machine Aug 22 '23 at 20:56
  • The latter is a basic covariance calculation. It uses the usual formula for $\beta$ in least squares regression (as a covariance divided by a variance) and computes both the variance and covariance from your definitions of $\tilde u$ and $s.$ Specifically, $$\operatorname{Var}(s) = \operatorname{Var}(\tilde u + \tilde e) = \operatorname{Var}(\tilde u) + \operatorname{Var}(\tilde e)$$ and $$\operatorname{Cov}(\tilde u, s) = \operatorname{Cov}(\tilde u, \tilde u + \tilde e)=\operatorname{Var}(\tilde u) + \operatorname{Cov}(\tilde u, \tilde e) = \operatorname{Var}(\tilde u).$$ – whuber Aug 22 '23 at 22:02
  • That makes sense for $\beta Var(s)=\sigma^2$, but I was not sure how $\beta Var(\tilde{u})$ is also equal to $\sigma^2$ as written in your answer. – r-learning-machine Aug 22 '23 at 22:26
  • That's the definition. My "$\sigma^2$" is your $\sigma.$ (It is so unusual to use "$\sigma$" alone for a variance that I felt obliged to make that change, announced at the beginning of the post.) – whuber Aug 22 '23 at 22:51