8

Suppose $X \mid Y$ and $Y$ are normally distributed. Does it follow that $Y \mid X$ is normally distributed?

And if so, how would one prove this?

4 Answers

17

Here, as a counterexample, is a sample of points where $Y$ has a standard Normal distribution and the conditional distribution of $X$ is always Normal, too.

[Scatterplot of the simulated $(x, y)$ sample, generated by the R code below.]

This is obviously non-Normal.

To guarantee Normality, you need (almost surely) that

(1) $E[X\mid Y]$ must be a linear function of $Y$ and

(2) $\operatorname{Var}(X\mid Y)$ must be constant.

These are both characteristics of any Bivariate Normal distribution, so they are necessary conditions. When you write down the joint distribution implied by both these conditions, it will be Gaussian: that is, Bivariate Normal.
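
To sketch why these conditions suffice (my notation for this sketch only: with $Y$ standard Normal as in the example, write $E[X\mid Y=y] = a + by$ and $\operatorname{Var}(X\mid Y) = \sigma^2$), the joint density is

$$f_{X,Y}(x,y) = f_{X\mid Y}(x\mid y)\,f_Y(y) \propto \exp\left(-\frac{(x - a - by)^2}{2\sigma^2} - \frac{y^2}{2}\right),$$

whose exponent is a negative-definite quadratic form in $(x, y)$: exactly the form of a Bivariate Normal density.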

These R commands generated the example. The conditional variance $\operatorname{Var}(X \mid Y) = Y^4$ is not constant.

n <- 1e3
y <- rnorm(n)                      # Y ~ N(0, 1)
x <- rnorm(n, mean = y, sd = y^2)  # X | Y = y ~ N(y, y^4)
plot(x, y, col = "#00000040")      # semi-transparent points
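
As a quick follow-up check (my addition, reusing the simulated sample above), one can restrict to a thin vertical slice of $x$ and inspect a Normal Q-Q plot of the corresponding $y$ values; marked curvature suggests that $Y$ given $X$ is not Normal there.

slice <- abs(x - 1) < 0.25           # keep points with x near 1
qqnorm(y[slice]); qqline(y[slice])   # curvature indicates non-Normality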

whuber
  • Thank you for this wonderfully thorough response. But can you elaborate on (or provide a reference to) the conditions (1) and (2) you mention, which if I understand you correctly, together imply that $(X,Y)$ is bivariate normal?

    Apologies if this is a trivial question; I'm new to this level of math and not very good.

    – user1141170 Jan 18 '23 at 23:16
  • 1
    These necessary conditions are described in most textbooks that cover ordinary least squares regression. It's explicit (and given very clearly, without formulas!) in any edition of Freedman, Pisani, & Purves, Statistics. That they suffice is demonstrated as I describe: simply write down the density. – whuber Jan 19 '23 at 00:48
7

whuber shows, by means of a counterexample, that a Gaussian conditional $X|Y$ combined with a Gaussian marginal $Y$ does not necessarily lead to a joint Gaussian distribution, and when the joint is not Gaussian the other conditional need not be Gaussian either.

Here is a (numerical) counterexample to your claim, with the roles of $X$ and $Y$ swapped relative to the question. Let $Y|X=x \sim N(x^3, 1)$ and $X \sim N(0,1)$. Then $(X,Y)$ is not jointly normal. The marginal of $Y$ is

$$ f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(y - x^3)^2 - \frac{1}{2}x^2}dx. $$

Unfortunately this integral (as far as I know) cannot be computed analytically, but we can obtain a fairly good approximation via adaptive quadrature. The conditional we require is

$$ f_{X|Y}(x \mid y) = \frac{e^{-\frac{1}{2}(y - x^3)^2 - \frac{1}{2}x^2}}{\int_{-\infty}^{\infty} e^{-\frac{1}{2}(y - x^3)^2 - \frac{1}{2}x^2}\,dx}. $$

The claim is disproved provided we can show that this conditional is not Gaussian. Toward this aim, let's fix $y = -1$. An elegant proof could be given by studying the properties of this function analytically, but here I take a brute-force approach and approximate this beast numerically. If the conditional were Gaussian, it would be unimodal and symmetric; a violation of either property is the contradiction I'll be looking for.

# proportional to the joint density of (X, Y)
f_yx <- function(y, x) {
  exp(-0.5 * (y - x^3)^2 - 0.5 * x^2)
}

# the marginal of Y (up to the same constant)
f_y <- function(y) {
  integrate(function(t) f_yx(y, t), lower = -Inf, upper = Inf)$value
}

# the conditional of X given Y = y
x_given_y <- function(x, y) f_yx(y, x) / f_y(y)

# fix y, evaluate the conditional on a grid, and plot it
y <- -1
x <- seq(-3, 3, len = 100)
cond_y <- sapply(x, x_given_y, y = y)
plot(x, cond_y, type = "l", lwd = 2)

[Line plot of the numerically computed conditional density of $X$ given $Y = -1$, which is visibly non-Gaussian.]

If we can trust numerical integration (I'm using the `integrate` function here, which is extremely robust!), we can see from the plot that this conditional density is definitely non-Gaussian. This contradicts the claim.
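
A complementary numerical check (my addition, reusing the functions above) is to compute the standardized skewness of the conditional by numerical integration; a Gaussian would give exactly zero.

# moments of X | Y = -1 via numerical integration
m1 <- integrate(function(t) t * x_given_y(t, y), -Inf, Inf)$value
v  <- integrate(function(t) (t - m1)^2 * x_given_y(t, y), -Inf, Inf)$value
m3 <- integrate(function(t) (t - m1)^3 * x_given_y(t, y), -Inf, Inf)$value
m3 / v^1.5   # standardized skewness; exactly 0 for any Gaussian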

Side comment. There exist non-Gaussian bivariate distributions which have Gaussian conditional densities. One example of this is

$$ f(x,y) = C\exp\left(-(1+x^2)(1+y^2)\right),\quad -\infty <x,y<\infty,$$

where $C$ is the normalising constant. You can check that both $Y|X=x$ and $X|Y=y$ are Gaussian with suitable parameters.
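
Checking this is quick: fixing $X = x$ and keeping only the factors that depend on $y$,

$$f(y \mid x) \propto e^{-(1+x^2)(1+y^2)} \propto e^{-(1+x^2)y^2},$$

which is the kernel of a $N\left(0, \frac{1}{2(1+x^2)}\right)$ density in $y$, even though the joint density is clearly not Bivariate Normal.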

utobi
  • What do you mean by "this joint Gaussian certainly doesn't have a conditional Gaussian"? It is well known that if the joint is Gaussian, then all conditionals are Gaussian. By the way, the example you gave does not seem to negate the OP's conjecture -- he is questioning the conditional normality, not the joint normality. – Zhanxiong Jan 18 '23 at 23:09
  • 1
    Thanks, your updated example seems to work. However, as I commented before, how can $X|Y$ be non-Gaussian if you stated $(X, Y)$ is jointly Gaussian? (this may be just another typo...) – Zhanxiong Jan 19 '23 at 00:53
  • Yep, corrected. +1. – utobi Jan 19 '23 at 06:16
  • 1
    Maple gives me `int( exp( -(1/2)*(y-x^3)^2 - (1/2)*x^2 ), y=-infinity..infinity ) assuming x::real;` which returns ${\mathrm e}^{-\frac{x^{2}}{2}} \sqrt{2}\, \sqrt{\pi}$.

    – kjetil b halvorsen Jan 22 '23 at 00:36
  • @kjetilbhalvorsen but we need the marginal of $y$ here, so the integral should be over ‘x=-inf..inf’, right? Maybe Maple can do this as well? I checked with Wolfram online without success: ‘computing time exceeded std. time’ – utobi Jan 22 '23 at 05:46
  • @utobi: Maple cannot do that, ... – kjetil b halvorsen Jan 22 '23 at 14:56
  • Thanks for the feedback @kjetilbhalvorsen. – utobi Jan 22 '23 at 16:18
7

Here I will augment the excellent answer by whuber by showing the mathematical form of your general model and the sufficient conditions that imply a normal distribution for $Y|X$. Consider the general hierarchical model form:

$$\begin{align} X|Y=y &\sim \text{N}(\mu(y),\sigma^2(y)), \\[6pt] Y &\sim \text{N}(\mu_*,\sigma^2_*). \\[6pt] \end{align}$$

This model gives the joint density kernel:

$$\begin{align} f_{X,Y}(x,y) &= f_{X|Y}(x|y) f_{Y}(y) \\[12pt] &\propto \frac{1}{\sigma(y)} \cdot \exp \Bigg( -\frac{1}{2} \Big( \frac{x-\mu(y)}{\sigma(y)} \Big)^2 \Bigg) \exp \Bigg( -\frac{1}{2} \Big( \frac{y-\mu_*}{\sigma_*} \Big)^2 \Bigg) \\[6pt] &= \frac{1}{\sigma(y)} \cdot \exp \Bigg( -\frac{1}{2} \Bigg[ \Big( \frac{x-\mu(y)}{\sigma(y)} \Big)^2 + \Big( \frac{y-\mu_*}{\sigma_*} \Big)^2 \Bigg] \Bigg) \\[6pt] &= \frac{1}{\sigma(y)} \cdot \exp \Bigg( -\frac{1}{2} \Bigg[ \frac{(x-\mu(y))^2 \sigma_*^2 + (y-\mu_*)^2 \sigma(y)^2}{\sigma(y)^2 \sigma_*^2} \Bigg] \Bigg), \\[6pt] \end{align}$$

which gives the conditional density kernel:

$$\begin{align} f_{Y|X}(y|x) &\overset{y}{\propto} \frac{1}{\sigma(y)} \cdot \exp \Bigg( -\frac{1}{2} \Bigg[ \frac{(x-\mu(y))^2 \sigma_*^2 + (y-\mu_*)^2 \sigma(y)^2}{\sigma(y)^2 \sigma_*^2} \Bigg] \Bigg). \\[6pt] \end{align}$$

In general, this is not the form of a normal density. However, suppose we impose the following conditions on the conditional mean and variance of $X|Y$:

$$\mu(y) = a + by \quad \quad \quad \quad \quad \sigma^2(y) = \sigma^2.$$

These conditions mean that we require $\mu(y) \equiv \mathbb{E}(X|Y=y)$ to be an affine function of $y$ and we require $\sigma^2(y) \equiv \mathbb{V}(X|Y=y)$ to be a fixed value. Incorporating these conditions gives:

$$\begin{align} f_{Y|X}(y|x) &\overset{y}{\propto} \frac{1}{\sigma} \cdot \exp \Bigg( -\frac{1}{2} \Bigg[ \frac{(x - a - by)^2 \sigma_*^2 + (y-\mu_*)^2 \sigma^2}{\sigma^2 \sigma_*^2} \Bigg] \Bigg) \\[6pt] &\overset{y}{\propto} \exp \Bigg( -\frac{1}{2} \Bigg[ \frac{(b^2 y^2 - 2b(x - a) y) \sigma_*^2 + (y^2 - 2y\mu_*) \sigma^2}{\sigma^2 \sigma_*^2} \Bigg] \Bigg) \\[6pt] &= \exp \Bigg( -\frac{1}{2} \Bigg[ \frac{(\sigma^2 + b^2 \sigma_*^2 ) y^2 - 2(b(x - a) \sigma_*^2 + \mu_* \sigma^2) y}{\sigma^2 \sigma_*^2} \Bigg] \Bigg) \\[6pt] &= \exp \Bigg( -\frac{1}{2} \Bigg[ \frac{y^2 - 2[(b(x - a) \sigma_*^2 + \mu_* \sigma^2)/(\sigma^2 + b^2 \sigma_*^2) ] y}{\sigma^2 \sigma_*^2/(\sigma^2 + b^2 \sigma_*^2 ) } \Bigg] \Bigg) \\[6pt] &\overset{y}{\propto} \exp \Bigg( -\frac{1}{2} \cdot \frac{1}{\sigma^2 \sigma_*^2/(\sigma^2 + b^2 \sigma_*^2 )} \cdot \Big( y - \frac{b(x - a) \sigma_*^2 + \mu_* \sigma^2}{\sigma^2 + b^2 \sigma_*^2} \Big)^2 \Bigg) \\[6pt] &\overset{y}{\propto} \text{N} \Bigg( y \Bigg| \frac{b(x - a) \sigma_*^2 + \mu_* \sigma^2}{\sigma^2 + b^2 \sigma_*^2}, \frac{\sigma^2 \sigma_*^2}{\sigma^2 + b^2 \sigma_*^2} \Bigg). \\[6pt] \end{align}$$

Here we see that we have a normal distribution for $Y|X$, which confirms that the above conditions on the conditional mean and variance of $X|Y$ are sufficient to give this property.
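
As a numerical sanity check of the final formulas (a sketch of my own, not part of the derivation above; all parameter values below are arbitrary), we can compare the derived conditional moments against a simulation that conditions on $X$ falling in a narrow window:

# simulate the hierarchical model and condition on X near x0
set.seed(1)
a <- 1; b <- 2; sig <- 0.5; mu.s <- -1; sig.s <- 1.5
y <- rnorm(1e6, mu.s, sig.s)            # Y ~ N(mu*, sigma*^2)
x <- rnorm(1e6, a + b * y, sig)         # X | Y ~ N(a + bY, sigma^2)
x0 <- 0.7
keep <- abs(x - x0) < 0.01              # narrow window around x0
c(mean(y[keep]), var(y[keep]))          # empirical conditional moments
# theoretical moments from the completed square above
(b * (x0 - a) * sig.s^2 + mu.s * sig^2) / (sig^2 + b^2 * sig.s^2)
sig^2 * sig.s^2 / (sig^2 + b^2 * sig.s^2)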

Ben
  • +1. But I can't help thinking there's a surfeit of algebra here. Wouldn't it suffice to say that once $\sigma(y)$ is constant and because the argument of $\exp$ is (obviously) a quadratic function of $y,$ that your demonstration is complete? In fact, if we were to impose "the following conditions" at the outset, the first two lines of this answer would suffice, exposing the conclusion as the result of the laws of exponentiation and polynomial addition. – whuber Jan 22 '23 at 17:54
  • 1
    @whuber: You're right that that would suffice to show normality, and perhaps that would be a better approach. In this answer I've decided to go that step further to complete the square to get the relevant conditional moments of the distribution, so that I've got a full specification of the distribution of interest (not just knowing it's normal). I agree that that goes beyond the question, but I just thought it might be useful. – Ben Jan 22 '23 at 21:37
  • That result is appreciated. I can't help thinking that stating this slightly more abstractly and generally would reveal the basic idea better and eliminate about two-thirds of the work. Specifically, for a sequence of triples $(\sigma_i,\mu_i,\alpha_i)$ (with $\sigma_i\gt 0$) it is elementary (and immediate by inspection) that $$\sum_i \frac{1}{2}\left(\frac{x-\mu_i}{\sigma_i}\right)^2+\alpha_i=\frac{1}{2}\left(\frac{x-M}{\Sigma}\right)^2+A$$ where $$\frac{1}{\Sigma^2}=\sum_i\frac{1}{\sigma_i^2}$$ and $$M=\Sigma^2\sum_i\frac{\mu_i}{\sigma_i^2}.$$ – whuber Jan 24 '23 at 15:04
  • BTW, didn't you already post exactly this result at https://stats.stackexchange.com/a/372075/919, using very nearly the same notation? You could just quote it. – whuber Jan 24 '23 at 17:34
  • @whuber: I forgot about that one! Déjà vu. – Ben Jan 25 '23 at 06:23
  • I know the problem ;-). – whuber Jan 25 '23 at 12:04
2

Here is another counterexample which gives closed-form distributions of $X, Y, X|Y = y$ and $Y|X = x$.

Let $Y, Z \text{ i.i.d. } \sim N(0, 1)$, and define $X = \frac{Z}{Y}$. Then for $y \neq 0$ (the probability of $Y = 0$ is zero), \begin{align} X | Y = y \sim N(0, y^{-2}). \end{align}

On the other hand, it is well known that the marginal distribution of $X$ is the standard Cauchy distribution, i.e., \begin{align} f_X(x) = \frac{1}{\pi(1 + x^2)}, \quad x \in \mathbb{R}. \tag{1} \end{align} And the joint distribution of $X$ and $Y$ can be evaluated as (where $\Phi$ and $\varphi$ denote the CDF and PDF of the standard normal distribution respectively): \begin{align} F(x, y) &= P[X \leq x, Y \leq y] \\ &= P[Z \leq Yx, Y \leq y, Y > 0] + P[Z \geq Yx, Y \leq y, Y < 0] \\ &= \begin{cases} \int_{-\infty}^y(1 - \Phi(tx))\varphi(t)\,dt & y < 0, \\[1em] \int_{-\infty}^0(1 - \Phi(tx))\varphi(t)\,dt + \int_0^y\Phi(tx)\varphi(t)\,dt & y > 0. \end{cases} \end{align} (The first term in the $y > 0$ case does not depend on $y$, so it vanishes upon differentiation.) Therefore, the joint density of $(X, Y)$ is given by \begin{align} f(x, y) = \frac{\partial^2F(x, y)}{\partial x\partial y} = \begin{cases} -y\varphi(y)\varphi(yx) & y < 0, \\[1em] y\varphi(y)\varphi(yx) & y > 0 \end{cases} = \frac{1}{2\pi}|y|e^{-(1 + x^2)y^2/2}. \tag{2} \end{align}
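
As a consistency check (an added step, not in the original answer), integrating $(2)$ over $y$ recovers the Cauchy marginal $(1)$:
\begin{align} \int_{-\infty}^{\infty}\frac{1}{2\pi}|y|e^{-(1 + x^2)y^2/2}\,dy = \frac{1}{\pi}\int_0^{\infty}y\,e^{-(1 + x^2)y^2/2}\,dy = \frac{1}{\pi(1 + x^2)}. \end{align}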

$(1)$ and $(2)$ together yield the conditional density of $Y$ given $X = x$:
\begin{align} f_{Y|X}(y|X = x) = \frac{f(x, y)}{f_X(x)} = \frac{1}{2}|y|(1 + x^2)e^{-(1 + x^2)y^2/2}. \tag{3} \end{align}

Obviously, $(3)$ is not the density of any normal distribution (with $y$ as the variate). Thus $Y | X = x$ is not normal. For example, when $x = 0$, $(3)$ looks as follows:

[Plot of density $(3)$ at $x = 0$: it vanishes at $y = 0$ and has two symmetric modes at $y = \pm 1$.]
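
(A minimal R sketch to reproduce this plot; the code is my addition, not part of the original answer.)

# density (3) at x = 0 reduces to 0.5 * |y| * exp(-y^2 / 2)
yy <- seq(-4, 4, len = 400)
plot(yy, 0.5 * abs(yy) * exp(-yy^2 / 2), type = "l", lwd = 2,
     xlab = "y", ylab = "conditional density")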

P.S. PDF $(3)$ may be termed a "double generalized gamma distribution", based on these two articles: Generalized gamma distribution and Double Gamma Distribution. The parameters linked to the generalized gamma distribution are $a = \sqrt{2(1 + x^2)^{-1}}$ (scale) and $d = 2$, $p = 2$.

Zhanxiong