
Suppose $x,y$ are IID samples from a Gaussian distribution in $\mathbb{R}^d$. The following seems true:

$$2\ \mathbb{E}\left[\langle x, y\rangle^2\right] = \mathbb{E}\left[\|x\|^4\right]-\mathbb{E}\left[\|x\|^2\right]^2+2\ \|\mathbb{E}x\|^4$$

Is this a well known identity? How do I prove this?
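For what it's worth, here is a quick Monte Carlo sanity check of the conjecture (a minimal sketch assuming NumPy; the dimension, mean, and covariance below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 1_000_000

# Arbitrary illustrative mean and positive semi-definite covariance.
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T

x = rng.multivariate_normal(mu, Sigma, size=n)
y = rng.multivariate_normal(mu, Sigma, size=n)

lhs = 2 * np.mean(np.einsum("ij,ij->i", x, y) ** 2)      # 2 E[<x,y>^2]
norm2 = np.einsum("ij,ij->i", x, x)                      # ||x||^2 per sample
rhs = np.mean(norm2**2) - np.mean(norm2) ** 2 + 2 * np.linalg.norm(mu) ** 4

print(lhs, rhs)   # the two numbers should agree up to Monte Carlo error
```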


Yaroslav Bulatov
  • By virtue of the result at https://stats.stackexchange.com/a/85977/919, when $(x,y)$ is standard Normal, $\langle x,y\rangle$ has the same distribution as the product of three independent variables $U,$ $V,$ and $B,$ where $U$ and $V$ have chi$(d)$ distributions and $(B+1)/2$ has a Beta$((d-1)/2,(d-1)/2)$ distribution. From that it's straightforward to compute the expectation of the square -- or any other moments. You can at least use this to check your conjecture. – whuber Dec 07 '22 at 04:00
  • BTW, the polarization identity $2\langle x,y\rangle=(||x+y||^2-||x-y||^2)/2$ simplifies the question substantially. – whuber Dec 07 '22 at 04:04
  • @whuber $x,y$ are not standard normal however, so not sure how to apply that result here. I checked the conjecture using integration in Mathematica, but that's not very enlightening – Yaroslav Bulatov Dec 07 '22 at 18:43
  • You can always write $x$ as a linear combination of a standard multivariate Normal, which tells you what its moments are. That's all you need, because your identity equates two variances and the arguments of those variances transform in the same way under linear combinations. That reduces the question to the standard Normal case. This appears to recast the polarization identity as a relationship between variances of squared quantities, which is why polarization comes to mind. – whuber Dec 07 '22 at 19:34

2 Answers


Write $x = (x_i)$ and $y = (y_i)$ so that $\langle x,y\rangle = \sum x_iy_i.$ Further, for the moments write $E[x_ix_j]=\sigma_{ij}.$ The Normality assumption implies

$$E[x_i^4] = 3\sigma_{ii}^2\ \text{ and }\ E[x_i^2x_j^2] = \sigma_{ii}\sigma_{jj} + 2\sigma_{ij}^2.$$

(The right hand formula reduces to the left hand one when $i=j$ anyway.)

Assume temporarily that the mean is zero. Because $x,y$ are independent and expectation is linear, we have

$$\begin{aligned} E\left[\left(\sum x_iy_i\right)^2\right] &= E\left[\sum x_iy_i\sum x_jy_j\right] = \sum_{i,j} E\left[x_iy_ix_jy_j\right]\\&= \sum_{i,j} E\left[x_ix_j\right]E\left[y_iy_j\right] \\ &= \sum_{i,j}\sigma_{ij}^2. \end{aligned}\tag{*}$$

On the other hand,

$$\begin{aligned} E\left[||x||^4\right] &= E\left[\left(\sum x_i^2\right)^2\right] = E\left[\sum x_i^2\sum x_j^2\right] = \sum_{i,j} E\left[x_i^2x_j^2\right] = \sum_{i,j} \left(\sigma_{ii}\sigma_{jj} + 2\sigma_{ij}^2\right) \end{aligned}$$

and

$$\begin{aligned} E\left[||x||^2\right]^2 &= E\left[\sum x_i^2\right]^2 = \sum_{i,j} \sigma_{ii}\sigma_{jj}. \end{aligned}$$

Subtracting this from the previous result leaves twice $(*),$ demonstrating the equality for zero-mean variables.
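As a numerical cross-check of the three moment formulas above in the zero-mean case (a minimal sketch assuming NumPy; the covariance below is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 1_000_000

A = rng.normal(size=(d, d))
Sigma = A @ A.T                      # arbitrary covariance, zero mean

x = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
y = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# (*): E[<x,y>^2] = sum_{i,j} sigma_ij^2
print(np.mean(np.einsum("ij,ij->i", x, y) ** 2), np.sum(Sigma**2))

# E[||x||^4] = sum_{i,j} (sigma_ii sigma_jj + 2 sigma_ij^2)
norm2 = np.einsum("ij,ij->i", x, x)
print(np.mean(norm2**2), np.trace(Sigma) ** 2 + 2 * np.sum(Sigma**2))

# E[||x||^2]^2 = sum_{i,j} sigma_ii sigma_jj = (tr Sigma)^2
print(np.mean(norm2) ** 2, np.trace(Sigma) ** 2)
```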

The reason we needn't separately treat the case of a nonzero mean $\mu=(\mu_i)$ is that

$$E\left[\sum x_iy_i\right] = \sum E[x_i] E[y_i] = \sum \mu_i^2 = ||\mu||^2$$

shows

$$E\left[\right(\sum x_iy_i\left)^2\right] - ||E[x]||^4 =E\left[\right(\sum x_iy_i\left)^2\right] - E\left[\sum x_iy_i\right]^2 = \operatorname{Var}\left(\sum x_iy_i\right)$$

and, clearly, $E\left[||x||^4\right] - E\left[||x||^2\right]^2 = \operatorname{Var}(||x||^2).$ Thus, your identity equates two variances and, since variances are invariant under change of location, the mean does not affect them, QED.

whuber
  • The last statement "variances are invariant under change of location" seems to be a little loose? Clearly $\operatorname{Var}(|X + \mu|^2) \neq \operatorname{Var}(|X|^2)$ when $\mu \neq 0$. – Zhanxiong Dec 08 '22 at 19:03
  • @Zhanxiong You misapply the result, because the map $||X||^2\to||X+\mu||^2$ is not a change of location. There's nothing at all loose about my claim: it's a fundamental property of variances that for any random variable $Y$ and finite constant $\mu,$ $\operatorname{Var}(Y+\mu)=\operatorname{Var}(Y).$ – whuber Dec 09 '22 at 14:53
  • But here we are dealing with the variance of squared norm of $Y$, not the variance of $Y$ itself, right? – Zhanxiong Dec 09 '22 at 15:03
  • @Zhanxiong You're right -- there's a gap in the argument. Thank you for pointing that out. – whuber Dec 09 '22 at 15:13

To evaluate $E[(X'Y)^2]$ (for clarity, let me use $X, Y$ to denote the two i.i.d. $N_d(\mu, \Sigma)$ random vectors), use the following result (Billingsley, Probability and Measure, Exercise 21.13):

Suppose that $X$ and $Y$ are independent and that $f(x, y)$ is non-negative. Put $g(x) = E[f(x, Y)]$, then $E[g(X)] = E[f(X, Y)]$.

In this case, \begin{align} g(x) = E[(x'Y)^2] = \operatorname{Var}(x'Y) + [E(x'Y)]^2 = x'\Sigma x + (x'\mu)^2. \end{align} It then follows by the above result that (where we used the expectation of a Gaussian quadratic form, $E[X'\Sigma X] = \operatorname{tr}(\Sigma^2) + \mu'\Sigma\mu$) \begin{align} & E[(X'Y)^2] = E[X'\Sigma X] + E[(X'\mu)^2] = E[X'\Sigma X] + \operatorname{Var}(X'\mu) + [E(X'\mu)]^2 \\ =& \mu'\Sigma\mu + \operatorname{tr}(\Sigma^2) + \mu'\Sigma\mu + \|\mu\|^4 \\ =& 2\mu'\Sigma\mu + \operatorname{tr}(\Sigma^2) + \|\mu\|^4. \end{align}
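As a sanity check of this closed form (a minimal sketch assuming NumPy; the mean and covariance below are arbitrary illustrative choices), one can compare it against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 1_000_000

mu = rng.normal(size=d)              # arbitrary mean
A = rng.normal(size=(d, d))
Sigma = A @ A.T                      # arbitrary covariance

X = rng.multivariate_normal(mu, Sigma, size=n)
Y = rng.multivariate_normal(mu, Sigma, size=n)

mc = np.mean(np.einsum("ij,ij->i", X, Y) ** 2)      # E[(X'Y)^2] by simulation
closed = 2 * mu @ Sigma @ mu + np.trace(Sigma @ Sigma) + np.linalg.norm(mu) ** 4
print(mc, closed)
```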

Therefore, to show your equality, it suffices to show that \begin{align} \operatorname{Var}(X'X) = 4\mu'\Sigma\mu + 2\operatorname{tr}(\Sigma^2), \tag{1} \end{align} which is immediate in view of the variance of Gaussian quadratic forms. For a proof of $(1)$, see this answer.


Since the link above didn't treat the non-central case in detail, here is a complete proof of $(1)$.

We first show that when $Z \sim N_d(0, \Sigma)$, it holds that \begin{align} \operatorname{Var}(Z'Z) = 2\operatorname{tr}(\Sigma^2). \tag{2} \end{align}

To this end, by the Gaussian assumption (here and below, $\sigma_{ii}^2 = \operatorname{Var}(Z_i)$ and $\sigma_{ij} = \operatorname{Cov}(Z_i, Z_j)$ for $i \neq j$), \begin{align} \operatorname{Cov}(Z_i^2, Z_j^2) = \begin{cases} 2\sigma_{ii}^4 & i = j, \\ 2\sigma_{ij}^2 & i \neq j. \end{cases} \tag{3} \end{align}

In view of $(3)$, if we denote the covariance matrix of $\xi := (Z_1^2, \ldots, Z_d^2)'$ by $\tilde{\Sigma} = (\operatorname{Cov}(Z_i^2, Z_j^2))$, then $\tilde{\Sigma} = 2\Sigma \circ \Sigma$, where "$\circ$" stands for the Hadamard product of matrices. It then follows that (let $e$ be the length-$d$ vector of all ones):
\begin{align} \operatorname{Var}(Z'Z) = \operatorname{Var}(e'\xi) = e'\tilde{\Sigma}e = 2e'(\Sigma \circ \Sigma)e \color{red}{=} 2\operatorname{tr}(I_{(d)}\Sigma I_{(d)}\Sigma) = 2\operatorname{tr}(\Sigma^2), \end{align} where the red equality uses the third identity of Hadamard product properties. Therefore, $(2)$ holds.
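Both the red equality and $(2)$ are easy to check numerically (a minimal sketch assuming NumPy; the covariance below is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 1_000_000

A = rng.normal(size=(d, d))
Sigma = A @ A.T                      # arbitrary covariance

# Red equality: e'(Sigma ∘ Sigma)e = tr(Sigma^2) for symmetric Sigma.
e = np.ones(d)
print(e @ (Sigma * Sigma) @ e, np.trace(Sigma @ Sigma))

# (2): Var(Z'Z) = 2 tr(Sigma^2) for centered Gaussian Z.
Z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
print(np.var(np.einsum("ij,ij->i", Z, Z)), 2 * np.trace(Sigma @ Sigma))
```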

For the general $X \sim N_d(\mu, \Sigma)$, write $X = \mu + Z$, where $Z \sim N_d(0, \Sigma)$. Then by $(2)$:
\begin{align} & \operatorname{Var}(X'X) = \operatorname{Var}(Z'Z + 2\mu'Z) \\ =& \operatorname{Var}(Z'Z) + 4\operatorname{Var}(\mu'Z) + 4\operatorname{Cov}(Z'Z, \mu'Z) \\ =& 2\operatorname{tr}(\Sigma^2) + 4\mu'\Sigma\mu + 4E[Z'Z\mu'Z]. \tag{4} \end{align}

Comparing $(1)$ and $(4)$, it is easy to see that $(1)$ holds if we can show $E[Z'Z\mu'Z] = 0$. Expanding $Z'Z\mu'Z$ yields \begin{align} & E[Z'Z\mu'Z] = \sum_{i = 1}^d \mu_iE[Z_i^3] + \sum_{i = 1}^d\sum_{j \neq i} \mu_i E[Z_iZ_j^2]. \end{align}

Since $Z_i \sim N(0, \sigma_{ii}^2)$, $E[Z_i^3] = 0$. In addition, when $j \neq i$, since \begin{align} \begin{bmatrix} Z_i \\ Z_j \end{bmatrix} \sim N_2\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_{ii}^2 & \sigma_{ij} \\ \sigma_{ij} & \sigma_{jj}^2 \end{bmatrix} \right), \end{align} we have (using the conditional distribution of MVN) \begin{align} E[Z_iZ_j^2] = E[E[Z_iZ_j^2|Z_j]] = E[Z_j^2E[Z_i|Z_j]] = \sigma_{ij}\sigma_{jj}^{-2}E[Z_j^3] = 0. \end{align}

This shows that every term in $E[Z'Z\mu'Z]$ is $0$, whence $E[Z'Z\mu'Z] = 0$. This completes the proof.
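Finally, $(1)$ itself can be checked by simulation in the non-central case (a minimal sketch assuming NumPy; $\mu$ and $\Sigma$ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 4, 1_000_000

mu = rng.normal(size=d)              # arbitrary nonzero mean
A = rng.normal(size=(d, d))
Sigma = A @ A.T                      # arbitrary covariance

X = rng.multivariate_normal(mu, Sigma, size=n)
q = np.einsum("ij,ij->i", X, X)      # X'X for each draw

print(np.var(q))                                            # Monte Carlo Var(X'X)
print(4 * mu @ Sigma @ mu + 2 * np.trace(Sigma @ Sigma))    # right-hand side of (1)
```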

Zhanxiong