
Suppose $X$ is an $m\times n$ random matrix whose rows are i.i.d. samples of an $n$-dimensional Gaussian, and $A$ is an $n\times n$ matrix. I'm looking for the value of the following quantity: $$E[X^T X A X^T X]$$

For a zero-centered Gaussian with covariance $\Sigma$ and $m=1$, the following holds:

$$E[X^T X A X^T X]= \Sigma A \Sigma + \Sigma A^T \Sigma + \Sigma \operatorname{Tr}(\Sigma A)$$

What is the expression in the more general case?

Edit: The following formula from the answer has been numerically confirmed to work:

$$\frac{1}{m^2}E[X^TXAX^TX] = \frac{m-1}{m}HAH + \frac{1}{m} \left( H(A+A^T)H - \mu\mu^T(A+A^T)\mu\mu^T + H\operatorname{Tr}(HA) \right),$$ where $H=\Sigma+\mu\mu^T$.
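For reference, here is a minimal numpy Monte Carlo sketch of this kind of check; all inputs below (dimensions, seed, trial count, and the random $\mu$, $\Sigma$, $A$) are arbitrary choices for illustration, not anything from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 3, 5, 200_000

# Arbitrary test inputs: a mean vector, a PSD covariance, and a square matrix A.
mu = rng.normal(size=n)
B = rng.normal(size=(n, n))
Sigma = B @ B.T                      # random positive semidefinite covariance
A = rng.normal(size=(n, n))
H = Sigma + np.outer(mu, mu)         # H = E[x x^T]
mm = np.outer(mu, mu)

# Draw `trials` independent m x n matrices X with rows iid N(mu, Sigma).
X = rng.multivariate_normal(mu, Sigma, size=(trials, m))  # shape (trials, m, n)

# Per-trial empirical second moment S = X^T X / m, then average S A S.
S = np.einsum('tmi,tmj->tij', X, X) / m
mc = np.einsum('tik,kl,tlj->tij', S, A, S).mean(axis=0)

# Closed-form expression from the answer.
closed = ((m - 1) / m) * H @ A @ H + (1 / m) * (
    H @ (A + A.T) @ H - mm @ (A + A.T) @ mm + H * np.trace(H @ A)
)

print(np.max(np.abs(mc - closed)))   # should shrink toward 0 as trials grows
```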

Yaroslav Bulatov

1 Answer


I have never seen a closed-form expression for this, probably because it is quite ugly. I have worked with a similar expression before, and I'd be happy to see if my expression stands up to yours. So here is one way to compute the matrix expectation.

I will work with $\frac{X^TX}{m}$ rather than $X^TX$, since that is the empirical second-moment matrix; it is a quantity that converges as $m\to\infty$.

Lemma: First, some identities that took some index manipulation for me to see. If $y\sim \mathcal{N}(0,\Sigma)$, $\mu$ is a vector, and $A$ is a matrix, then $$E[y\mu^T A y \mu^T] = \Sigma A^T\mu\mu^T,$$ $$E[\mu y^TA\mu y^T] = \mu\mu^T A^T \Sigma.$$

Proof: Follow the indices! I use the Einstein summation convention to declutter the notation. $$E[y\mu^T A y \mu^T]_{ij} = E[y_i\mu_kA_{kl}y_l\mu_j] = E[y_iy_l]A^T_{lk} \mu_k \mu_j = \Sigma_{il}A^T_{lk}\mu_k\mu_j = [\Sigma A^T\mu\mu^T]_{ij}.$$ The second equation is proven similarly. QED
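As a sanity check, both identities are easy to confirm numerically; below is a small sketch with test inputs of my own choosing, where `np.einsum` mirrors the index gymnastics above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 3, 500_000

# Arbitrary test inputs for the lemma.
mu = rng.normal(size=n)
B = rng.normal(size=(n, n))
Sigma = B @ B.T
A = rng.normal(size=(n, n))

Y = rng.multivariate_normal(np.zeros(n), Sigma, size=trials)  # rows y ~ N(0, Sigma)

# E[y mu^T A y mu^T]_{ij} = E[y_i mu_k A_{kl} y_l mu_j]
est1 = np.einsum('ti,k,kl,tl,j->ij', Y, mu, A, Y, mu) / trials
print(np.max(np.abs(est1 - Sigma @ A.T @ np.outer(mu, mu))))

# E[mu y^T A mu y^T]_{ij} = E[mu_i y_k A_{kl} mu_l y_j]
est2 = np.einsum('i,tk,kl,l,tj->ij', mu, Y, A, mu, Y) / trials
print(np.max(np.abs(est2 - np.outer(mu, mu) @ A.T @ Sigma)))
```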

Next, we will look at the zero-mean case as a warm-up exercise.

Lemma: Assume $m$ independent Gaussian random vectors $x^{(i)} \overset{iid}{\sim} \mathcal{N}(0,\Sigma)$. Row-stack them in the matrix $X$. Then \begin{equation} \begin{split} E\left[\frac{X^TX}{m}A\frac{X^TX}{m}\right] &= \frac{m-1}{m} \Sigma A\Sigma \\ & + \frac{1}{m}\left( \Sigma(A+A^T)\Sigma + \operatorname{Tr}(\Sigma A)\Sigma \right) \end{split} \end{equation}

Proof:

$$ \frac{1}{m^2}E[X^TXAX^TX] = \frac{1}{m^2}\sum_{i,j}E[x^{(i)}{x^{(i)}}^TAx^{(j)}{x^{(j)}}^T]. $$ There are $m(m-1)$ terms where $i\neq j$, and $m$ terms where they agree. Since $x^{(i)}$ and $x^{(j)}$ are iid, we can let $x$ denote either of them, and state $$ \frac{1}{m^2}E[X^TXAX^TX] = \frac{1}{m^2}\left( m(m-1)\, E[xx^T]AE[xx^T] + m\, E[xx^TAxx^T]\right). $$ The first term is just a product of covariances, since $E[xx^T]=\Sigma$ here. The second term needs Isserlis' theorem, which gives (again, Einstein summation convention) \begin{align} E[xx^TAxx^T]_{ij} &= E[x_ix_kA_{kl}x_lx_j] \\&= A_{kl}\left(E[x_ix_k]E[x_lx_j]+E[x_ix_l]E[x_kx_j]+E[x_ix_j]E[x_lx_k]\right) \\&= \left\{ \Sigma(A+A^T)\Sigma + \Sigma \operatorname{Tr}(A\Sigma)\right\}_{ij}. \end{align} Armed with this, we can simply state $$ \frac{1}{m^2}E[X^TXAX^TX] = \frac{m-1}{m} \Sigma A\Sigma + \frac{1}{m}\left(\Sigma(A+A^T)\Sigma + \Sigma \operatorname{Tr}(A\Sigma)\right). $$ QED
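The Isserlis step is the only non-mechanical part, so here is a quick numerical spot check of the fourth-moment identity $E[xx^TAxx^T] = \Sigma(A+A^T)\Sigma + \Sigma \operatorname{Tr}(A\Sigma)$ for zero-mean $x$; the inputs and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 3, 500_000

# Arbitrary zero-mean test inputs.
B = rng.normal(size=(n, n))
Sigma = B @ B.T
A = rng.normal(size=(n, n))

X = rng.multivariate_normal(np.zeros(n), Sigma, size=trials)  # rows x ~ N(0, Sigma)

# Monte Carlo estimate of E[x x^T A x x^T]_{ij} = E[x_i x_k A_{kl} x_l x_j]
mc = np.einsum('ti,tk,kl,tl,tj->ij', X, X, A, X, X) / trials

closed = Sigma @ (A + A.T) @ Sigma + Sigma * np.trace(A @ Sigma)
print(np.max(np.abs(mc - closed)))  # -> 0 as trials grows
```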

Now, we look at the general case.

Lemma: Assume $m$ independent Gaussian random vectors $x^{(i)} \overset{iid}{\sim} \mathcal{N}(\mu,\Sigma)$. Row-stack them in the matrix $X$. Let $H=\Sigma+\mu\mu^T$.

Then \begin{equation} E\left[\frac{X^TX}{m}A\frac{X^TX}{m}\right] = \frac{m-1}{m}HAH + \frac{1}{m} \left( H(A+A^T)H - \mu\mu^T(A+A^T)\mu\mu^T + H\operatorname{Tr}(HA) \right). \end{equation}

Proof: Let $x \sim \mathcal{N}(\mu,\Sigma)$. Define $y=x-\mu$.

As before, \begin{equation} \frac{1}{m^2}E[X^TXAX^TX] = \frac{1}{m^2}\left( m(m-1)\, E[xx^T]AE[xx^T] + m\, E[xx^TAxx^T]\right). \end{equation}

The first term is simply \begin{equation} E[xx^T]AE[xx^T] = (\Sigma+\mu\mu^T)A(\Sigma+\mu\mu^T). \end{equation}

The second term needs more work. We expand it in $y+\mu$ and note that all terms with one or three factors of $y$ have expectation $0$, since odd moments of the zero-mean Gaussian $y$ vanish. There are in total 16 terms, but 8 of them are zeroed out in that way. \begin{equation} \begin{split} E[xx^TAxx^T] &= E[(y+\mu) (y+\mu)^TA(y+\mu)(y+\mu)^T] \\ & = E[yy^TAyy^T] + E[\mu\mu^TA\mu\mu^T]\\ & + E[yy^TA\mu\mu^T] + E[\mu\mu^TAyy^T]\\ & + E[y\mu^TAy\mu^T] + E[\mu y^TA\mu y^T]\\ & + E[y\mu^TA\mu y^T] + E[\mu y^TAy\mu^T] \end{split} \end{equation} All of these are simple expectations. The first follows from Isserlis' theorem. The second is the expectation of a constant. Terms three and four are just covariances. Terms five and six are computed by the initial lemma that I needed some index manipulations for. In term seven the inner three factors form a scalar, so the two $y$'s can be combined: $E[y\mu^TA\mu y^T] = (\mu^TA\mu)E[yy^T]$. Term eight uses $y^TAy = \operatorname{Tr}(yy^TA)$ and the fact that the trace is linear and thus commutes with the expectation. So we find \begin{equation} \begin{split} E[xx^TAxx^T] &= \Sigma(A+A^T)\Sigma + \Sigma \operatorname{Tr}(A\Sigma) + \mu\mu^TA\mu\mu^T\\ & + \Sigma A\mu\mu^T + \mu\mu^TA\Sigma\\ & + \Sigma A^T\mu\mu^T + \mu\mu^TA^T\Sigma \\ & + (\mu^TA\mu)\Sigma + \mu\mu^T\operatorname{Tr}(\Sigma A). \end{split} \end{equation} Some regrouping of terms yields \begin{equation} \begin{split} E[xx^TAxx^T] &= (\Sigma+\mu\mu^T)(A+A^T)(\Sigma+\mu\mu^T) - \mu\mu^T(A+A^T)\mu\mu^T\\ & + (\Sigma+\mu\mu^T)\operatorname{Tr}((\Sigma+\mu\mu^T)A). \end{split} \end{equation}
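Before combining everything, the regrouped single-vector identity can be spot-checked numerically; this is a sketch with arbitrary test inputs of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 3, 500_000

# Arbitrary test inputs with nonzero mean.
mu = rng.normal(size=n)
B = rng.normal(size=(n, n))
Sigma = B @ B.T
A = rng.normal(size=(n, n))
H = Sigma + np.outer(mu, mu)   # H = E[x x^T]
mm = np.outer(mu, mu)

X = rng.multivariate_normal(mu, Sigma, size=trials)  # rows x ~ N(mu, Sigma)

# Monte Carlo estimate of E[x x^T A x x^T]
mc = np.einsum('ti,tk,kl,tl,tj->ij', X, X, A, X, X) / trials

closed = H @ (A + A.T) @ H - mm @ (A + A.T) @ mm + H * np.trace(H @ A)
print(np.max(np.abs(mc - closed)))  # -> 0 as trials grows
```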

Combining the found results, we get \begin{equation} E\left[\frac{X^TX}{m}A\frac{X^TX}{m}\right] = \frac{m-1}{m}HAH + \frac{1}{m} \left( H(A+A^T)H - \mu\mu^T(A+A^T)\mu\mu^T + H\operatorname{Tr}(HA) \right), \end{equation} where $H=\Sigma+\mu\mu^T$.

QED

LudvigH
  • dividing by $m$ simplifies a little. Assuming symmetry of $A$ will also simplify the answer. Otherwise, I think this is about what to expect from the problem... – LudvigH Sep 22 '22 at 14:10
  • Thanks for going through the derivation, will try this formula out. It is indeed natural to assume symmetry of $A$, and to divide by $m^2$, since that demonstrates how formula interpolates between the simple formulas that exist for $m=1$ and $m=\infty$ – Yaroslav Bulatov Sep 22 '22 at 14:23
  • In differential geometry, this is known as a "debauch of indices." Applying a little bit of linear algebra at the outset to simplify the situation would shorten this demonstration considerably. – whuber Sep 22 '22 at 15:00
  • Thank you for introducing me to the term. I admit a shorter derivation is possible, but working in indices is a method that "just works" and generalizes to just about any problem of strange vector/matrix expectations. So I figured this would do. – LudvigH Sep 22 '22 at 15:23
  • I did some evaluations, and there's a bug in the formula; it only works when $\mu=0$. The correct formula for the $m=1$ case is $E[xx'Axx']=HAH + HA^TH + H\operatorname{Tr}(HA) - 2 (\mu^T A \mu) \mu\mu^T$ where $H=E[xx']$ – Yaroslav Bulatov Sep 22 '22 at 18:23
  • Thank you. Seems I did f' up then. Will double check and come back. – LudvigH Sep 22 '22 at 18:25
  • Are you sure about the answer? It seems to me the last term should be $(\mu^T(A+A^T)\mu)\mu\mu^T$. I redid my calculations quite a bit more tidily, and got something that I believe in more. Feel free to double-check me again – LudvigH Sep 22 '22 at 20:15
  • Woo hoo, works! – Yaroslav Bulatov Sep 22 '22 at 20:38
  • Looks like I also need $E[BX'XAX'X]$. If you want to take a stab at that one, I'll numerically verify and add it to the answer – Yaroslav Bulatov Sep 23 '22 at 04:28