
If $\boldsymbol{\beta} \sim \mathcal{N}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, can someone please help me understand why $\mathbb{E}[||\boldsymbol{\beta}||_2^2] = ||\boldsymbol{\mu}||_2^2 + \text{trace}(\boldsymbol{\Sigma})$?

Further, how does this expectation change if we instead consider $\boldsymbol{\beta}^T\textbf{W}\boldsymbol{\beta}$, where $\textbf{W}$ is a diagonal matrix?

Thank you!

JohnRos
Confused

2 Answers


$E[||\beta||_2^2]$ quantifies the expected squared Euclidean distance of a vector from the origin. The relation you stated holds for any random vector with finite second moment. It says that the expected squared distance is the sum of the squared distance from the mean ($\mu$) to the origin and the total variability around this mean ($Trace(\Sigma)$).

$\beta^T W \beta$ is the squared Euclidean norm iff $W$ is the identity matrix. For general properties of moments of random quadratic forms, you can consult Section 6.2.2 in the [Matrix CookBook](http://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf), and references therein.
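
For the diagonal $W$ asked about in the question, the same kind of trace argument gives a closed form. This is only a sketch, assuming $\beta$ has mean $\mu$ and covariance $\Sigma$, and writing $w_i$ for the diagonal entries of $W$ (notation introduced here, not in the question):

$$ \mathbb{E}\left( \beta^T W \beta \right) = \mathbb{E}\left( tr(W \beta \beta^T) \right) = tr\left( W (\Sigma + \mu \mu^T) \right) = tr(W \Sigma) + \mu^T W \mu = \sum_{i=1}^{p} w_i \left( \Sigma_{ii} + \mu_i^2 \right) $$

The last equality uses that $W$ is diagonal. Setting $W = I$ recovers $tr(\Sigma) + ||\mu||^2$.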

JohnRos

The proof for the first question requires only some simple properties of the trace operator, the expected value, and the definition of the covariance matrix (see here). First, as mentioned in the title, $||\beta||^2$ is a quadratic form, given by $||\beta||^2 = \beta^T \beta$. Then we have:

$$
\begin{aligned}
\mathbb{E}\left( \beta^T \beta \right) &= \mathbb{E}\left( tr(\beta^T \beta) \right) \\
&= \mathbb{E} \left( tr(\beta \beta^T) \right) \\
&= tr \left(\mathbb{E} \left( \beta \beta^T \right) \right) \\
&= tr \left(\Sigma + \mu \mu^T\right) \\
&= tr \left(\Sigma \right) + tr \left(\mu \mu^T\right) \\
&= tr \left(\Sigma \right) + tr \left(\mu^T \mu \right) \\
&= tr \left(\Sigma \right) + \mu^T \mu = tr \left(\Sigma \right) + ||\mu||^2
\end{aligned}
$$

Here we just used a couple of basic probability and linear algebra properties:

  1. In line 1, because $\beta^T \beta$ is a scalar, $\beta^T \beta = tr(\beta^T \beta)$
  2. In line 2, we use the equality for the trace of a product that says $tr(A^TB) = tr(AB^T)$
  3. In line 3, because the trace is linear, we can move the expected value operator inside the trace
  4. From the definition of the covariance matrix, it can be shown that $\mathbb{E} \left( \beta \beta^T \right) = \mathbb{E}((\beta-\mu)(\beta-\mu)^T) + \mu\mu^T = \Sigma + \mu\mu^T$
  5. We use that the trace is linear, i.e. $tr(A+B) = tr(A) + tr(B)$.
  6. We again use the fact that $tr(A^TB) = tr(AB^T)$
  7. Because $\mu^T \mu$ is a scalar, we remove the trace operator (same as step 1), and $\mu^T \mu = ||\mu||^2$
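
As a rough sanity check of the identity, here is a minimal sketch using only the Python standard library. It relies on the fact that the same algebra holds exactly for the empirical mean and the empirical covariance (with $1/n$ normalisation) of any finite sample, so both sides should agree up to rounding. The function name `quadratic_form_check`, the sample construction, and the weights are all made up for illustration; the weights stand in for the diagonal of $W$ from the question, with all ones giving $||\beta||^2$.

```python
import random


def quadratic_form_check(samples, w_diag):
    """Return (lhs, rhs) for the empirical version of
    E[b^T W b] = mu^T W mu + tr(W Sigma), with W = diag(w_diag).

    samples: list of equal-length vectors (lists of floats)
    w_diag:  diagonal entries of W (hypothetical weights)
    """
    n = len(samples)
    p = len(w_diag)
    # Left side: sample average of sum_i w_i * b_i^2.
    lhs = sum(sum(w * b * b for w, b in zip(w_diag, x)) for x in samples) / n
    # Empirical mean vector.
    mu = [sum(x[i] for x in samples) / n for i in range(p)]
    # Diagonal of the empirical covariance (1/n normalisation);
    # off-diagonal entries do not enter tr(W Sigma) when W is diagonal.
    var = [sum((x[i] - mu[i]) ** 2 for x in samples) / n for i in range(p)]
    rhs = sum(w * m * m for w, m in zip(w_diag, mu))
    rhs += sum(w * v for w, v in zip(w_diag, var))
    return lhs, rhs


random.seed(0)
# Illustrative correlated 3-dimensional Gaussian draws built from independent normals.
samples = []
for _ in range(2000):
    z = [random.gauss(0.0, 1.0) for _ in range(3)]
    samples.append([1.0 + z[0], -2.0 + 0.5 * z[0] + z[1], 0.5 + z[2] - 0.3 * z[1]])

lhs_norm, rhs_norm = quadratic_form_check(samples, [1.0, 1.0, 1.0])  # W = I
lhs_w, rhs_w = quadratic_form_check(samples, [2.0, 0.5, 3.0])        # diagonal W
print(lhs_norm, rhs_norm)
print(lhs_w, rhs_w)
```

Because the check uses empirical moments rather than the true $\mu$ and $\Sigma$, it confirms the algebra rather than anything specific to the normal distribution, which is consistent with the identity holding for any random vector with finite second moment.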
dherrera