
I have access to samples from some distribution with second-moment matrix $\Sigma=E[xx^T]$ and need an estimate of $\|\Sigma\|_F^2$ (which can be used to set the optimal step size for LMS).

We can use the Frobenius norm of the sample second-moment matrix $\hat{\Sigma}$ below, but this gives a biased estimator for small $B$:

$$\left\|\hat{\Sigma}\right\|_F^2=\left\|\frac{1}{B}\sum_{i=1}^B x_i x_i^T\right\|_F^2$$

For instance, take a standard Gaussian in $d$ dimensions: the sample estimator for $B=1$ gives $d(d+2)\approx d^2$ in expectation, whereas the correct value is $d$. How do I unbias it for general $B$?
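
Here is a minimal simulation sketch of that bias (assumptions: a small $d$ and standard Gaussian data purely for illustration; the helper name `plugin_estimate` and the chosen sizes are not from the linked notebook):

```python
# Minimal sketch: Monte Carlo check that the plug-in estimator ||Sigma_hat||_F^2
# is biased upward for small batch sizes B (standard Gaussian, small d assumed).
import numpy as np

rng = np.random.default_rng(0)
d = 10                       # for x ~ N(0, I_d), the true ||Sigma||_F^2 = d
trials = 20_000

def plugin_estimate(B):
    X = rng.standard_normal((B, d))      # B samples of x
    Sigma_hat = X.T @ X / B              # sample second-moment matrix
    return np.sum(Sigma_hat ** 2)        # squared Frobenius norm

for B in (1, 2, 5, 20, 100):
    mean_est = np.mean([plugin_estimate(B) for _ in range(trials)])
    print(f"B={B:4d}  mean estimate {mean_est:8.2f}   true value {d}")
# For B=1 the mean is about d(d+2) = E[||x||^4]; it shrinks toward d as B grows.
```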

Another way to phrase the question: one could instead use the following identity (the Gaussian moment identity) and replace the expectations with sample means, but that estimator has the same bias issue.

$$\|\Sigma\|_F^2=\frac{1}{2}\left(E[\|x\|^4]-\left(E[\|x\|^2]\right)^2+2\|E[x]\|^4\right)$$
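
A quick numerical check of this identity, and of the bias of its plug-in version on a small batch, could look like the following sketch (the mean $\mu$, covariance $C$, and sample sizes are arbitrary illustrative choices):

```python
# Sketch: verify the Gaussian identity for ||Sigma||_F^2 with a large sample,
# then show that the plug-in version over a small batch B is still biased.
import numpy as np

rng = np.random.default_rng(1)
d = 5
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
C = A @ A.T / d                          # covariance of x
Sigma = C + np.outer(mu, mu)             # second-moment matrix E[x x^T]
true_val = np.sum(Sigma ** 2)            # ||Sigma||_F^2

def identity_estimate(X):
    sq = np.sum(X ** 2, axis=1)          # ||x_i||^2 for each row
    return 0.5 * (np.mean(sq ** 2)                             # E[||x||^4]
                  - np.mean(sq) ** 2                           # (E[||x||^2])^2
                  + 2 * np.sum(np.mean(X, axis=0) ** 2) ** 2)  # 2 ||E[x]||^4

X_big = rng.multivariate_normal(mu, C, size=1_000_000)
print("true:", true_val, " identity, large sample:", identity_estimate(X_big))

B, trials = 5, 20_000
small = np.mean([identity_estimate(rng.multivariate_normal(mu, C, size=B))
                 for _ in range(trials)])
print(f"identity plug-in with B={B}: {small:.2f}  vs true {true_val:.2f}")
```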

[Plot: estimates versus sample size, with the estimator's bias decreasing rapidly as the sample size grows.]

notebook

Yaroslav Bulatov
  • Just out of curiosity, what's the advantage of the Gaussian moment identity over explicit reconstruction and norm calculation of the Monte Carlo estimate of the expected outer product? – John Madden Aug 10 '22 at 03:32
  • I can't afford to form the outer product, $d$ is on the order of millions. – Yaroslav Bulatov Aug 10 '22 at 03:33
  • If I understand the incompletely labeled plot correctly, it suggests that the bias rapidly decreases with sample size. With some Monte Carlo methods you could try some common functions as correction factors to see if an appreciable decrease in bias is achieved. – Galen Aug 10 '22 at 03:37
  • @YaroslavBulatov I see, thanks. Also, what makes you think this is possible in general? Since we can view, say 5 multivariate normal draws as applying $\mathrm{chol}(\Sigma)$ to 5 different standard normal draws, wouldn't there at this point be five directions in the 10D space which haven't been probed? In other words, isn't it impossible to measure the component of $\Sigma$ that's orthogonal to $\mathrm{span}(\mathbf{x}_1, \ldots, \mathbf{x}_5)$? Perhaps we can be saved by lowrank + lownorm assumptions on $\Sigma$? How does your simulation look without eigenvalue decay on $\Sigma$? – John Madden Aug 10 '22 at 14:36
  • (I should have said $\mathrm{span}(\mathbf z_1, \ldots, \mathbf z_5)$, where $z_{i,j}\sim N(0,1)$.) – John Madden Aug 10 '22 at 15:46
  • @JohnMadden I think it should be possible because eigenvalues of large covariance matrices decay at least as fast as $1/i$, so the "effective dimension" is much lower than the embedding dimension. In my application (gradients of neural nets), $\mathrm{Tr}(\Sigma)/\|\Sigma\|$ is < 50, whereas the embedding dimension is in the millions. – Yaroslav Bulatov Aug 10 '22 at 16:16
  • Also, the bias of the estimator is an issue. If I have an unbiased estimator with high variance, I'll observe this variance in practice and know not to trust it, whereas here the estimator has low variance but high bias. – Yaroslav Bulatov Aug 10 '22 at 18:07
  • Potential solution: the Frobenius norm squared is the sum of squared entries of the Gram matrix. We have unbiased estimates of the diagonal entries (fourth moments), and this question would provide a way to obtain unbiased estimates of the off-diagonal terms. – Yaroslav Bulatov Dec 07 '22 at 18:47 (see the sketch below)
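
One possible reading of that last comment (my interpretation of it, not the construction from the linked question) is in terms of the $B \times B$ Gram matrix $G_{ij} = x_i^\top x_j$: the plug-in estimator equals $\frac{1}{B^2}\sum_{i,j} G_{ij}^2$, its diagonal terms $\|x_i\|^4$ are the fourth moments that carry the bias, and for independent samples $E[(x_i^\top x_j)^2] = \operatorname{tr}(\Sigma^2) = \|\Sigma\|_F^2$ when $i \neq j$, so averaging only the squared off-diagonal entries is unbiased. A sketch under that reading:

```python
# Sketch of the Gram-matrix decomposition: plug-in estimator vs. the average of
# the squared off-diagonal Gram entries, which is unbiased for ||Sigma||_F^2
# when the samples are independent. Standard Gaussian data assumed for the demo.
import numpy as np

rng = np.random.default_rng(2)
d, B, trials = 10, 5, 20_000              # for N(0, I_d), true ||Sigma||_F^2 = d

plugin, offdiag = [], []
for _ in range(trials):
    X = rng.standard_normal((B, d))
    G2 = (X @ X.T) ** 2                   # squared entries of the B x B Gram matrix
    plugin.append(G2.sum() / B**2)        # equals ||Sigma_hat||_F^2
    offdiag.append((G2.sum() - np.trace(G2)) / (B * (B - 1)))

print("plug-in mean:     ", np.mean(plugin))    # biased upward for small B
print("off-diagonal mean:", np.mean(offdiag))   # close to d
```

Note that this works directly on the $B \times B$ Gram matrix, so the $d \times d$ outer product never has to be formed, which matters when $d$ is in the millions.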

0 Answers