
Suppose $x$ is sampled from a zero-centered Gaussian with $d\times d$ covariance matrix $\Sigma$.

  • is there a name for the distribution of $y=\frac{x}{\|x\|}$?

  • is there a closed-form expression for the covariance of $y$?

Yaroslav Bulatov

2 Answers


I think that the distribution of $y=\frac{X}{||X||}$ where $X\sim \mathcal{N}(0, \Psi)$ is the projected normal distribution.

I came across a similar problem and used an approximation that works well in most cases. The approximation also works for the case of a non-zero mean.

For reference, here is a previous question of mine, which I answered myself, for the case of $X\sim \mathcal{N}(\mu, \mathbf{I}\sigma^2)$ (i.e. isotropic noise). See the approach used in that question of approximating each element $i,j$ of the second-moment matrix (i.e. $\mathbb{E}\left(\frac{X_i X_j}{||X||^2}\right)$) as a ratio of quadratic forms, using a symmetric $A$ in the numerator that is specific to each $i,j$.
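To make the reduction concrete: element $(i,j)$ of the second-moment matrix equals $\mathbb{E}\left(\frac{X^\top A X}{X^\top B X}\right)$ with the symmetric choice $A = \frac{1}{2}(e_i e_j^\top + e_j e_i^\top)$ and $B = I$. Here is a small numpy check of that identity (the variable names and sample parameters are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 1000
mu = np.array([1.0, 0.5, -0.3, 0.2])
X = mu + 0.3 * rng.normal(size=(n, d))   # X ~ N(mu, 0.09 * I), isotropic noise

i, j = 0, 1
A = np.zeros((d, d))
A[i, j] = A[j, i] = 0.5                  # symmetric A picking out X_i X_j
num = np.einsum('nd,de,ne->n', X, A, X)  # quadratic form X' A X, one value per sample
den = (X ** 2).sum(axis=1)               # X' I X = ||X||^2
# X' A X == X_i X_j for every sample, so the two ratios agree exactly
assert np.allclose(num / den, X[:, i] * X[:, j] / den)
```

Because $X^\top A X = X_i X_j$ holds sample by sample, the agreement is exact, not just in expectation.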

For your question, we can use the exact same approach of approximating the elements of the second-moment matrix with that same ratio of quadratic forms, but generalizing the answer above to the case where $X\sim \mathcal{N}(\mu, \Psi)$ with $\Psi$ any symmetric positive semidefinite matrix. For this generalized quadratic form, a second-order Taylor approximation can be found in this article, section 3.1.

Using the formulas in the paper above for each matrix $A$ corresponding to each element $i,j$ of the second moment matrix (as done in my other question I linked), and converting the resulting equations to matrix formulas, we get the following approximation:

\begin{equation} \label{nonIso} \mathbb{E}\left( \frac{XX^T}{||X||^2} \right) \approx \frac{\mu_N}{\mu_D} \odot \left( 1 - \frac{\Sigma^{N,D}}{\mu_N\mu_D} + \frac{Var(D)}{\mu_D^2} \right) \end{equation}

where the terms are defined as follows: \begin{equation} \begin{split} & \mu_N = \Psi + \mu \mu^T \\ & \mu_{D} = tr(\Psi) + ||\mu||^2 \\ & Var(D) = 2 tr(\Psi^2) + 4 \mu^T \Psi \mu \\ & \Sigma^{N,D} = 2 \left[\Psi \Psi + \mu \mu^T \Psi + \Psi\mu \mu^T \right] \end{split} \end{equation}

Note that while $\mu_N \in \mathbb{R}^{d\times d}$ and $\Sigma^{N,D} \in \mathbb{R}^{d\times d}$, $\mu_D \in \mathbb{R}$ and $Var(D) \in \mathbb{R}$. Also, importantly, $\odot$ is element-wise multiplication, and the ratios between matrices that appear there are also element-wise.
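As a sanity check, the approximation can be transcribed into numpy as below (the function name is mine). Multiplying the element-wise factor through by $\frac{\mu_N}{\mu_D}$ gives an equivalent expanded form, $\frac{\mu_N}{\mu_D} - \frac{\Sigma^{N,D}}{\mu_D^2} + \mu_N\frac{Var(D)}{\mu_D^3}$, which avoids dividing by entries of $\mu_N$ that may be zero:

```python
import numpy as np

def projected_second_moment(mu, Psi):
    """Second-order approximation of E[X X^T / ||X||^2] for X ~ N(mu, Psi).

    Direct transcription of the matrix formulas above, in the expanded
    form mu_N/mu_D - Sigma_ND/mu_D^2 + mu_N * Var(D)/mu_D^3.
    """
    M = np.outer(mu, mu)                                 # mu mu^T
    mu_N = Psi + M                                       # d x d
    mu_D = np.trace(Psi) + mu @ mu                       # scalar
    var_D = 2 * np.trace(Psi @ Psi) + 4 * (mu @ Psi @ mu)
    Sigma_ND = 2 * (Psi @ Psi + M @ Psi + Psi @ M)
    return mu_N / mu_D - Sigma_ND / mu_D**2 + mu_N * var_D / mu_D**3

# Monte-Carlo comparison in a regime where the Taylor expansion should be
# accurate (noise small relative to the mean); parameters are my choices
rng = np.random.default_rng(0)
mu = np.array([1.0, 0.5, -0.3, 0.2])
Psi = 0.05 * np.eye(4)
approx = projected_second_moment(mu, Psi)
X = rng.multivariate_normal(mu, Psi, size=200_000)
Y = X / np.linalg.norm(X, axis=1, keepdims=True)
empirical = (Y.T @ Y) / len(Y)
```

A nice property of the expanded form: its trace is exactly $1$ (the $\Sigma^{N,D}$ and $Var(D)$ trace terms cancel), consistent with $\operatorname{tr}\,\mathbb{E}\left(\frac{XX^T}{||X||^2}\right) = \mathbb{E}\,||y||^2 = 1$.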

Maybe for your case of 0 mean you would be able to apply the same approach and find an exact formula for the ratios, instead of this approximation.

Finally, that is the second-moment matrix, not the covariance matrix. You can subtract the outer product of the expected value of the projected Gaussian to obtain the covariance as follows: $$ Cov\left( \frac{X}{||X||} \right) = \mathbb{E}\left( \frac{XX^T}{||X||^2} \right) - \mathbb{E}\left( \frac{X}{||X||} \right) \mathbb{E}\left( \frac{X}{||X||} \right)^T$$

Since your variable is centered, $X$ and $-X$ have the same distribution, so $\frac{X}{||X||}$ and $-\frac{X}{||X||}$ do too, and therefore $\mathbb{E}\left( \frac{X}{||X||} \right) = 0$ exactly.

I tried this approximation on simulated data and it works quite well.
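For what it's worth, here is a minimal simulation in that spirit, using a zero-mean harmonic diagonal covariance as the test case (the dimension and sample size are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
Psi = np.diag(1.0 / np.arange(1, d + 1))   # diag(1, 1/2, ..., 1/d)
X = rng.multivariate_normal(np.zeros(d), Psi, size=200_000)
Y = X / np.linalg.norm(X, axis=1, keepdims=True)

mean_y = Y.mean(axis=0)                    # ~ 0: y and -y are equally likely
second_moment = (Y.T @ Y) / len(Y)         # estimate of E[y y^T]
cov_y = second_moment - np.outer(mean_y, mean_y)
```

Note that the empirical second moment has trace $1$ up to float rounding, since every sample of $y$ has unit norm.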

dherrera
  • Interesting! I was particularly interested in the special case of a zero-centered Gaussian with diagonal covariance, entries $1, 1/2, 1/3, 1/4, \ldots, 1/d$. What do these look like after normalizing? Simulations show the new eigenvalues are also harmonically decaying; I will check if your approach matches this – Yaroslav Bulatov Mar 14 '23 at 16:42
  • If you test out this approach, do let me know in the comments how it fares. I'm using this approach for a problem of mine with different covariance structures (diagonal covariance proportional to the mean vector, spatially-dependent correlations, etc), and would be interested in hearing if it does/doesn't work in other cases that may be relevant. – dherrera Mar 14 '23 at 17:27
  • Your "Var" variable is a scalar, and you are adding it to a matrix, are you assuming automatic broadcasting here? – Yaroslav Bulatov Mar 14 '23 at 20:07
  • Yes, it assumes automatic broadcasting – dherrera Mar 14 '23 at 21:37
  • I posted evaluation as an answer, it simplifies considerably for zero-centered trace-normalized Gaussian and gives a nice fit, thanks! – Yaroslav Bulatov Mar 14 '23 at 21:39

By applying dherrera's formula to a trace-normalized, zero-centered normal with diagonal $d\times d$ covariance matrix $H$, we get the following estimate for the covariance matrix $H_p$ of the corresponding projected normal.

$$H_p = c H - 2H^2, \qquad c = 1 + 2\operatorname{Tr}(H^2)$$
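A numpy sketch of this specialization, compared against a Monte-Carlo estimate (the dimension, exponent, seed, and tolerance are my choices):

```python
import numpy as np

rng = np.random.default_rng(1)
d, p = 20, 1.1
eig = 1.0 / np.arange(1, d + 1) ** p   # power-law eigenvalue decay
eig /= eig.sum()                       # trace-normalize: Tr(H) = 1
H = np.diag(eig)

# Estimate from the formula above; note Tr(H_p) = c - 2 Tr(H^2) = 1 exactly
c = 1 + 2 * np.trace(H @ H)
H_p = c * H - 2 * H @ H

# Monte-Carlo covariance of y = x / ||x||
X = rng.normal(size=(400_000, d)) * np.sqrt(eig)   # x ~ N(0, H)
Y = X / np.linalg.norm(X, axis=1, keepdims=True)
H_mc = (Y.T @ Y) / len(Y)              # E[y] = 0 by symmetry, so this is Cov(y)
```

In this run the diagonal of $H_p$ tracks the Monte-Carlo diagonal to within a modest relative error, consistent with the "slightly biased" behavior described below.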

When the eigenvalues of $H$ follow a power-law decay with $p=1.1$, we can estimate the eigenvalues of $H_p$ using Monte Carlo and see the following fit:

[figure: Monte-Carlo eigenvalues of $H_p$ vs. the estimate above]

For $1<p<2$, this estimate is slightly biased.

For instance, when $p=1.1$, an "adjusted" formula with $c=1+1.7\operatorname{Tr}(H^2)$ gives a slightly nicer fit:

[figure: Monte-Carlo eigenvalues vs. the adjusted estimate]

The error measure is the mean squared relative residual; the tail error considers only the smallest $d/2$ eigenvalues.

Notebook

Yaroslav Bulatov
  • The "adjustment" needed to render dherrera's formula unbiased seems to depend on the exponent $p$ of the power-law eigenvalue decay; I wonder if there's an automatic way to obtain it. Since the bias is present in the tail eigenvalues, perhaps it could be derived by taking the limit $d\to \infty$ – Yaroslav Bulatov Mar 14 '23 at 21:58
  • Just to share my own results, I'll say that when testing the formula on my own problem with covariance matrices with different structures, I found the approximation to be close to perfect for some covariances, and to have some bias for other covariances. Interestingly, I had found the approximation to work great for diagonal matrices in my case (random diagonals and diagonals proportional to the mean). However, performance evidently depends on the specifics, as your case shows. I'm still working on this problem, so I'll post updates if I find anything useful. – dherrera Mar 14 '23 at 22:47
  • Sounds good, please comment when you do. I'm assuming a random diagonal is going to be close to uniform. I'm looking at a diagonal with values $1^p, 2^p, 3^p, \ldots$. Worst-case bias is when $p=-\sqrt{2}$ – Yaroslav Bulatov Mar 14 '23 at 23:16
  • Also I'd be curious if your work gave any insight to fourth-moments. For Gaussian, 4th moment factors according to Wick's theorem, I suspect for projected Gaussian it approximately factors -- https://stats.stackexchange.com/questions/608943/computing-eyycyy-for-normalized-gaussian-y – Yaroslav Bulatov Mar 15 '23 at 00:02
  • The paper I linked, which has the formula approximating the expected value E(X'AX/X'BX), also has a formula approximating Var(X'AX/X'BX). I haven't thought it through, but if you use the same approach I describe above and substitute the variance formula for the expectation formula, plus some tweaks, you may be able to approximate the fourth moments. The paper does say that the variance approximation performs worse than the expectation. – dherrera Mar 15 '23 at 00:19
  • The paper, however, seems like a good place to start looking for other references on moments of quadratic forms, which is what the problem ends up boiling down to: either exact formulas for the zero-mean case, or approximations to the second moment of the quadratic form (the fourth moment of the $X/||X||$ variable). – dherrera Mar 15 '23 at 00:22