
Suppose we have a Gaussian distribution centered at zero, with covariance matrix $\Sigma$ satisfying $\operatorname{Tr}\Sigma=1$ and $\operatorname{Tr}\Sigma^2=\frac{1}{2}$.

When I try various sequences of such Gaussian distributions (for instance, by letting $\lambda_i=c_1 i^{-c_2}$ and solving for $c_1,c_2$), I see the following:

$$E\left[\frac{1}{\|x\|^{4}}\right] \xrightarrow{\text{simulation}} \frac{1}{E\left[\|x\|^4\right]}$$

Can this be shown more rigorously?


notebook

Richard Hardy
Yaroslav Bulatov
    Looks like a case for Jensen's inequality – Glen_b Sep 06 '22 at 01:38
  • I may be missing something but Jensen's bounds the quantity in question from the wrong side – Yaroslav Bulatov Sep 06 '22 at 02:12
  • Oh, okay. My thought was that with a strictly convex function it should only reach equality when the variance is 0. – Glen_b Sep 06 '22 at 04:36
  • This norm is the following? $$\Vert \mathbf{x} \Vert^4 =\left(\sum_{k=1}^n x_k^2\right)^2$$ the fourth power of the Euclidean distance? – Sextus Empiricus Sep 06 '22 at 09:48
  • I guess that the distribution of $\Vert \mathbf{x} \Vert^4$ will approach a singular distribution (i.e. a constant value). Then you have for any continuous function $f$ that $E[f(x)] = f(E[x])$. The problem that remains is proving that $\Vert \mathbf{x} \Vert^4$ approaches a singular distribution. Possibly some variant of the law of large numbers can do that (it will be tricky because of the correlation between the terms $x_k$ in the summation). – Sextus Empiricus Sep 06 '22 at 09:55
  • Actually, the function $f(x) = 1/x$ is not continuous at zero. The expectation $E[1/X]$ may not exist or can be infinite. – Sextus Empiricus Sep 06 '22 at 10:07
  • @Sextus You're right -- I was forgetting that. (The spherical factor of $\rho^{n-1}$ multiplies the density and vanishes at the origin.) I'll delete that comment to avoid confusing anyone. – whuber Sep 06 '22 at 15:02
  • This question looks like it's making some unstated assumptions. After all, consider the sequence of diagonal covariance matrices in dimensions $d=1,2,3,\ldots$ with $(1/d,0,0,\ldots,0)$ on the diagonal. Their spectral norms $1/d$ converge to $0.$ Yet, in all those cases $E[1/||X||^4]$ is infinite while $E[||X||^4] = 3/d^2 \to 0.$ Indeed, the very statement of the question makes no sense, because the right hand side is itself a sequence rather than a number. Perhaps you mean something like $E[||X||^4]E[1/||X||^4]\to 1$? – whuber Sep 06 '22 at 18:35
  • $\operatorname{Tr}\Sigma=1$ is one of the original assumptions, so you can't have $(1/d,\ldots)$ – Yaroslav Bulatov Sep 06 '22 at 18:46
  • The product $E[\|X\|^4]\,E[1/\|X\|^4]$ going to 1 would explain the observed phenomenon – Yaroslav Bulatov Sep 06 '22 at 18:53
  • There was a suggestion on Mathoverflow to use an integration trick which may justify switching the order of expectation and reciprocal -- https://mathoverflow.net/questions/431036/does-e1-f-to-1-ef-in-high-dimensions?noredirect=1#comment1109319_431036 – Yaroslav Bulatov Sep 22 '22 at 22:43

1 Answer


Reformulation in terms of linear combination of $\chi^2(1)$ variables

We can reformulate the problem.

Let's rewrite $$\Vert \mathbf{x} \Vert^4 =\left(\sum_{k=1}^n x_k^2\right)^2 = Y_n^2$$

so that we can focus on the variable

$$Y_n = \sum_{k=1}^n x_k^2$$

and the problem statement in terms of $Y_n$ becomes

$$E[1/Y_n^2] \to 1/E[Y_n^2]$$

We can make a further reformulation: considering the eigenvalues $\lambda_k$ of the matrix $\Sigma$, we can express $Y_n$ as a linear combination of $n$ i.i.d. chi-squared variables,

$$Y_n \sim \sum_{k=1}^n \lambda_k Z_k \qquad \text{where $\forall k:Z_k \sim \chi^2(1)$}$$

where we have the conditions that

  • $\sum_{k=1}^n \lambda_k = 1$, which relates to the condition that $\operatorname{Tr}\Sigma = 1$.
  • $\max(\lambda_k) \to 0$, which relates to the spectral norm approaching zero.
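The claimed convergence can be checked by simulation under these two conditions. Below is a minimal sketch; the spectrum $\lambda_i \propto i^{-1/2}$ (normalized so that $\sum_k \lambda_k = 1$, with $\max_k \lambda_k \approx 1/(2\sqrt{n}) \to 0$) is my own illustrative choice, and it does not additionally enforce the question's $\operatorname{Tr}\Sigma^2 = \tfrac12$:

```python
import numpy as np

# Monte Carlo sketch under the two conditions above: sum(lam) = 1 and
# max(lam) -> 0.  The power-law spectrum here is an illustrative choice.
rng = np.random.default_rng(0)
n = 1_000
lam = np.arange(1, n + 1) ** -0.5
lam /= lam.sum()

# Y_n = sum_k lam_k * Z_k with Z_k ~ chi2(1) has the same law as ||x||^2;
# accumulate the two sample moments in chunks to keep memory modest.
sum_inv_sq = sum_sq = 0.0
num = 0
for _ in range(20):
    z = rng.chisquare(df=1, size=(5_000, n))
    y = z @ lam
    sum_inv_sq += (1.0 / y ** 2).sum()
    sum_sq += (y ** 2).sum()
    num += y.size

lhs = sum_inv_sq / num   # Monte Carlo estimate of E[1/Y_n^2]
rhs = num / sum_sq       # Monte Carlo estimate of 1/E[Y_n^2]
print(lhs, rhs)          # both close to 1 for large n
```

With $E[Y_n]=1$ and $\text{Var}[Y_n]=2\sum_k\lambda_k^2$ small, both estimates come out close to $1$, with `lhs` slightly above `rhs` as Jensen's inequality requires.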

The mechanism behind the convergence

The expectation and variance of $Y_n$ are

$$E[Y_n] = \sum_{k=1}^n \lambda_k = 1$$

and

$$\text{Var}[Y_n] = \sum_{k=1}^n 2 \lambda_k^2 \to 0$$

Intuitively: $Y_n$ concentrates around the constant value $1$, and that is why $E[1/Y_n^2]$ will approach $1/E[Y_n^2]$.
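This mechanism can be made concrete in closed form for the special case of a flat spectrum $\lambda_k = 1/n$ (my own illustrative choice, so $Y_n = \chi^2(n)/n$), using the known moments $E[(1/\chi^2(n))^2] = 1/((n-2)(n-4))$ for $n > 4$ and $E[\chi^2(n)^2] = n^2 + 2n$:

```python
# Flat spectrum lam_k = 1/n, so Y_n = chi2(n)/n (an illustrative special case).
def inv_moment(n):
    """E[1/Y_n^2] = n^2 / ((n - 2)(n - 4)), valid for n > 4."""
    return n ** 2 / ((n - 2) * (n - 4))

def recip_of_moment(n):
    """1/E[Y_n^2] = n / (n + 2)."""
    return n / (n + 2)

for n in (10, 100, 10_000):
    print(n, inv_moment(n), recip_of_moment(n))  # both tend to 1
```

Jensen's inequality is visible as `inv_moment(n) > recip_of_moment(n)` for every finite $n > 4$, while both sides tend to $1$ as $\text{Var}[Y_n] = 2/n \to 0$.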

I am not sure how to make this formal. I am thinking of something like the continuous mapping theorem: if $Y_n \to 1$ then $f(Y_n) \to f(1)$. But I am not sure whether the decreasing variance is sufficient to conclude that $Y_n \to 1$, or exactly what mode of convergence is needed to justify these statements.

A problem with the convergence

In intuitive terms we see that the variance shrinks to zero and that is what makes the convergence happen, at least seemingly in simulations. A point that worries me is that a function like the inverse, $E[1/Y_n^2]$, can involve division by zero and result in an infinite or undefined expectation. For instance, if we have a normally distributed variable $W_n \sim \mathcal{N}(1,1/n)$ then we do not get convergence $E[1/W_n] \to 1/E[W_n]$, because the expectation of $1/W_n$ is undefined.

So a problem with the above intuitive reasoning is that $E[1/Y_n^2]$ may be undefined when the density of $Y_n$ does not vanish fast enough at zero. For instance, the inverse of the square of a chi-squared distributed variable has no finite expectation when $\nu \leq 4$ (see the variance of an inverse chi-squared distribution).

What we need to prove is that the $\lambda_k$ cannot behave this way while $\max(\lambda_k)$ approaches zero.

I imagine, for instance, a dominant term that approaches zero very slowly while the remaining terms approach zero very quickly, e.g. some slowly decreasing function $f(n)$ such that

$$\lambda_k = \begin{cases} f(n) &\quad \text{if} \quad k=n \\ \frac{1-f(n)}{n-1} &\quad \text{if} \quad k\neq n \end{cases}$$

Then $Y_n$ is a sum of two scaled chi-squared variables, one with $1$ degree of freedom and another with $n-1$ degrees of freedom.

$$Y_n \sim f(n) \chi^2(1) + \frac{1-f(n)}{n-1} \chi^2(n-1)$$

I don't believe that this $Y_n$ has a non-zero density at zero. I also don't believe that any other similar approach can result in a non-zero density at zero for $Y_n$.

We have, pointwise,

$$Y_n \sim \sum_{k=1}^n \lambda_k Z_k \geq \sum_{k=1}^n \min_j(\lambda_j)\, Z_k \sim \Gamma\left(k=n/2,\ \theta = 2 \min_j(\lambda_j)\right)$$

Because $Y_n$ is going to be made of more than 4 components (otherwise $\max(\lambda_k)$ can't approach zero), the variable $Y_n$ is at least as large as a scaled chi-squared variable with more than 4 degrees of freedom, whose density at zero is zero.
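A sketch of this bound, using the exact inverse-square moment of a Gamma variable, $E[G^{-2}] = 1/(\theta^2(k-1)(k-2))$ for shape $k > 2$; the flat spectrum and the function name are my own illustrative choices:

```python
import numpy as np

# Since Y_n >= min(lam) * chi2(n) pointwise, Y_n dominates a
# Gamma(shape = n/2, scale = 2*min(lam)) variable, and hence
# E[1/Y_n^2] <= 1/(scale^2 * (shape - 1) * (shape - 2)),
# which is finite as soon as shape > 2, i.e. more than 4 components.
def inverse_square_moment_gamma(shape, scale):
    if shape <= 2:
        return float("inf")  # E[G^-2] does not exist for shape <= 2
    return 1.0 / (scale ** 2 * (shape - 1) * (shape - 2))

n = 1_000
lam = np.full(n, 1.0 / n)  # flat spectrum as a concrete example
bound = inverse_square_moment_gamma(n / 2, 2 * lam.min())
print(bound)  # ~ 1.006, a finite upper bound on E[1/Y_n^2]
```

For the flat spectrum the bound is tight, since $\min_j(\lambda_j)\,\chi^2(n)$ then *is* $Y_n$.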

Paul
  • The approach of the density to zero in the point zero $$f_{Y_n}(0) \to 0$$ , might possibly also be reasoned by using the characteristic function which will be a product $$\varphi(t) = \prod_{k=1}^n (1-2\lambda_k it)^{-1/2}$$ and the inversion formula which will lead to $$f_{Y_n}(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(t) dt$$ but I am not yet sure how to show that this integral will equal zero. – Sextus Empiricus Sep 07 '22 at 10:15
  • Something in the answer still needs to be improved. It is not just important that the density is zero at the point zero, $f_{Y_n}(0) = 0$; we must also have the derivative equal to zero. – Sextus Empiricus Sep 07 '22 at 10:37
  • Showing that $f_{Y_n}^\prime(0) = 0$ and $f_{Y_n}(0) = 0$ might be possible by writing the distribution density function as a mixture of Chi squared distributed variables and then use the coefficients of the smaller order terms to compute the values of the density. – Sextus Empiricus Sep 07 '22 at 12:23
  • Would de la Vallée-Poussin theorem be useful here? It was used to show that you can switch order of expectation and reciprocal for $\left|x\ \sum_ix_i \right|^2$ – Yaroslav Bulatov Sep 26 '22 at 05:56
  • @YaroslavBulatov I think an important step in that lemma is considering the inequality $\Vert x \Vert \geq R$ where $R$ is a gamma distributed variable. For the inverse gamma distribution the moments exist if the shape parameter is sufficiently large, and so the expectation of $(1/\Vert x \Vert)^p$ also exists. – Sextus Empiricus Sep 26 '22 at 07:47
  • So it seems the tricky part is establishing conditions equivalent to "shape parameter is sufficiently large". Intuitively, the spectrum of $\Sigma$ should not be heavy tailed. Simply restricting the first two moments is not sufficient. However, polynomial decay combined with the first two moment constraints may be. Some discussion on conditions is here -- https://mathoverflow.net/a/431220/7655 – Yaroslav Bulatov Sep 28 '22 at 06:11
  • BTW, if eigenvalue decay is fixed at power-law 1, it turns out you can switch the order of E and reciprocal -- https://math.stackexchange.com/questions/4551590/what-is-e1-x-4-where-x-sim-gaussian0-c-cdot-diag1-1-2-1-3-1-4-ld/4552910#4552910 – Yaroslav Bulatov Oct 15 '22 at 17:16
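The characteristic-function route suggested in the comments can at least be checked numerically. For the concrete flat spectrum $\lambda_k = 1/n$ (my own choice of example), $\varphi(t) = \prod_k (1-2\lambda_k it)^{-1/2} = (1-2it/n)^{-n/2}$, and the inversion integral for the density at zero comes out numerically zero, consistent with the answer's claim that $f_{Y_n}(0) = 0$:

```python
import numpy as np

# f_{Y_n}(0) = (1/(2*pi)) * Integral phi(t) dt, evaluated by a simple
# Riemann sum over a wide truncated window.  For lam_k = 1/n,
# phi(t) = (1 - 2it/n)^(-n/2) decays like |t|^(-n/2), so the truncation
# error is negligible for n = 20 and |t| <= 500.
n = 20
t = np.linspace(-500.0, 500.0, 1_000_001)
phi = (1.0 - 2j * t / n) ** (-n / 2.0)
f0 = (phi.sum() * (t[1] - t[0])).real / (2.0 * np.pi)
print(f0)  # numerically ~ 0
```

This matches the exact density: here $Y_n = \chi^2(20)/20$ is a Gamma variable with shape $10 > 1$, whose density vanishes at the origin.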