
Let $X \sim \mathcal{N}(0, \sigma_x^2)$ and let $f :\mathbb{R} \to \mathbb{R}$ be a smooth nonlinear transformation such that $\mathbb{E}[f(X)]=0$. I am wondering what restrictions one can put on the function $f$ so that one can find a (uniform) lower bound on the KL distance between the random variable $f(X)$ and any mean zero Gaussian random variable $Y$.

More precisely, I wish to make the statement that for a fixed $f:\mathbb{R} \to \mathbb{R}$ belonging to some class of functions $\mathcal{C}$, there exists an $\varepsilon > 0$ such that for any mean zero Gaussian random variable $Y$ we have

\begin{equation} KL(f(X) \Vert Y) \geq \varepsilon. \end{equation}

We make the further requirement that $\mathbb{E}[f(X)]=0$. I am wondering what some sufficient criteria are for the KL distance to be uniformly bounded below. I have read up on Pinsker's inequality, $$ \delta(P,Q) \leq \sqrt{\tfrac{1}{2} KL(P \Vert Q)}, $$ where $$ \delta(P,Q) = \sup\{|P(A) - Q(A)| : A \text{ measurable}\}. $$ This implies a sufficient condition, namely that $$ P_X(\{x \in \mathbb{R} : f(x) \geq 0\}) \neq 0.5, $$ since we would then have $$ P_{f(X)}((0, \infty)) \neq P_Y((0,\infty)) = 0.5, $$ and thus $0 < |P_{f(X)}((0, \infty))-0.5| \leq \delta(P_{f(X)},P_Y)$, which by Pinsker's inequality gives the uniform bound $KL(P_{f(X)} \Vert P_Y) \geq 2\,|P_{f(X)}((0,\infty))-0.5|^2 > 0$ for every mean zero Gaussian $Y$. I am thus looking for the greatest possible class of functions $\mathcal{C}$ such that the statement above holds.
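To illustrate, here is a minimal numerical sketch in Python of this criterion and the bound it yields; the choice $f(x) = x^2 - \sigma_x^2$ is a hypothetical example (smooth, non-linear, and satisfying $\mathbb{E}[f(X)]=0$):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example: f(x) = x^2 - sigma_x^2, so E[f(X)] = 0.
sigma_x = 1.0

# P(f(X) >= 0) = P(|X| >= sigma_x) = 2 * (1 - Phi(1)), independent of sigma_x.
p_nonneg = 2 * (1 - norm.cdf(1.0))
print(f"P(f(X) >= 0) = {p_nonneg:.4f}")  # ~0.3173, != 0.5

# Every mean zero Gaussian Y puts mass exactly 0.5 on (0, inf), so the total
# variation distance is at least |p_nonneg - 0.5|; Pinsker's inequality
# delta <= sqrt(KL/2) then gives the uniform lower bound KL >= 2 * delta^2.
kl_lower_bound = 2 * (p_nonneg - 0.5) ** 2
print(f"uniform KL lower bound: {kl_lower_bound:.4f}")  # ~0.0668
```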

EDIT: As whuber points out, a better way to ask my question would be to define $$\mathcal{C} = \{f :\mathbb{R} \to \mathbb{R} \mid f \text{ measurable}, \text{ there exists } \varepsilon > 0 \text{ s.t. } KL(P_{f(X)} \Vert \mathcal{N}(0, \sigma^2)) \geq \varepsilon \text{ for all } \sigma^2 > 0 \}.$$

What I am looking for is then a way of verifying whether a specific $f$ belongs to $\mathcal{C}$. For example, if $\mathbb{E}f(X) \neq 0$, then we certainly have $f \in \mathcal{C}$. Or if $P_{X}(\{x \in \mathbb{R} : f(x) \geq 0 \}) \neq 0.5$, we also have $f \in \mathcal{C}$. I am looking for easily testable criteria such as these, if they exist.
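As a rough numerical complement (a sketch, not a criterion): when $f$ is strictly increasing, the density of $f(X)$ is available by a change of variables, so one can evaluate $KL(P_{f(X)} \Vert \mathcal{N}(0,\sigma^2))$ over a grid of $\sigma^2$ and check whether the minimum stays away from zero. The example $f(x) = x + x^3$ below is hypothetical (smooth, non-linear, $\mathbb{E}[f(X)]=0$ by symmetry):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma_x = 1.0
f = lambda x: x + x**3           # hypothetical example, strictly increasing
f_prime = lambda x: 1 + 3 * x**2

def kl_to_gaussian(sigma):
    """KL(P_{f(X)} || N(0, sigma^2)) via the substitution y = f(x):
    the density of f(X) at y = f(x) is phi_{sigma_x}(x) / f'(x)."""
    integrand = lambda x: norm.pdf(x, scale=sigma_x) * (
        norm.logpdf(x, scale=sigma_x)
        - np.log(f_prime(x))
        - norm.logpdf(f(x), scale=sigma)
    )
    val, _ = quad(integrand, -10 * sigma_x, 10 * sigma_x, limit=200)
    return val

sigmas = np.linspace(0.1, 10.0, 100)
kls = [kl_to_gaussian(s) for s in sigmas]
print(f"min KL on grid: {min(kls):.4f} at sigma = {sigmas[int(np.argmin(kls))]:.2f}")
# A minimum bounded away from zero (with KL growing at both grid edges) is
# numerical evidence, for this fixed sigma_x, that f belongs to C.
```

Of course, a positive minimum on a finite grid is only evidence, not a proof, since the infimum over all $\sigma^2 > 0$ could in principle be attained off the grid.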

In particular, I would like to be able to make a statement such as: if $f$ is smooth and non-linear, then $f \in \mathcal{C}$. This is because all of the non-linear transformations that preserve Gaussianity that I have seen have not been continuous.

  • The question seems framed a little backwards. You have already defined your class of functions: they are precisely the set of functions that meet your criteria. It sounds like you are looking for some alternative characterization of that set, but that's an awfully vague question: what kinds of characterizations are you looking for? – whuber Aug 08 '23 at 20:18
  • Fair point, I have updated the question text. – Abm Aug 08 '23 at 23:52
  • Thank you. But it's easy to find a sequence of "smooth and nonlinear" functions $f$ for which $\varepsilon$ must shrink to zero. (Just making $f$ agree with the identity map within any neighborhood of $0$ will cause the KL divergences to converge to zero as $\sigma$ gets small.) You will need quantitative, not qualitative, characterizations. – whuber Aug 09 '23 at 13:47
  • BTW, intuitively the condition for arbitrarily small $\sigma^2$ means that if $f(x)$ does not closely approximate either $x$ or $-x$ in a neighborhood of the origin there will be a positive $\varepsilon$ for all $\sigma^2$ in any bounded interval; and the condition for arbitrarily large $\sigma^2$ means that if $f$ does not closely approximate the same $x$ or $-x$ in a neighborhood of ${\pm\infty},$ then there will be a positive $\varepsilon$ for all sufficiently large $\sigma^2.$ In other words, $\mathcal C$ contains all functions whose germs at ${0,\pm\infty}$ differ from $x$ or $-x.$ – whuber Aug 09 '23 at 15:06
  • Regarding the first point: I allow $\varepsilon$ to depend on both $f$ and $\sigma_x^2$; thus I am only requiring the bound to hold uniformly over all $\sigma^2$ - sorry if this was not clear. With this in mind I believe that all smooth and non-linear $f$ would belong to $\mathcal{C}$. Because, as I believe, for any such $f$ and $\sigma_x^2$ the variable $f(X)$ is not Gaussian (I am not exactly sure how to prove this, but it seems to be folklore), and @picky_porpoise's answer gives a good argument as to why $f \in \mathcal{C}$. – Abm Aug 10 '23 at 00:13
  • Regarding the second point: Thank you! This certainly gives a condition on which $f$'s might be in $\mathcal{C}$, and it holds uniformly over both $\sigma_x^2$ and $\sigma^2$. This is an even stronger requirement, and it might become very useful down the line. I had not thought of using the concept of germs in this problem. – Abm Aug 10 '23 at 00:17

1 Answer


I think the statement holds without further requirements on $f$. My intuition is that, if there is no lower bound, then we can approximate $f(X)$ by a mean zero Gaussian variable as closely as we wish in terms of KL distance. But the KL distance is only zero if $f(X)$ and $Y$ agree in distribution, so this seems impossible.

Maybe this can be proved by contradiction; here is a rough sketch:

  1. Assume that there is no lower bound. Then we can find a sequence of mean zero Gaussians $(Y_n)$ such that $KL (f(X) \Vert Y_n) \to 0 $ for $n\to \infty$.

  2. The variances $\sigma_n^2$ of this sequence of Gaussians are w.l.o.g. contained in a compact interval $[a,b] \subset (0,\infty)$, since the KL distance will not tend to zero if $\sigma_n^2 \to \infty$ or $\sigma_n^2 \to 0$ (see the computation after this sketch).

  3. Since $(\sigma_n^2)$ is in a compact set we can find a converging subsequence $(\sigma_{n_k}^2)$ such that $\sigma_{n_k}^2 \to \bar\sigma^2$ for $k \to \infty$. Call the mean zero Gaussian for this value $\bar Y$.

  4. Since the KL distance is nice enough, we can interchange integration and taking limits (dominated convergence should be fine here; alternatively, lower semicontinuity of the KL divergence under weak convergence gives the inequality we need) to get $$ 0 = \lim_{k \to \infty} KL(f(X) \Vert Y_{n_k}) = KL (f(X) \Vert \lim_{k \to \infty} Y_{n_k} ) = KL (f(X) \Vert \bar Y) . $$ But this implies that $f(X) = \bar Y$ in distribution, which is a contradiction, since $f(X)$ cannot be Gaussian (assuming, as discussed in the comments below, that the smooth non-linear $f$ yields a non-Gaussian $f(X)$).
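To make step 2 quantitative, here is a sketch assuming $m_2 := \mathbb{E}[f(X)^2] \in (0,\infty)$, writing $h(P)$ for the differential entropy of $P = P_{f(X)}$. Expanding the Gaussian log-density gives $$ KL(P \Vert \mathcal{N}(0,\sigma^2)) = -h(P) + \tfrac{1}{2}\log(2\pi\sigma^2) + \frac{m_2}{2\sigma^2}. $$ Since $h(P) \leq \tfrac{1}{2}\log(2\pi e\, m_2) < \infty$ (the Gaussian maximizes entropy for a given second moment), the right-hand side tends to $+\infty$ both as $\sigma^2 \to 0$ (through the $m_2/(2\sigma^2)$ term) and as $\sigma^2 \to \infty$ (through the log term), so the variances of a KL-minimizing sequence must indeed stay in a compact interval.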

Edit: I assumed from the phrasing of the question that the variance $\sigma_x^2$ of $X$ is a fixed parameter. As pointed out by Abm in the comment below, step 2 does not hold if $\sigma_x^2$ is allowed to vary.

Some background on when non-affine transformations of Gaussian variables are again Gaussian can be found here and here.

  • Thanks for your answer. The problem here is that $f$ being non-linear does not imply that $f(X)$ is non-Gaussian. There was a good post with examples of such non-linear transformations that preserve Gaussianity, but I cannot seem to find it right now. – Abm Aug 09 '23 at 00:03
  • Having thought about it, if we assume that $f(X)$ is non-Gaussian, then your argument should work. Thank you! – Abm Aug 09 '23 at 00:12
  • I don't see why (2) should hold. – whuber Aug 09 '23 at 13:48
  • Indeed (2) does not hold if we allow both $\sigma_x^2$ and $\sigma^2$ to vary. Counterexample: any $f$ in the germ of $x$ at $0$, because then, if $\sigma$ and $\sigma_x$ both go to zero, the KL divergence tends to $0$. But it holds for any fixed $\sigma_x^2$: if $X \sim \mathcal{N}(0, \sigma^2_x)$ and $f$ is non-linear and smooth with $\mathbb{E}f(X)^2 < \infty$, then $\text{Var}(f(X)) \in (0, \infty)$. Thus we cannot have the $Y$-variances go to $0$ or to $\infty$. – Abm Aug 10 '23 at 00:28
  • Thank you for the feedback. I added some links which discuss the Gaussian transformation stuff, and the post now mentions that $\sigma_x^2$ must not vary. – picky_porpoise Aug 10 '23 at 18:28