
I am implementing accelerated Langevin dynamics (LD) for posterior estimation with a prior represented by a deep autoregressive network, following paper [1]. I have a question about the prior smoothing operation. The paper says that to accelerate LD we need to progressively smooth the pdf, starting from high noise levels and decreasing the noise until we arrive at the original distribution. From the paper, I understood that convolving the prior density, $p(x) * \omega$ (where $\omega$ is a Gaussian density with mean $0$ and std $\sigma$), is equivalent to $p(x+\epsilon)$ (where $\epsilon$ is Gaussian noise, $\epsilon \sim \mathcal{N}(0,\sigma^2)$). I am interested in whether there is a mathematical explanation that justifies this claim.

[1] Jayaram, Vivek, and John Thickstun. "Parallel and flexible sampling from autoregressive models via Langevin dynamics." International Conference on Machine Learning. PMLR, 2021. https://proceedings.mlr.press/v139/jayaram21b.html

P.S. I apologize if I put the wrong tags; it is my first post, and, frankly, I don't know exactly what I am looking for.

ane4ka

1 Answer


What is true is the following: if $\epsilon \sim \mathcal{N}(0,\sigma^2)$, then $\mathbb{E}[p(x+\epsilon)] = (p * \omega)(x)$, where $\omega$ is the density of $\mathcal{N}(0,\sigma^2)$. In fact, this formula is a special case of a more general one, valid for any law!

That is, if $E$ is a random variable with density $\phi$, then $\mathbb{E}[p(x+E)] = (p * \widehat{\phi})(x)$, where $\widehat{\phi}$ denotes the map $y \mapsto \phi(-y)$. Here is the proof (substituting $z = x + y$, so $dz = dy$, and then renaming the dummy variable $z$ back to $y$): $\mathbb{E}[p(x+E)] = \int p(x+y)\phi(y)\,dy = \int p(z)\phi(z-x)\,dz = \int p(y)\widehat{\phi}(x-y)\,dy = (p*\widehat{\phi})(x)$. Note that since the Gaussian density is symmetric, $\widehat{\omega} = \omega$.
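To make this concrete, here is a minimal numerical check of the identity, assuming a toy Gaussian prior so that the convolution has a closed form (if $p$ is the density of $\mathcal{N}(\mu, s^2)$ and $\omega$ that of $\mathcal{N}(0, \sigma^2)$, then $p * \omega$ is the density of $\mathcal{N}(\mu, s^2 + \sigma^2)$). The specific numbers and the use of `scipy.stats.norm` are just illustrative choices, not anything from the paper:

```python
import numpy as np
from scipy.stats import norm

# Toy setup (hypothetical, for illustration only): p is the density of
# N(mu, s^2), so p * omega, with omega the density of N(0, sigma^2),
# is the density of N(mu, s^2 + sigma^2) in closed form.
mu, s, sigma = 1.0, 0.5, 0.3

def p(x):
    return norm.pdf(x, loc=mu, scale=s)

def smoothed_exact(x):
    return norm.pdf(x, loc=mu, scale=np.sqrt(s**2 + sigma**2))

rng = np.random.default_rng(0)
x = 0.7
eps = rng.normal(0.0, sigma, size=1_000_000)  # eps ~ N(0, sigma^2)

# Monte Carlo estimate of E[p(x + eps)] vs. the exact convolution value:
print(p(x + eps).mean())   # ~ smoothed_exact(x), up to Monte Carlo error
print(smoothed_exact(x))
```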

So, the relation you quote is true "on average": for all $x$, the random variable $p(x+E)$ has expectation $(p*\widehat{\phi})(x)$. However, if $p$ is not constant, this random variable has no reason to be almost surely constant, hence it will not in general be almost surely equal to the value of the convolution. Stated otherwise, if you generate some random $y$ values from a $\mathcal{N}(0,\sigma^2)$ source and compute $p(x+y)$ for these $y$'s, you will very likely get a bunch of different results. However, the average of these values is indeed $(p*\widehat{\phi})(x)$, as the sketch below illustrates.
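A minimal sketch of that last point, reusing the same hypothetical Gaussian toy setup as above: individual evaluations $p(x+\epsilon)$ scatter, and only their average recovers the convolution.

```python
import numpy as np
from scipy.stats import norm

# Same hypothetical toy setup as before: p is the density of N(mu, s^2).
mu, s, sigma = 1.0, 0.5, 0.3

def p(x):
    return norm.pdf(x, loc=mu, scale=s)

rng = np.random.default_rng(1)
x = 0.7
eps = rng.normal(0.0, sigma, size=5)  # a handful of draws of the noise

print(p(x + eps))         # five noticeably different values
print(p(x + eps).mean())  # their average approaches (p * omega)(x) as n grows
```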

Plop
  • Thank you for this fantastic answer. Can I also ask you to direct me to the topic that studies this question? It is the first time I have seen the notation $x-\epsilon$, contrary to the "common" engineering notation where noise is added to a variable, $x+\epsilon$. – ane4ka Jul 27 '23 at 16:02
  • What does "true on average" mean?? – whuber Jul 27 '23 at 17:24
  • I see now that I was overthinking it. It comes straight from the definition of the CDF of a sum of two random variables. But I am still disconcerted by $x-\epsilon$. Is there a particular purpose in using subtraction instead of a sum? – ane4ka Jul 27 '23 at 18:18
  • It's a matter of convention. Convolution is usually defined to correspond to a sum, but since the distribution of $\epsilon$ is the same as the distribution of $-\epsilon,$ the result is the same. – whuber Jul 27 '23 at 19:32
  • I did the calculations very quickly before answering. I’m not so sure about the sign, and did not worry too much since it doesn’t matter for a symmetric density like the Gaussian one. – Plop Jul 27 '23 at 23:37
  • I added something for @whuber and will redo the calculations and post them tomorrow! – Plop Jul 27 '23 at 23:45
  • I stick with my signs; however, I changed the phrasing a bit to follow the convention of added (instead of subtracted) noise. But it doesn't really change anything. – Plop Jul 28 '23 at 12:23
  • @whuber I am very sorry I did not accept this answer in the past. I just have one little question that I am embarrassed to ask. I looked over and over again and did not understand how you made the transformation from $\int p(z)\phi(z-x)\,dz$ to $\int p(y)\hat{\phi}(x-y)\,dy$. I tried to do it myself, but I only got as far as $z=x+y$, $dz=dy$, so $\int p(x+y)\phi(y)\,dy=\int p(z)\phi(z-x)\,dz=\int p(z)\hat{\phi}(x-z)\,dz$. – ane4ka Jul 30 '23 at 20:31
  • Sorry again. I was addressing my previous comment to @Plop not to whuber last time. It does not let me edit the comment after 3 times. – ane4ka Jul 30 '23 at 20:46
  • Hum, I just changed the letter, I guess? I'm not sure what more I can say. Is it something else? – Plop Jul 31 '23 at 08:15
  • @Plop No. I was just worried that I had missed something or misinterpreted your answer. – ane4ka Jul 31 '23 at 15:05
  • Ok, no problem, then :)! – Plop Jul 31 '23 at 16:35