Why does the reparameterization trick work when some components are still stochastic?

Question

I am trying to understand the reparameterization trick. I got some intuition from this popular question, but I am still largely confused. I am writing out my understanding and doubts here and would appreciate the community's help. Let's assume:

$y = 10$

$\hat{y} = w_3*z + b$

$z \sim N(\mu, \sigma^2)$

$\mu = w_2*4$

$\sigma = w_1*3$

Now, if I had to compute $de/d\mu$ (where $e$ is the error function, the difference between $y$ and $\hat{y}$), I wouldn't be able to apply the chain rule, because I cannot compute $dz/d\mu$. This is because $z$ is a sample from the Normal distribution with parameters $\mu$ and $\sigma$ and is therefore stochastic, so updating those parameters based on $z$ might not be a good idea. So we come up with the following $z$:

$z = \mu + \sigma*\epsilon$ (apparently a $z$ built this way has the same distribution as $z \sim N(\mu, \sigma^2)$)

$\epsilon \sim N(0,1)$

However, I don't see why this $z$ is much different from the previous one. This $z$ is still stochastic, due to the presence of $\epsilon$. Perhaps my understanding is wrong.
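For reference, here is how the chain rule would read under the reparameterized form, as I understand it. This is only a sketch: it assumes $\epsilon$ is treated as a fixed number once it has been drawn, and it leaves the exact form of $e$ generic.

$\partial z/\partial \mu = 1, \quad \partial z/\partial \sigma = \epsilon$

$\partial e/\partial \mu = (\partial e/\partial \hat{y}) * w_3 * 1$

$\partial e/\partial \sigma = (\partial e/\partial \hat{y}) * w_3 * \epsilon$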

After you sample $\epsilon$, it is constant. You don't need to back-propagate anything about $\epsilon$. So it's no different than taking the derivative of some function $f(x) = c g(x)$, where $c$ is a constant and $g$ is some function. — Sycorax, Aug 16 '23 at 13:31
What was the problem with the initial z that was being drawn from a normal distribution? Why couldn't we backpropagate through that? — desert_ranger, Aug 16 '23 at 23:26
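
The point in the first comment can be checked numerically with the setup from the question. The sketch below is a minimal illustration under assumptions that are not in the original post: $e$ is taken to be the squared error $(y - \hat{y})^2$, and the starting values of `w1`, `w2`, `w3`, `b` are made up. Once $\epsilon$ is drawn and held fixed, $z$ is an ordinary deterministic function of $\mu$ and $\sigma$, so the hand-derived chain-rule gradients should agree with a finite-difference estimate.

```python
import numpy as np

# Target from the question; the weights and bias are hypothetical starting values.
y = 10.0
w1, w2, w3, b = 0.5, 1.0, 2.0, 0.1

def loss(w1, w2, eps):
    """Squared error for one fixed noise draw eps (assumed form of e)."""
    mu = w2 * 4.0
    sigma = w1 * 3.0
    z = mu + sigma * eps              # reparameterized sample
    y_hat = w3 * z + b
    return (y - y_hat) ** 2

def grads(w1, w2, eps):
    """Chain rule with eps held constant: dz/dmu = 1, dz/dsigma = eps."""
    mu, sigma = w2 * 4.0, w1 * 3.0
    y_hat = w3 * (mu + sigma * eps) + b
    de_dz = -2.0 * (y - y_hat) * w3
    de_dmu = de_dz * 1.0
    de_dsigma = de_dz * eps
    # mu = 4 * w2 and sigma = 3 * w1, so one more chain-rule factor each
    return de_dsigma * 3.0, de_dmu * 4.0   # (de/dw1, de/dw2)

rng = np.random.default_rng(0)
eps = rng.standard_normal()           # sampled once, then treated as a constant

g_w1, g_w2 = grads(w1, w2, eps)

# Central finite differences, reusing the same eps on both sides
h = 1e-6
fd_w1 = (loss(w1 + h, w2, eps) - loss(w1 - h, w2, eps)) / (2 * h)
fd_w2 = (loss(w1, w2 + h, eps) - loss(w1, w2 - h, eps)) / (2 * h)

print("de/dw1:", g_w1, "finite difference:", fd_w1)
print("de/dw2:", g_w2, "finite difference:", fd_w2)
```

The finite-difference check only makes sense because the same `eps` is used for both evaluations. If a fresh sample were drawn inside `loss` on each call, the two evaluations would differ by noise as well as by the perturbation `h`.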
