
Given a gamma distribution with unit scale and shape $\theta$, and given an arbitrary variate $x$, what is the derivative of the variate $x$ with respect to $\theta$?

In other words, I would like to produce a stochastic function
$$
\begin{aligned}
g: \mathcal R &\to \mathcal R \times \mathcal R \\
\theta &\mapsto \left( z, \frac{dz}{d\theta} \right),
\end{aligned}
$$
where $z \sim \mathrm{Gamma}(\theta, 1)$.
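One way to realize $g$ in JAX is sketched below. It assumes that holding the PRNG key fixed is an acceptable notion of "the same draw" across values of $\theta$, so that $z$ becomes a deterministic function of $\theta$. Under that assumption `jax.jvp` pushes a unit tangent through `jax.random.gamma` and returns the sample together with $dz/d\theta$. The helper name `g`, the seed, and the chosen $\theta$ values are illustrative.

```python
import jax
import jax.numpy as jnp


def g(key, theta):
    """Hypothetical helper: return (z, dz/dtheta) with z ~ Gamma(theta, 1).

    The key is held fixed, so z is a deterministic function of theta and
    jax.jvp can push a unit tangent in theta through the sampler.
    """
    theta = jnp.asarray(theta, dtype=jnp.float32)
    sample = lambda a: jax.random.gamma(key, a)  # unit scale, shape parameter a
    z, z_dot = jax.jvp(sample, (theta,), (jnp.ones_like(theta),))
    return z, z_dot


key = jax.random.PRNGKey(123)
for theta in (0.5, 1.3, 2.1):
    z, z_dot = g(key, theta)
    print(f"{theta:.3f} {float(z):.3f} {float(z_dot):.3f}")
```

Forward mode is a natural fit here because there is a single scalar input; `jax.value_and_grad`, as used in the question's code, returns the same pair for a scalar $\theta$.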

This kind of question comes up in machine learning with variational autoencoders whenever we try to differentiate a loss function with respect to the parameters of a distribution we sample from. A common workaround is the reparametrization trick.

My goal with this question is to avoid the reparametrization trick and instead extend the machine learning library I use with an appropriate so-called "JVP function" (Jacobian-vector product rule) for gamma sampling.


Here's some example Python code using the Jax library:

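```python
import numpy as np
from jax import value_and_grad
from jax.random import gamma, PRNGKey

# A fixed key makes gamma(key, theta) a deterministic function of theta.
key = PRNGKey(123)

def f(theta):
    # One unit-scale gamma variate with shape theta.
    return gamma(key, theta)

print("theta     z z_dot")
for theta in np.arange(0.1, 2.5, 0.4):
    # Returns the variate z and its derivative dz/dtheta.
    z, z_dot = value_and_grad(f)(theta)
    print(f"{theta:.3f} {z:.3f} {z_dot:.3f}")
```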

Prints:

```
theta     z z_dot
0.100 0.001 0.071
0.500 0.425 1.227
0.900 1.013 1.266
1.300 1.489 1.213
1.700 1.973 1.186
2.100 2.446 1.167
```

Changing the random seed produces different variates and different derivatives.


Actually, looking at this more clearly, it may not be the derivative I need, but rather the Hessian. It's probably too late to change this question so I'll leave it up. If it's not too much of an imposition, and I get a good answer to this, I'll ask another question about the Hessian.


Edit: to explain what we're trying to do, first see the short section on "undifferentiable expectations" here. I'll keep all of its notation ($\theta, x, z, \epsilon, f, g, p$). Suppose that you're using a differentiable programming library.

Consider "reverse mode", in which we track "primals" and "cotangents". Here that means we sample $\epsilon$, compute $z$, and then compute the loss $L \triangleq f(z)$. Next we compute the cotangent $\ddot z \triangleq \frac{dL}{dz} = f'(z)$, and by backpropagation we want $\ddot \theta \triangleq \frac{dL}{d\theta} = \frac{dL}{dz} \frac{dz}{d\theta} = \ddot z \frac{dz}{d\theta}$. The last factor, $\dot z \triangleq \frac{dz}{d\theta}$, is the subject of this question.
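The reverse-mode bookkeeping above can be sketched in JAX as follows. The loss `loss` (standing in for $f$), the seed, and $\theta = 1.7$ are illustrative assumptions, and the key is held fixed as in the question's code. The sketch only checks that $\ddot\theta = \ddot z\,\dot z$ computed by hand matches what `jax.grad` returns for the composed function.

```python
import jax
import jax.numpy as jnp


def loss(z):
    # Hypothetical loss f(z); any differentiable scalar function would do.
    return (z - 1.0) ** 2


key = jax.random.PRNGKey(0)
theta = jnp.float32(1.7)

# Primal pass: draw z with the key held fixed, and get z_dot = dz/dtheta alongside it.
z, z_dot = jax.jvp(lambda a: jax.random.gamma(key, a), (theta,), (jnp.ones_like(theta),))
L = loss(z)

# Cotangent of z: z_ddot = dL/dz = f'(z).
z_ddot = jax.grad(loss)(z)

# Chain rule by hand: theta_ddot = z_ddot * dz/dtheta.
theta_ddot_manual = z_ddot * z_dot

# JAX's own reverse mode through the sampler and the loss.
theta_ddot_auto = jax.grad(lambda a: loss(jax.random.gamma(key, a)))(theta)

print(float(L), float(theta_ddot_manual), float(theta_ddot_auto))
```

The only sampler-specific ingredient in the hand-written version is `z_dot`, which is exactly the quantity a custom JVP rule would have to supply.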

I realize that this may not be well-defined in general, but it appears to be well-defined for the gamma distribution since Jax has no problem producing it.
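One way to see why the gamma case can be well-defined is implicit differentiation of the CDF. Write $F(z;\theta) = P(\theta, z)$ for the unit-scale gamma CDF (the regularized lower incomplete gamma function) and $p(z;\theta)$ for its density. If the draw is coupled across $\theta$ by holding the level $u = F(z;\theta)$ fixed, differentiating $F(z(\theta);\theta) = u$ gives
$$
p(z;\theta)\,\frac{dz}{d\theta} + \frac{\partial F}{\partial \theta}(z;\theta) = 0
\quad\Longrightarrow\quad
\frac{dz}{d\theta} = -\frac{\partial F(z;\theta)/\partial\theta}{p(z;\theta)}.
$$
The sketch below, with a hypothetical helper `implicit_dz_dtheta`, compares this formula with the derivative JAX reports for its own gamma sampler. The assumption being checked is that JAX uses this same implicit gradient, in which case the two columns should agree up to float32 error.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import gammainc
from jax.scipy.stats import gamma as gamma_dist


def implicit_dz_dtheta(z, theta):
    """Hypothetical helper: dz/dtheta = -(dF/dtheta) / p(z; theta) at a fixed CDF level."""
    # F(z; theta) = P(theta, z): regularized lower incomplete gamma function (unit scale).
    dF_dtheta = jax.grad(gammainc, argnums=0)(theta, z)
    density = gamma_dist.pdf(z, theta)  # p(z; theta) = dF/dz
    return -dF_dtheta / density


key = jax.random.PRNGKey(123)
for theta in (0.5, 1.3, 2.1):
    theta = jnp.float32(theta)
    z, z_dot = jax.value_and_grad(lambda a: jax.random.gamma(key, a))(theta)
    print(f"{float(theta):.3f} {float(z_dot):.4f} {float(implicit_dz_dtheta(z, theta)):.4f}")
```

The formula needs only $z$ and $\theta$, not the internal randomness of the sampler, which is what makes a JVP rule for gamma sampling possible without reparametrizing the sampler itself.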

Neil G
  • Do you mean derivative of the probability of $x$ w.r.t. $t$? ... as $x$ is not an output value of the gamma distribution function, but the probability of $x$ is. – jbowman Aug 16 '23 at 23:51
  • I have taken the liberty of editing your question so that it asks about the derivative of the density function at $x$, which I assume was your intention. Please edit further if this is not what you intended. – Ben Aug 17 '23 at 00:34
  • @jbowman I mean the derivative of the variate. I'll edit the question. (Wrote it at the gym.) – Neil G Aug 17 '23 at 00:49
  • It is not clear what you mean by "the derivative of the variate". If you are referring to the random variable, this is not a function of $t$ so its derivative with respect to $t$ would be (trivially) zero (if well-defined at all). Perhaps you can clarify what you are looking for here. – Ben Aug 17 '23 at 00:54
  • @ben I've added some text elucidating the motivation, and some code that produces the values that I'm looking for. Please let me know if you have any ideas to get me started on this problem, or any suggestions for achieving my goal. – Neil G Aug 17 '23 at 01:11
  • This question is not well-defined, because you don't explain how the variable depends on $t.$ You cannot compute its derivative only from its distribution! – whuber Aug 17 '23 at 02:59
  • @whuber: I said that t is the shape parameter of the gamma distribution. Should I provide the density of the gamma distribution showing $t$? – Neil G Aug 17 '23 at 03:00
  • You are ignoring my point: by definition, a random variable (I presume that's what you mean by "variate;" if not, what?) is a function $X$ defined on a probability space. How exactly does it depend on $t$?? There are lots of different ways it could depend on $t$ where $t$ determines its distribution. They don't even need to be differentiable. – whuber Aug 17 '23 at 03:01
  • @whuber I do not mean variable. I mean a variate: an individual realization of the variable. Each realization does have a well-defined derivative with respect to the parameters that generated it. For example, the derivative of a variate $x$ with respect to a scale parameter $k$ is intuitively $\frac{x}{k}$.

    You could also examine the code if that's more clear for you. I stochastically produce x, x_dot from t.

    – Neil G Aug 17 '23 at 03:04
  • I cannot make sense of that, because a random "variate" is exactly that: random. It doesn't have a derivative in any meaningful sense, unless you somehow specify additional mathematical structure that (yet) is nowhere in evidence. – whuber Aug 17 '23 at 03:05
  • Sorry, but I believe that you're mistaken. I will dig up some references later tonight. – Neil G Aug 17 '23 at 03:05
  • If you believe so, then please edit your post to clearly indicate what you do mean. I am not saying you're wrong, but only that you haven't provided sufficient information to make sense of your question. – whuber Aug 17 '23 at 03:05
  • @whuber Sorry I didn't have a lot of time to look into this, but I think you're right about it not being well-defined as I wrote it. This is a good explanation of the reparametrization trick. I'll edit the question. – Neil G Aug 17 '23 at 11:42
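The scale-parameter example in the comments above (a variate $x$ with scale $k$ has $dx/dk = x/k$) follows from writing $x = k z$ with $z \sim \mathrm{Gamma}(\theta, 1)$ held fixed, so $dx/dk = z = x/k$. A minimal JAX check, assuming the same fixed-key convention as the question's code; the helper `scaled_sample` and the values of $\theta$ and $k$ are arbitrary choices for illustration.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(123)
theta = jnp.float32(1.3)


def scaled_sample(k):
    # Hypothetical helper: x = k * z with z ~ Gamma(theta, 1) from a fixed key,
    # so x ~ Gamma(theta, scale=k).
    return k * jax.random.gamma(key, theta)


k = jnp.float32(2.5)
x, x_dot = jax.value_and_grad(scaled_sample)(k)
print(float(x_dot), float(x / k))  # dx/dk and x/k
```

Unlike the shape parameter, the scale parameter enters multiplicatively, so no special JVP rule is needed for it.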
