
Summary: My aim is to create a (probabilistic) neural network for classification that learns the distribution of its class probabilities. The Dirichlet distribution seems to be the natural choice. I am familiar with the reparametrization trick and would like to apply it here, and I thought I had found a way to generate gamma-distributed random variables (which are needed for the Dirichlet distribution) within the network (detailed explanation below).

My questions are:

  • Does the sampling process for a gamma distribution in Dirichlet Variational Autoencoder actually work for all $\alpha > 0$, or have I read it wrong and it only works for $\alpha \le 1$?
  • If it only works for $\alpha \le 1$, is there an alternative to the Dirichlet distribution (e.g. a mixture of Gaussians as a continuous approximation of the discrete multinomial distribution) in my case?

I have already read two posts that touch on the reparametrization trick for non-Gaussian distributions. The first one (Reparameterization trick for gamma distribution) made me think that my issue could not easily be resolved; the other one (Reparametrization trick with non-Gaussian distributions?) made me a little more optimistic. I read the paper mentioned in that post (Dirichlet Variational Autoencoder). It says:

  • Approximation with inverse Gamma CDF. Previous work (Knowles, 2015) suggested that, if $X \sim \mathrm{Gamma}(\alpha,\beta)$, and if $F(x; \alpha,\beta)$ is the CDF of the random variable $X$, the inverse CDF can be approximated as $F^{-1}(u; \alpha,\beta) \approx \beta^{-1}(u\alpha \Gamma(\alpha))^{1/\alpha}$ for $u$ a unit-uniform random variable.

When I compared the approximation to R's rgamma function ($\alpha$ varied, $\beta = 1$), I saw that it only works reasonably well when $\alpha \le 1$.

[Figure "Comparison": histograms of samples from rgamma vs. the inverse-CDF approximation, faceted by $\alpha$ and by type (actual/approximation)]
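The breakdown can also be seen directly, without sampling, by comparing the approximate inverse CDF against R's exact qgamma at a fixed quantile. A small sketch (the function name approx_inv_cdf is my own choice):

```r
# Knowles' approximation to the inverse gamma CDF, as quoted from the paper
approx_inv_cdf <- function(u, alpha, beta = 1) {
  (u * alpha * gamma(alpha))^(1 / alpha) / beta
}

# Compare approximate and exact medians (u = 0.5) across alpha, beta = 1
comparison <- sapply(c(0.5, 1, 2, 10), function(a) {
  c(alpha  = a,
    approx = approx_inv_cdf(0.5, a),
    exact  = qgamma(0.5, shape = a, rate = 1))
})
print(comparison)
```

For $\alpha \le 1$ the two medians stay close, while for $\alpha = 10$ the approximation is off by more than a factor of two, consistent with the histogram comparison above.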

Reading the original source of the approximation (Knowles, 2015) confirmed this:

  • For $a < 1$ and $(1 - 0.94z)\log(a) < -0.42$ we use $F^{-1}_{a,b}(z) \approx (z a \Gamma(a))^{1/a}/b$.

Here is the R code for the visualization above.

library(tidyverse)

alpha <- c(0.1, 0.25, 0.5, 1, 2, 4, 10)
beta  <- 1
n     <- 100000
u     <- runif(n = n)

# exact samples from Gamma(alpha, beta) for each alpha
values_actual <- map_df(alpha, function(a)
  tibble(data = rgamma(n = n, shape = a, rate = beta), alpha = a)) %>%
  mutate(type = "actual")

# samples via the approximate inverse CDF: (u * alpha * gamma(alpha))^(1/alpha) / beta
values_approximated <- map_df(alpha, function(a)
  tibble(data = (u * a * gamma(a))^(1 / a) / beta, alpha = a)) %>%
  mutate(type = "approximation")

rbind(values_actual, values_approximated) %>%
  mutate(type = as.factor(type)) %>%
  ggplot(aes(x = data)) +
  geom_histogram() +
  facet_grid(rows = vars(type), cols = vars(alpha)) +
  theme_classic() +
  labs(x = "")

  • One sharp corner to be aware of: there are two parameterizations of the gamma distribution in common use (https://en.wikipedia.org/wiki/Gamma_distribution), and the Dirichlet VAE paper and the R documentation use different ones. However, this isn't the cause of the behavior noted in the question, because you've fixed $k = \beta = 1$. – Sycorax Oct 17 '21 at 14:47
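The comment's point can be checked directly: in R, the rate and scale arguments of rgamma are reciprocals of each other, so with $\beta = 1$ the two parameterizations coincide. A minimal check (set.seed only makes the draws comparable):

```r
# rgamma accepts either rate or scale = 1/rate; same seed gives the same draws
set.seed(1)
a <- rgamma(5, shape = 2, rate = 0.5)
set.seed(1)
b <- rgamma(5, shape = 2, scale = 2)
all.equal(a, b)
```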

1 Answer


I believe that for $\alpha=k+\alpha'$ where $k \in \mathbb{N}$ and $0 \leq \alpha' < 1$, you can just sample $z \sim \mbox{Gamma}(\alpha,1)$ using $z = \sum_{i=1}^k (-\ln U_i) + z'$ where $U_i \sim \mbox{Uniform}(0,1)$ and $z' \sim \mbox{Gamma}(\alpha',1)$, as described in the "Random variate generation" section of the Wikipedia page for the Gamma distribution.
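Combining this decomposition with the approximation from the question gives a sampler that only applies the inverse-CDF formula in the $\alpha' < 1$ regime where it holds, while all randomness enters through uniforms (so it stays reparameterizable). A rough sketch, not a drop-in implementation (rgamma_decomposed is my own name):

```r
# Sample Gamma(alpha, 1) via alpha = k + alpha', k integer, 0 <= alpha' < 1:
#   integer part:    sum of k independent Exp(1) draws, i.e. -log(U_i)
#   fractional part: approximate inverse CDF, valid since alpha' < 1
rgamma_decomposed <- function(n, alpha) {
  k       <- floor(alpha)
  alpha_p <- alpha - k
  z_int <- if (k > 0) {
    rowSums(matrix(-log(runif(n * k)), nrow = n, ncol = k))
  } else {
    rep(0, n)
  }
  z_frac <- if (alpha_p > 0) {
    (runif(n) * alpha_p * gamma(alpha_p))^(1 / alpha_p)
  } else {
    rep(0, n)
  }
  z_int + z_frac
}

# For integer alpha the construction is exact, so the sample mean should be
# close to alpha (the mean of Gamma(alpha, 1) is alpha)
x <- rgamma_decomposed(100000, 3)
mean(x)
```

For non-integer $\alpha$ the fractional part still inherits the approximation's bias, but since $\alpha' < 1$ it stays in the regime where the question found the approximation acceptable.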