2

I am currently reading https://www.ijcai.org/proceedings/2018/0374.pdf , a research paper on "Student-t Variational Autoencoder for Robust Density Estimation". The paper describes a zero-variance problem in Gaussian VAEs and proposes a Student-t VAE to overcome it. I have read many online sources, but most of them only discuss the uses of Gaussian VAEs and none of them mention this problem. I am new to VAEs and to multivariate statistics, so it is hard for me to understand the concept of Student-t VAEs and how they overcome the shortcomings of Gaussian VAEs.
I have read and understood some parts, but other parts are still unclear. I would appreciate it if someone could help.
Edit:
I understood the zero-variance problem up to the point where Gaussian VAEs become unstable because of the variance term in the denominator of the lower bound we are trying to maximize. After that, the authors introduce the concept of conjugate priors, e.g. that the precision parameter has a Gamma distribution as its conjugate prior. What I could not understand is how they get from there to the t-distribution, and how they use it in the VAE model.

Jarvis
  • 21
  • The abstract in the link that you provide seems like a good summary of the problem & their proposed solution. You say that you've understood parts but that other parts are still unclear. Since it would not be helpful to you for a person to explain things that you already understand, can you [edit] to be specific about what you know, what you'd like to know, and where you are stuck? – Sycorax May 15 '23 at 13:41

1 Answer

2

Prior, conjugate, and posterior distributions are concepts from Bayesian statistics. A good introduction is Bayesian Data Analysis by Gelman et al., currently in its third edition. Loosely, a prior distribution is a mathematical statement about how you believe the parameters of a model, or another quantity of interest, are distributed before you even look at the data.
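"Conjugate" means the posterior belongs to the same family as the prior, so updating with data is just arithmetic on the parameters. As a minimal sketch (the numbers are made up, and the known-mean-zero setup is an assumption for simplicity): with a Gamma(a, b) prior on the precision of a zero-mean Gaussian, the posterior after seeing data is again a Gamma.

```python
import numpy as np

# Hypothetical example: Gamma(a0, b0) prior on the precision lam of a
# Gaussian with known mean 0. Conjugacy means the posterior is also a
# Gamma, with closed-form updated parameters.
a0, b0 = 2.0, 1.0
x = np.array([0.5, -1.2, 0.3])  # made-up observations

# Standard conjugate update for this model:
a_post = a0 + len(x) / 2
b_post = b0 + np.sum(x**2) / 2

# Posterior mean of the precision (a point summary of our updated belief):
post_mean = a_post / b_post
print(a_post, b_post, post_mean)
```

No integration is needed: observing data only shifts the Gamma's shape and rate parameters, which is exactly what makes conjugate priors convenient.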

The core concept in this paper is that the authors are trying to prevent the model from estimating variances near 0, because that creates training instability. To ameliorate this, the authors assign a prior probability distribution to the variance, so that only a very small probability mass falls on variance values near 0.

This is analogous to penalizing certain solutions in a loss function, so that the model doesn't estimate the variance to be near 0. The effect is that the model's posterior probability distribution avoids the training instability that arises from near-0 variance estimates.
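The penalty view can be sketched numerically. This is an illustration of the idea, not the paper's actual objective, and the Gamma hyperparameters a = 2, b = 1 are made-up values: with a zero residual, the plain Gaussian negative log-likelihood is minimized by driving the variance to 0, while adding the negative log of a Gamma prior on the precision gives an objective with an interior minimum at a nonzero scale.

```python
import numpy as np

# Per-sample Gaussian NLL (constants dropped). residual = 0 is the worst
# case for the zero-variance problem: the NLL is then just log(sigma),
# which is unbounded below as sigma -> 0.
def gaussian_nll(sigma, residual=0.0):
    return np.log(sigma) + residual**2 / (2 * sigma**2)

# Negative log of a hypothetical Gamma(a, b) prior on the precision
# lam = 1 / sigma**2, again up to a constant.
def gamma_penalty(sigma, a=2.0, b=1.0):
    lam = 1.0 / sigma**2
    return -((a - 1.0) * np.log(lam) - b * lam)

sigmas = np.logspace(-3, 1, 400)
plain = gaussian_nll(sigmas)
penalized = plain + gamma_penalty(sigmas)

# plain is minimized at the smallest sigma on the grid (instability),
# while penalized has its minimum strictly inside the grid.
print(sigmas[plain.argmin()], sigmas[penalized.argmin()])
```

The b/sigma**2 term in the penalty blows up as sigma shrinks, which is precisely the "very small probability mass near 0" from the prior expressed as a loss.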

Sycorax
  • 90,934