I am trying to use a GAN to generate N-dimensional samples whose joint probability distribution matches some training data, but I am having trouble getting the distribution of the generated data to match that of the training data.
As a training example, I created a 2-dimensional training dataset drawn from a bimodal Gaussian distribution (a mixture of two 2D Gaussians), as shown here:
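For concreteness, the dataset can be generated with something like this minimal sketch (the mode centers and spread here are illustrative placeholders, not my exact values):

import numpy as np

def make_bimodal_gaussian(n_samples=10000, seed=0):
    """Draw 2D points from an equal-weight mixture of two Gaussians."""
    rng = np.random.default_rng(seed)
    centers = np.array([[-2.0, -2.0], [2.0, 2.0]])   # placeholder mode centers
    which = rng.integers(0, 2, size=n_samples)       # choose a mode per sample
    return centers[which] + rng.normal(scale=0.5, size=(n_samples, 2))

data = make_bimodal_gaussian()   # shape (10000, 2)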
I am using dense layers for both the generator and the discriminator, like this:
GenerativeAdversarialNetwork(
  (generator): Sequential(
    (0): Dropout(p=0.2, inplace=False)
    (1): Linear(in_features=100, out_features=100, bias=True)
    (2): Tanh()
    (3): Linear(in_features=100, out_features=100, bias=True)
    (4): Tanh()
    (5): Linear(in_features=100, out_features=100, bias=True)
    (6): Tanh()
    (7): Linear(in_features=100, out_features=100, bias=True)
    (8): Tanh()
    (9): Linear(in_features=100, out_features=2, bias=True)
    (10): Tanh()
    (11): Linear(in_features=2, out_features=2, bias=True)
  )
  (discriminator): Sequential(
    (0): Linear(in_features=2, out_features=32, bias=True)
    (1): LeakyReLU(negative_slope=0.01)
    (2): Linear(in_features=32, out_features=32, bias=True)
    (3): LeakyReLU(negative_slope=0.01)
    (4): Linear(in_features=32, out_features=32, bias=True)
    (5): LeakyReLU(negative_slope=0.01)
    (6): Linear(in_features=32, out_features=16, bias=True)
    (7): LeakyReLU(negative_slope=0.01)
    (8): Dropout(p=0.2, inplace=False)
    (9): Linear(in_features=16, out_features=1, bias=True)
    (10): Sigmoid()
  )
)
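In code, the two networks can be reconstructed from the printout above roughly like this (a sketch; the class wrapper and weight initialization are omitted):

import torch.nn as nn

latent_dim = 100  # size of the noise vector fed to the generator

generator = nn.Sequential(
    nn.Dropout(p=0.2),
    nn.Linear(latent_dim, 100), nn.Tanh(),
    nn.Linear(100, 100), nn.Tanh(),
    nn.Linear(100, 100), nn.Tanh(),
    nn.Linear(100, 100), nn.Tanh(),
    nn.Linear(100, 2), nn.Tanh(),
    nn.Linear(2, 2),  # final 2x2 layer (later noted as redundant in the comments)
)

discriminator = nn.Sequential(
    nn.Linear(2, 32), nn.LeakyReLU(0.01),
    nn.Linear(32, 32), nn.LeakyReLU(0.01),
    nn.Linear(32, 32), nn.LeakyReLU(0.01),
    nn.Linear(32, 16), nn.LeakyReLU(0.01),
    nn.Dropout(p=0.2),
    nn.Linear(16, 1),
    nn.Sigmoid(),  # outputs the estimated probability that the input is real
)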
The closest I could get was a bimodal distribution densely concentrated at the two centers of the Gaussian modes in the real data.
I have tried the following recommendations, from here and elsewhere, without any success (a sketch of the training loop I am using follows the list):
- Using Adam optimizer
- Using dropout layers
- Playing with bigger networks
- Playing with learning rates
- Playing with different numbers of iterations or thresholds for training the generator and discriminator networks.
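For reference, the training loop follows the standard alternating GAN recipe, roughly like this (a simplified sketch of what I am running; the batch size, learning rates, and sample_real_batch are placeholders):

import torch
import torch.nn as nn

bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(20000):
    real = sample_real_batch(128)   # placeholder: returns a (128, 2) tensor of training points
    z = torch.randn(128, latent_dim)
    fake = generator(z)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    opt_d.zero_grad()
    loss_d = bce(discriminator(real), torch.ones(128, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(128, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: push D(fake) toward 1 (non-saturating loss).
    opt_g.zero_grad()
    loss_g = bce(discriminator(fake), torch.ones(128, 1))
    loss_g.backward()
    opt_g.step()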
My question is twofold:
- Is there anything I am missing?
- I have mainly seen people using GANs for image generation and claiming that the network learns the PDF. But all such claims are qualitative, e.g., that the generated images look good enough. Is there any demonstration of the PDFs actually matching, for instance on 1D or 2D data points that can be easily visualized? It is easy for me to imagine that the generated samples come from some regions of the full distribution (like the orange samples I am generating above) while failing to capture the full distribution at all.
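For what it is worth, this is the kind of side-by-side density check I have in mind, assuming data and generator from the sketches above:

import matplotlib.pyplot as plt
import torch

generator.eval()  # disable dropout when sampling
with torch.no_grad():
    gen = generator(torch.randn(10000, latent_dim)).numpy()

# Compare the real and generated joint distributions as 2D histograms.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
axes[0].hist2d(data[:, 0], data[:, 1], bins=60)
axes[0].set_title("real samples")
axes[1].hist2d(gen[:, 0], gen[:, 1], bins=60)
axes[1].set_title("generated samples")
plt.show()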
Sorry about the long question.


Tanh and 2x2 layers are redundant. I will get rid of them as well. – dvd8719 May 05 '23 at 22:33