TensorFlow Implementation of InfoGAN

Usage

$ python3 main.py --dataset CIFAR10 --noise_dim 64

NOTE: In a Colab notebook, use the following commands:

!git clone link-to-repo
%run main.py --dataset CIFAR10 --noise_dim 64

Help Log

usage: main.py [-h] [--dataset DATASET] [--epochs EPOCHS]
               [--noise_dim NOISE_DIM] [--continuous_weight CONTINUOUS_WEIGHT]
               [--batch_size BATCH_SIZE] [--outdir OUTDIR]

optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     Name of dataset: MNIST (default) or CIFAR10
  --epochs EPOCHS       No of epochs: default 50 for MNIST, 150 for CIFAR10
  --noise_dim NOISE_DIM
                        No of latent Noise variables, default 62 for MNIST, 64
                        for CIFAR10
  --continuous_weight CONTINUOUS_WEIGHT
                        Weight given to continuous Latent codes in loss
                        calculation, default 0.5 for MNIST, 1 for CIFAR10
  --batch_size BATCH_SIZE
                        Batch size, default 256
  --outdir OUTDIR       Directory in which to store data, don't put '/' at the
                        end!
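
A parser that yields a help message along these lines could be set up roughly as follows. This is a sketch under assumptions, not the repository's actual main.py: the helper names `build_parser` and `resolve_defaults`, the use of `None` to mean "pick the dataset-specific default", and the `"."` default for `--outdir` are all hypothetical. Only the flag names and default values come from the help log.

```python
import argparse

# Dataset-specific defaults, taken from the help log.
DEFAULTS = {
    "MNIST": {"epochs": 50, "noise_dim": 62, "continuous_weight": 0.5},
    "CIFAR10": {"epochs": 150, "noise_dim": 64, "continuous_weight": 1.0},
}


def build_parser():
    """Hypothetical helper: declare the flags listed in the help log."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--dataset", type=str, default="MNIST",
                        help="Name of dataset: MNIST (default) or CIFAR10")
    parser.add_argument("--epochs", type=int, default=None,
                        help="No of epochs: default 50 for MNIST, 150 for CIFAR10")
    parser.add_argument("--noise_dim", type=int, default=None,
                        help="No of latent Noise variables")
    parser.add_argument("--continuous_weight", type=float, default=None,
                        help="Weight given to continuous Latent codes in loss")
    parser.add_argument("--batch_size", type=int, default=256,
                        help="Batch size, default 256")
    # The "." default is an assumption; the help log does not state one.
    parser.add_argument("--outdir", type=str, default=".",
                        help="Directory in which to store data")
    return parser


def resolve_defaults(args):
    """Hypothetical helper: fill in values that depend on the dataset."""
    if args.dataset not in DEFAULTS:
        raise ValueError(f"Unknown dataset: {args.dataset}")
    for name, value in DEFAULTS[args.dataset].items():
        if getattr(args, name) is None:
            setattr(args, name, value)
    # The help text asks for no trailing '/', so strip it defensively.
    args.outdir = args.outdir.rstrip("/") or "/"
    return args


args = resolve_defaults(build_parser().parse_args(["--dataset", "CIFAR10"]))
print(args.epochs, args.noise_dim, args.continuous_weight, args.batch_size)
```

Resolving the defaults after parsing keeps a value given explicitly on the command line from being overwritten by the dataset default.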

Contributed by:

Atharv Singh Patlan

References

Title: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Authors: Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
Link: https://arxiv.org/pdf/1606.03657.pdf
Tags: Neural Network, Generative Networks, GANs
Year: 2016

Summary

Introduction

Generative adversarial nets were recently introduced as a novel way to train a generative model. They consist of two ‘adversarial’ models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

However, the GAN described above, often called a vanilla GAN, gives us no control over what it generates: there is no way to ask it for an image of a particular class. To get that control, we need to structure the noise provided as input to the GAN, and define the training so that the network learns to classify an image as belonging to a given class, in addition to determining whether it is real or fake.

Enter InfoGAN!

InfoGAN

The idea is to provide a latent code that has meaningful and consistent effects on the output. For instance, consider the MNIST dataset, which has 10 digits. It would be helpful to exploit the fact that the dataset has 10 classes, so that each digit is tied to a particular value of the code. This can be done by assigning part of the input to a 10-state discrete variable. The hope is that if you keep the code the same and randomly change the noise, you get variations of the same digit.

The way InfoGAN approaches this problem is by splitting the Generator input into two parts: the traditional noise vector and a new “latent code” vector. The codes are then made meaningful by maximizing the Mutual Information between the code and the generator output.

$$\min_G \max_D V_I(D,G) = V(D,G) - \lambda I(c; G(z,c))$$

Here V(D,G) is the standard vanilla GAN loss, I(c;G(z,c)) is the mutual information term, and λ (lambda) is a regularization constant (the mutual information term can be seen as a regularizer).

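However, in the calculation of I(c;G(z,c)), we need the posterior distribution of the latent codes, P(c|x), which is usually intractable. We therefore replace the mutual information with a lower bound, calculated by approximating the posterior with an auxiliary distribution Q(c|x). The bound can be estimated by sampling, and maximized with respect to G using the reparameterization trick:

$$I(c; G(z,c)) \geq L_I(G,Q)$$

Where

$$L_I(G,Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}[\log Q(c|x)] + H(c)$$

Hence the final form of the loss function becomes:

$$\min_{G,Q} \max_D V_{\text{InfoGAN}}(D,G,Q) = V(D,G) - \lambda L_I(G,Q)$$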

Thus, the problem basically reduces to the following process:

Sample a value for the latent code c from a prior of your choice
Sample a value for the noise z from a prior of your choice
Generate x = G(z,c)
Calculate Q(c|x = G(z,c))
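
The four steps can be sketched in plain Python (no TensorFlow needed to run it). Everything here is a toy stand-in chosen for illustration: `toy_generator` and `toy_q` are hypothetical functions, not the networks used in this repository, and the standard normal noise prior is an assumption. The sketch estimates the lower bound from the InfoGAN paper, E[log Q(c|x)] + H(c), for a uniform 10-state categorical code.

```python
import math
import random

NUM_CLASSES = 10   # size of the categorical code
NOISE_DIM = 10     # toy value, chosen so z can be added to the one-hot code


def sample_code(rng):
    """Step 1: sample c from a uniform categorical prior."""
    return rng.randrange(NUM_CLASSES)


def sample_noise(rng):
    """Step 2: sample z from a standard normal prior (an assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]


def toy_generator(z, c):
    """Step 3: stand-in for G(z, c); a real generator is a neural network."""
    return [4.0 * (i == c) + 0.1 * z[i] for i in range(NUM_CLASSES)]


def toy_q(x):
    """Step 4: stand-in for Q(c|x); returns a softmax over the 10 classes."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_lower_bound(num_samples=1000, seed=0):
    """Monte Carlo estimate of E[log Q(c|x)] + H(c)."""
    rng = random.Random(seed)
    log_q_sum = 0.0
    for _ in range(num_samples):
        c = sample_code(rng)
        z = sample_noise(rng)
        x = toy_generator(z, c)
        log_q_sum += math.log(toy_q(x)[c])
    entropy = math.log(NUM_CLASSES)  # H(c) for a uniform prior over 10 classes
    return log_q_sum / num_samples + entropy, entropy


bound, entropy = estimate_lower_bound()
print(f"lower bound estimate: {bound:.3f} (cannot exceed H(c) = {entropy:.3f})")
```

Because log Q(c|x) is never positive, the estimate can only approach H(c) from below; the better Q recovers the code from the generated sample, the tighter the bound.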

Implementation

In the implementation, we input a user-defined number of noise variables, 10 categorical latent codes (hoping that, in the output, each corresponds to a class of the dataset), and 2 uniform continuous latent codes (with values from -1 to 1), hoping that they correspond to some other features in the dataset.
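
As a rough illustration of that input layout, here is a minimal plain-Python sketch that samples one generator input. The function name `sample_generator_input`, the standard normal noise prior, and the order in which the three parts are concatenated are assumptions made for the example, not details taken from the repository.

```python
import random

NUM_CATEGORIES = 10   # categorical latent codes, fed as a one-hot vector
NUM_CONTINUOUS = 2    # continuous latent codes, uniform in [-1, 1]


def sample_generator_input(noise_dim, rng):
    """Return one generator input (noise + one-hot code + continuous codes)."""
    # The noise prior is not stated in the text; a standard normal is assumed.
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    label = rng.randrange(NUM_CATEGORIES)
    one_hot = [1.0 if i == label else 0.0 for i in range(NUM_CATEGORIES)]
    continuous = [rng.uniform(-1.0, 1.0) for _ in range(NUM_CONTINUOUS)]
    return noise + one_hot + continuous, label


rng = random.Random(0)
vector, label = sample_generator_input(noise_dim=62, rng=rng)
print(len(vector))  # 62 + 10 + 2 = 74 entries with the MNIST default noise_dim
```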

We use the following default configuration:

Binary cross-entropy (CE) to calculate the loss in real and fake sample detection
Categorical CE to calculate the loss in categorical classification
Ordinary Least Squares (OLS) to calculate the loss in continuous variable detection. (The continuous variables are uniform in the input, but the architecture predicts them in the form of a Gaussian distribution. I tried outputting the mean and log variance of the predictions and calculating the loss using the reparameterization trick, but after working through the math I realized that it all boils down to calculating the OLS of the predicted values.)
Lambda = 1; however, the weight given to the loss of the continuous codes can be varied (we used 0.5 for MNIST and 1 for CIFAR10)
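
The sketch below shows one way these pieces could fit together, in plain Python rather than TensorFlow. How the repository actually sums the terms is not shown in this README, so `generator_side_loss` and its argument names are hypothetical. The `gaussian_nll` function illustrates one reading of the least-squares remark: if the predicted variance is held fixed at 1 (log variance 0), the Gaussian negative log-likelihood is half the squared error plus a constant.

```python
import math


def binary_cross_entropy(target, prob, eps=1e-7):
    """Binary CE for a single real/fake prediction."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def categorical_cross_entropy(label, probs, eps=1e-7):
    """Categorical CE for one sample, given the true class index."""
    return -math.log(max(probs[label], eps))


def squared_error(targets, preds):
    """Mean squared error over the continuous codes."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of target under N(mean, exp(log_var))."""
    return 0.5 * (math.log(2.0 * math.pi) + log_var
                  + (target - mean) ** 2 / math.exp(log_var))


def generator_side_loss(d_prob_fake, cat_label, cat_probs,
                        cont_codes, cont_preds, continuous_weight):
    """Assumed combination: adversarial + categorical + weighted continuous."""
    adversarial = binary_cross_entropy(1.0, d_prob_fake)  # generator wants "real"
    categorical = categorical_cross_entropy(cat_label, cat_probs)
    continuous = squared_error(cont_codes, cont_preds)
    return adversarial + categorical + continuous_weight * continuous


# With log_var = 0 the NLL differs from half the squared error by a constant.
constant = 0.5 * math.log(2.0 * math.pi)
print(gaussian_nll(0.3, -0.2, 0.0) - constant, 0.5 * (0.3 - (-0.2)) ** 2)
```

If the log variance is learned instead of fixed, the extra terms do not cancel, so the equivalence with least squares holds only under the fixed-variance assumption.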

Results

On MNIST Dataset

Results after training for 50 epochs:

NOTE: In this graph, the orange plot corresponds to the discriminator loss, blue to the generator loss, green to the loss of the continuous variables, and gray to the loss of the categorical variables.

Loss:

Plot of Real and Fake detection accuracies:

Here is the final image generated by the generator for a randomly generated noise and label, with one continuous code being varied along the rows.

In this one, the tilt in the images seems to change as we move left to right:

While in this, the thickness of the digits seems to change:

Note: In some cases, the digit itself also changes as the continuous codes are varied. I think this is because there are many possible characteristics that the uniform codes can correspond to, and it's quite possible that they capture not only thickness or tilt but also curviness, the number of lines in a digit, and so on. As a result, digits that look similar to each other can end up being generated by the same categorical code.
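
Sweeps like the tilt and thickness rows described above can be set up by holding the noise and the categorical code fixed and stepping one continuous code across its range. The following plain-Python sketch builds the generator inputs for one such row. The function name `continuous_sweep`, the choice to hold the other continuous code at 0, and the input ordering are assumptions for illustration; feeding each vector to the trained generator is left out.

```python
import random

NUM_CATEGORIES = 10
NUM_CONTINUOUS = 2


def continuous_sweep(noise, label, code_index, steps=10):
    """Build one row of generator inputs that differ in one continuous code."""
    one_hot = [1.0 if i == label else 0.0 for i in range(NUM_CATEGORIES)]
    row = []
    for k in range(steps):
        continuous = [0.0] * NUM_CONTINUOUS  # the other code is held at 0
        continuous[code_index] = -1.0 + 2.0 * k / (steps - 1)  # from -1 to 1
        row.append(list(noise) + one_hot + continuous)
    return row


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(62)]
row = continuous_sweep(noise, label=7, code_index=0)
print([round(vector[-2], 2) for vector in row])  # swept code, left to right
```

Since every vector in the row shares the same noise and label, any change across the generated images can be attributed to the swept code alone.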

On CIFAR10 Dataset

NOTE: The paper does not include an implementation for the CIFAR10 dataset, so there was no reference configuration to follow, and the results aren't very good.

Results after training for 137 epochs:

NOTE: In this graph, the blue plot corresponds to the generator loss and orange to the discriminator loss.

Loss:

Plot of Real and Fake detection accuracies:

Here is the final image generated by the generator for a randomly generated noise and label.

In this one, the background color varies as we move left to right:

While in this, the foreground color/size varies:

It seems the continuous latent codes are working fine, but the categorical codes weren't able to represent the different classes too well, hence there is room for a lot of experiments!

Sources

InfoGAN — Generative Adversarial Networks Part III
Template on which the code was built:
DCGAN on TensorFlow tutorials
