This readme describes how to train a Variational Autoencoder/Markov Chain Monte Carlo hybrid model on IPUs.
The Variational Autoencoder model used in these benchmarks is based on a method combining VI (Variational Inference) and MCMC (Markov Chain Monte Carlo) with a Variational Contrastive Divergence (VCD). This model is taken from the Variational Contrastive Divergence paper by Ruiz and Titsias: https://arxiv.org/abs/1905.04062, presented at ICML 2019.
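For reference, the objective optimised here is the VCD divergence from that paper. The following is a sketch in paraphrased notation (consult the paper for the precise definitions): q_0 denotes the initial variational distribution and q_t the distribution obtained after t MCMC steps targeting the posterior p(z | x).

```latex
% Sketch of the VCD objective (paraphrasing Ruiz & Titsias, 2019):
% the divergence compares the initial variational distribution q_0 with the
% improved distribution q_t produced by t MCMC steps targeting p(z | x).
\mathcal{L}_{\mathrm{VCD}}(\theta)
  = \mathrm{KL}\!\left(q_0(z;\theta) \,\|\, p(z \mid x)\right)
  - \mathrm{KL}\!\left(q_t(z;\theta) \,\|\, p(z \mid x)\right)
  + \mathrm{KL}\!\left(q_t(z;\theta) \,\|\, q_0(z;\theta)\right)
```

This quantity is non-negative and vanishes when q_0 matches the posterior; the paper derives the estimators used to optimise it in practice.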
Motivated by the novelty of the method and the unusual computational structure of the model, we wrote this implementation in TensorFlow to explore the performance across hardware platforms.
The model is trained with the statically binarised MNIST dataset discussed by Salakhutdinov and Murray (https://www.cs.toronto.edu/~rsalakhu/papers/dbn_ais.pdf) and released by Hugo Larochelle. The data will be downloaded automatically when the code is run for the first time.
This repo contains the code used for the experiments in the blog (https://www.graphcore.ai/posts/probabilistic-modelling-by-combining-markov-chain-monte-carlo-and-variational-inference-with-ipus), which relate to the above model. All the TensorFlow code needed to reproduce the results is included.
The structure of the repo is as follows:
- `experiments/` contains `generative.py`, which includes the high-level experiment control flow, data loading and preprocessing.
- `models/` includes the code for training and evaluating models.
- `utils/` has the auxiliary code used by the classes in the above folders.
- `machinable/` contains some additional functionality for saving results in a standardised form.
- `configs/` includes the configuration files for the experiments. Some of the hyperparameters can be set via the command line, or the config files can be copied and overridden. See the Options section below for more information.
- `main.py` includes config overrides and the call to run the experiment.
- `requirements.txt` is a list of PyPI dependencies.
1) Download the Poplar SDK
Install the Poplar SDK following the instructions in the Getting Started guide for your IPU system. Make sure to source the `enable.sh` script for Poplar.
Create a virtualenv with Python 3.6 and install the required packages:

```bash
virtualenv venv -p python3.6
source venv/bin/activate
pip install -r requirements.txt
```
TensorFlow is not included in the requirements. If running on IPU hardware, install the Graphcore TensorFlow wheel from the Poplar SDK as follows:
```bash
pip install path/to/tensorflow-<version>-cp36-cp36m-linux_x86_64.whl
```
Otherwise install standard TensorFlow 1.14.0.
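Optionally, you can check which TensorFlow build is active in the virtualenv before running anything (a quick sanity check, not a step from the original instructions):

```python
# Quick sanity check: print the TensorFlow version visible in this
# environment. It should report 1.14 (either the Graphcore build on IPU
# systems or the standard release otherwise).
import tensorflow as tf

print(tf.__version__)
```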
After activating your virtual environment, run as follows:
```bash
python3 main.py -r where/to/store_results/
```
The models defined by `configs/default_config.json` and `configs/global_cv_config.json` take around nine hours to train on CPU.
Use `--help` to show the available hyperparameters and options. The detailed configuration is set in the files within the `configs/` folder. Set the path to the desired config file using the `-c` option:
```bash
python3 main.py -c path/to/config.json
```
The preset configs included in `configs/` are:

- `default_config.json`, which is the experimental set-up used in the original paper.
- `global_cv_config.json`, which is the same as `default_config.json` but with a scalar (global) control variate in place of a vector of local control variates.
- `bs_experiment_config.json`, which is for the experiments testing the effect of batch size. These run for 8,200 epochs rather than 800 (as in the two configs above). You'll need to pass the `--batch_size` argument via the command line, and specify a learning rate using `--learning_rate` if the batch size is not one we tested (16, 128, 512, 1024). If the learning rate is not given and the batch size is one we tested, the learning rate we found to be best during validation will be used (see the `TUNED_LEARNING_RATE` dict in `main.py`, and the sketch after this list).
- `test_config.json`, which is similar to the default config but only runs for one epoch.
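The learning-rate selection described for `bs_experiment_config.json` can be summarised with the following sketch (hypothetical code for illustration only; the actual logic and the tuned values live in `main.py`):

```python
# Hypothetical illustration of the learning-rate selection described above;
# this is not the code from main.py, and the tuned values are not shown here.
def resolve_learning_rate(batch_size, learning_rate, tuned_learning_rates):
    """Pick the learning rate for a batch-size experiment.

    `tuned_learning_rates` stands in for the TUNED_LEARNING_RATE dict in
    main.py, mapping the tested batch sizes (16, 128, 512, 1024) to the
    learning rates found best during validation.
    """
    if learning_rate is not None:
        # An explicit --learning_rate always wins.
        return learning_rate
    if batch_size in tuned_learning_rates:
        # Tested batch size with no explicit learning rate: use the tuned value.
        return tuned_learning_rates[batch_size]
    raise ValueError(
        "No tuned learning rate for batch size %d; pass --learning_rate explicitly."
        % batch_size
    )
```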
You can also create your own config by copying and extending one of the presets; just specify the path to your custom config via the command line.
The code will execute on IPU if an IPU is available; otherwise it will execute on GPU if `tf.test.is_gpu_available()` returns `True`. If neither an IPU nor a GPU is available, it will run on CPU. This logic is implemented in `get_device_config()` in `utils/ipu_utils.py`.
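A minimal sketch of that fallback, assuming the presence of the Graphcore wheel can be detected by importing `tensorflow.python.ipu` (an assumption; the actual check in `utils/ipu_utils.py` may differ):

```python
# Illustration of the device fallback described above; not the implementation
# from utils/ipu_utils.py.
import tensorflow as tf


def select_device():
    try:
        # The Graphcore TensorFlow wheel ships a tensorflow.python.ipu module;
        # if it imports, assume IPU support is present (an assumption, since
        # the repo may detect IPUs differently).
        from tensorflow.python import ipu  # noqa: F401
        return "ipu"
    except ImportError:
        pass
    if tf.test.is_gpu_available():
        return "gpu"
    return "cpu"
```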
Tests run only on IPU and fail if no IPU is available on the system.
After setting up the environment, tests can be run with:
```bash
pytest
```