v0.95

Released by @bvanessen on 12 Feb 01:31

============================= Release Notes: v0.95 ==============================

Support for new training algorithms:

  • Generative Adversarial Networks (GAN)

Support for new network structures:

  • Variational Autoencoders
  • GAN
  • CycleGAN
  • Combined Autoencoders with CycleGAN
  • Deep Recurrent Attention Model (DRAM), Ba et al. (2015)
  • Video Recurrent Attention Model (VRAM)

Support for new layers:

  • Optimized Top-K accuracy (CPU, GPU)
  • Crop (CPU, GPU)
  • Sort (CPU, GPU), supporting both ascending and descending order
  • Absolute value (CPU, GPU)
  • Mean squared error (CPU, GPU)
  • Top-K categorical accuracy (CPU, GPU); a sketch of the computation follows this list
  • Cross-entropy (CPU, GPU)
  • Stop gradient (CPU, GPU)
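
As a rough illustration of what the top-k accuracy layers compute, the following
self-contained C++ sketch counts how often the true class falls within the k
highest-scoring predictions. The top_k_accuracy function and its flat std::vector
layout are purely illustrative; the actual layers operate on distributed Hydrogen
matrices on CPU or GPU.

```cpp
// Minimal sketch of top-k categorical accuracy: the fraction of samples
// whose true class is among the k highest-scoring predictions.
// Illustrative only; not LBANN's layer implementation.
#include <cstdio>
#include <cstddef>
#include <vector>

double top_k_accuracy(const std::vector<std::vector<double>>& predictions,
                      const std::vector<std::size_t>& labels,
                      std::size_t k) {
  std::size_t hits = 0;
  for (std::size_t i = 0; i < predictions.size(); ++i) {
    const auto& p = predictions[i];
    // Count classes that score strictly higher than the true class.
    std::size_t rank = 0;
    for (double v : p) {
      if (v > p[labels[i]]) { ++rank; }
    }
    if (rank < k) { ++hits; }  // true class is within the top k
  }
  return static_cast<double>(hits) / predictions.size();
}

int main() {
  std::vector<std::vector<double>> preds = {{0.1, 0.7, 0.2}, {0.5, 0.3, 0.2}};
  std::vector<std::size_t> labels = {2, 0};  // sample 0 misses top-1, hits top-2
  std::printf("top-1: %.2f, top-2: %.2f\n",
              top_k_accuracy(preds, labels, 1),
              top_k_accuracy(preds, labels, 2));
  return 0;
}
```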

Performance optimizations:

  • Use pinned memory for CPU activation matrices (see the sketch after this list)
  • Non-blocking GPU computation of objective functions and metrics
  • Refactored weight matrices and weight initialization
  • Manage GPU workspace buffers with memory pool
  • Slice and concatenation layers emit matrix views when possible
  • Used more fine-grained asynchronous calls when using the Aluminum library
    • Minimized GPU stream synchronization events per call
  • Reduced synchronization events when using a single GPU
  • Fixed GPU workspace size
  • GPU implementation of Adagrad optimizer
  • GPU model-parallel softmax
  • Optimized local CUDA kernel implementations
  • Support for distributed matrices with arbitrary alignment
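
The pinned-memory and non-blocking items above rest on a standard CUDA runtime
pattern, sketched below: page-locked host buffers from cudaMallocHost allow
cudaMemcpyAsync calls on a stream to proceed without blocking, so synchronization
collapses into a single stream-level wait. This is a generic illustration of the
technique, not LBANN's implementation.

```cpp
// Generic illustration of pinned host memory + non-blocking transfers.
// Compile with: nvcc pinned.cu   (this is not LBANN code)
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  const int n = 1 << 20;
  float *host = nullptr, *dev = nullptr;
  cudaMallocHost((void**)&host, n * sizeof(float));  // pinned (page-locked) host memory
  cudaMalloc((void**)&dev, n * sizeof(float));
  for (int i = 0; i < n; ++i) { host[i] = 1.0f; }

  cudaStream_t stream;
  cudaStreamCreate(&stream);
  // Both copies are queued on the stream and return immediately; pinned
  // host memory is required for the copies to be truly asynchronous.
  cudaMemcpyAsync(dev, host, n * sizeof(float), cudaMemcpyHostToDevice, stream);
  cudaMemcpyAsync(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // single synchronization point

  std::printf("round trip ok: host[0] = %.1f\n", host[0]);
  cudaFreeHost(host);
  cudaFree(dev);
  cudaStreamDestroy(stream);
  return 0;
}
```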

Model portability & usability:

  • Keras to LBANN prototext conversion tool
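
For context, LBANN consumes model descriptions in prototext, so the tool maps a
Keras model onto a description of roughly the following shape. This snippet is
schematic; the layer and field names are approximations rather than the exact
schema emitted by the converter.

```
model {
  layer {
    name: "input"
    input { }
  }
  layer {
    name: "fc1"
    parents: "input"
    fully_connected { num_neurons: 500 }
  }
  layer {
    name: "relu1"
    parents: "fc1"
    relu { }
  }
}
```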

Internal features:

  • Support for multiple objective functions and metrics per network with arbitrary placement
    • Objective functions represented as layers
    • Metrics represented as layers
    • Introduced evaluation layer construct
  • Ability to freeze specific layers for pre-training / fine-tuning (sketched after this list)
  • Refactored tensor setup in setup, forward prop, and back prop
  • Layers store matrices in private smart pointers
  • Model automatically inserts evaluation layers where needed
  • Copy layer activations between models
  • Annotated GPU profiling output with training phases
  • Fixed initialization of Comm object and Grid objects when using multiple models
  • General code cleanup, refactoring, and various bug fixes
  • All layers overwrite error signal matrices
  • NCCL backend is now implemented via the Aluminum library
  • MPI calls are routed through the LBANN Comm object into Hydrogen or Aluminum
  • Provide a runtime statistics summary from every rank
  • Reworked LBANN to use Hydrogen to manage GPU memory
  • GPU allocations now go through a CUB memory pool
  • Fixed Spack build interaction with the Hydrogen library
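
As an illustration of the freezing mechanism mentioned above, the sketch below
shows the basic idea: a frozen layer still participates in forward and backward
propagation but skips its own weight update. The Layer struct and its members are
hypothetical stand-ins, not LBANN's classes.

```cpp
// Sketch of layer freezing for pre-training / fine-tuning: gradients still
// flow through a frozen layer, but its weights are not updated.
// Names are illustrative, not LBANN's API.
#include <cstdio>
#include <cstddef>
#include <vector>

struct Layer {
  std::vector<double> weights, grads;
  bool frozen;

  void freeze() { frozen = true; }

  // Apply a plain SGD step unless the layer is frozen.
  void update(double lr) {
    if (frozen) { return; }  // keep pre-trained weights fixed
    for (std::size_t i = 0; i < weights.size(); ++i) {
      weights[i] -= lr * grads[i];
    }
  }
};

int main() {
  Layer encoder{{1.0}, {0.5}, false};  // pre-trained part of the network
  Layer head{{1.0}, {0.5}, false};     // new part being fine-tuned
  encoder.freeze();
  encoder.update(0.1);
  head.update(0.1);
  std::printf("encoder: %f, head: %f\n", encoder.weights[0], head.weights[0]);
  // prints: encoder: 1.000000, head: 0.950000
  return 0;
}
```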

I/O & data readers:

  • Support for Conduit objects with HDF5 formatting
  • In-memory and locally offloaded data store
    • The data store can hold the entire training set in memory (or node-local storage)
    • The data store shuffles data samples between epochs and presents them to the input layer
  • Updated synthetic data reader
  • Modified data readers to handle bad samples in JAG Conduit data
  • Reworked the I/O layers (input and target) so that the input layer produces both the
    sample and the label / response when necessary
    • The target layer is deprecated
  • Updated the image data reader to use cv::imdecode to accelerate image load times
    (see the sketch after this list)
  • Allow users to specify an array of data sources for the independent/dependent
    variables via prototext
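
The cv::imdecode change mentioned above follows the usual OpenCV pattern of
decoding an image directly from an in-memory byte buffer; a minimal standalone
example (not LBANN's reader code) is sketched below.

```cpp
// Decode an image from raw bytes with cv::imdecode: the encoded file is
// read into memory once and decoded directly from the buffer.
#include <opencv2/imgcodecs.hpp>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

int main(int argc, char** argv) {
  if (argc < 2) {
    std::fprintf(stderr, "usage: %s image-file\n", argv[0]);
    return 1;
  }
  // Read the raw encoded bytes (e.g. JPEG) into a memory buffer.
  std::ifstream f(argv[1], std::ios::binary);
  std::vector<unsigned char> buf((std::istreambuf_iterator<char>(f)),
                                 std::istreambuf_iterator<char>());
  // Decode straight from the buffer; no temporary file I/O is involved.
  cv::Mat img = cv::imdecode(buf, cv::IMREAD_COLOR);
  if (img.empty()) {
    std::fprintf(stderr, "decode failed\n");
    return 1;
  }
  std::printf("decoded %dx%d image\n", img.cols, img.rows);
  return 0;
}
```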