Merge branch 'release-v0.94' · LLNL/lbann@ca499c9

Commit

Merge branch 'release-v0.94'

v0.94

Support for new training algorithms:
  - Back-Propagation Through Time (BPTT)
    -- Recurrent Neural Networks (RNN)
    -- Long Short-Term Memories (LSTM)
  - Generative Adversarial Networks (GAN)
  - Variational autoencoders
  - Convolutional autoencoders
  - Fine tuning of pretrained networks
    -- Flexible weight freezing
  - Context-prediction network (Siamese network)
  - Livermore Tournament Fast Batch learning (LTFB)
  - Variable mini-batch sizes

Support for new network structures
  - Directed Acyclic Graph (DAG) networks
  - Residual networks
  - Modular and composable objective functions
  - Multiple metrics
  - Shared weight matrices
  - (BETA) New evaluation layer that is attach to any point of DAG
  - Motifs (compound, reused network patterns)

Support for new layers:
  - Learning:
    - Deconvolution
  - Metrics:
    -- Top K Categorical accuracy, Pearson correlation, Mean absolute deviation
  - Loss Functions:
    -- Cross Entropy with Uncertainty, Geometric negative log likelihood
    -- Poisson Negative log likelihood, Polya Negative Log Likelihood
  - Optimizers:
    -- Hypergradient Adam
  - Transform Layers:
    -- Contatenation, Noise, Unpooling, Pooling, Reshape, Slice, Split, Sum
  - Regularizer:
    -- Batch Normalization, Selu Dropout, Local Response Normalization (LRN)
  - Activations:
    -- Leaky Relu, Smooth Relu, Elu, Scaled Elu, Softplus, Atan,
    -- Bent Identity, Exponential

Performance optimizations:
  - GPU acceleration for most layers
  - NCCL 2.X
  - Optimized communication patterns
  - Asynchronous weight updates
  - Asynchronous metric and objective function updates
  - batch normalization (global and local)
  - L2 normalization
  - Adaptive Quantization (inter-model)

Model portablility & usability:
  - Portable checkpoints / recovery
  - Distributed checkpoint / recovery
  - Network visualization
  - Export LBANN to TensorFlow format

Internals Features:
  - Gradient checking
  - Network representation using tensor dimensions
  - Bamboo continuous integration (CI)
  - Improved data processing pipeline

New data readers:
 - Numpy
 - CSV
 - Methods for merging multiple features and samples across files
 - CANDLE Pilot 2
 - CANDLE Pilot 1 Combo
 - ICF JAG

Integration with Hydrogen, an optimized distributed, dense linear algebra
library.  Hydrogen is a fork of the Elemental library.  Hydrogen optimizes for:
distributed matrices with elemental and block distributions, BLAS, LAPACK,
distributed and local matrix management.

Integration with optimized all-reduce communication library Aluminum.  Aluminum
provides custom reduction patterns, customized CUDA reduction kernels,
and asynchronous communication operators. It uses MPI, MPI w/GPUdirect, or NCCL
as back-end libraries. Aluminum enables us to effectively use non-blocking
all-reduces during backprop/optimization

Addtionally, we have added support for an online, distributed data store.  When
enabled, LBANN is able to ingest all of the training data set in a distributed
method across all ranks.  Each data store is then able to serve it's portion of
a mini-batch, dynamically moving data to the necessary ranks in the model (based
on the mini-batch data distribution).

Loading branch information

bvanessen committed May 11, 2018

2 parents f5878cf + 29227cd commit ca499c9

.gitignore

-Original file line number
+Diff line change
@@ Expand Up / @@ -5,5 +5,11 @@ @@
     *~
     *.o
+    data.prototext*
+    *.bak
+    *.pyc
+    *.mc
     # Can also ignore all directories and files in a directory.
     # tmp/**/*
+    build

0 comments on commit `ca499c9`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `ca499c9`

Commit

There are no files selected for viewing

0 comments on commit ca499c9

0 comments on commit `ca499c9`