Merge branch 'release-v0.101'
============================== Release Notes: v0.101 ==============================

Support for new training algorithms:

Support for new network structures:
 - ATOM VAE model
 - Graph neural networks
 - Graph Convolutional Networks (GCN)
 - 3D U-Net Model

Support for new layers:
 - Implemented optimized GRU layer using cuDNN kernel
 - Graph Layers: GCN, GIN, Graph, GatedGraph

Python front-end:
 - Support for Graph and Graph Convolutional Networks
 - Added support for the OLCF data center (Summit)

Performance optimizations:
 - Optimized the CUDA kernel for tensor reordering in the GRU layer
 - Enabled TensorCore optimization for the GRU layer
 - GCN and Graph layers also have a faster Dense variant that uses only
   matrix multiplication (see the sketch after this list)
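
The Dense variant can be pictured with a short NumPy sketch (an illustration
only, not LBANN code; the symmetric normalization used here is an assumption):
a GCN layer computes H' = act(A_norm @ H @ W), so the whole propagation
reduces to dense matrix products that map directly onto GEMM.

    import numpy as np

    def dense_gcn_layer(A, H, W):
        """One GCN layer expressed purely with dense matrix multiplication.

        A: (N, N) adjacency matrix, H: (N, F_in) node features,
        W: (F_in, F_out) learned weights.
        """
        # Symmetric normalization with self-loops (Kipf & Welling style);
        # the normalization in LBANN's Dense variant may differ.
        A_hat = A + np.eye(A.shape[0])
        d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt
        # The propagation itself is just two matmuls.
        return np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation

    # Toy example: 4 nodes, 3 input features, 2 output features.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 1],
                  [0, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    H = np.random.rand(4, 3)
    W = np.random.rand(3, 2)
    print(dense_gcn_layer(A, H, W).shape)  # (4, 2)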

Model portability & usability:
 - Added a Users Quickstart section to the documentation, including a
   PyTorch-to-LBANN mini-tutorial
 - Added a section on callbacks with detailed instructions for the
   summarize images callback

Internal features:
 - Support for double data type in distributed embedding layer
 - Support for large number of channels in GPU batchnorm layer
 - Modified LTFB so that NaNs lose tournaments
 - Improved numerical stability of reconstruction loss in ATOM VAE
   model
 - Skip bad gradients in Adam (illustrated in the sketch after this list)
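
As a rough sketch of the "skip bad gradients" idea (an illustration under the
assumption that "bad" means non-finite, not LBANN's optimizer code): before
applying an Adam update, check the gradient for NaN/Inf and make the step a
no-op if any entry is non-finite, leaving weights and moment estimates
untouched.

    import numpy as np

    def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """One Adam update that skips non-finite ("bad") gradients."""
        if not np.all(np.isfinite(g)):
            # Bad gradient: skip the step entirely; weights and the
            # first/second moment estimates are left unchanged.
            return w, m, v, t
        t += 1
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v, t

    # A NaN in the gradient leaves the weights untouched:
    w = np.zeros(3); m = np.zeros(3); v = np.zeros(3)
    w, m, v, t = adam_step(w, np.array([0.1, np.nan, 0.2]), m, v, t=0)
    print(w)  # [0. 0. 0.]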

I/O & data readers:
 - Added support for the ImageNet data reader to use sample lists
 - Refactored the sample list code to be more flexible and to generalize
   beyond the JAG data reader
 - Added support for slab-based I/O in the HDF5 data reader, required by
   DistConv implementations of CosmoFlow 3D volumes (see the sketch after
   this list)
 - Extended slab-based HDF5 data reader to support labels and
   reconstruction modes for use with U-Net architecture
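
A minimal h5py sketch of the slab-based read pattern (an illustration only;
the file name, dataset name, and layout below are hypothetical, not the LBANN
reader's schema): each rank reads just its contiguous slab of a 3D volume
along the outermost dimension, which HDF5 serves as a hyperslab selection.

    import h5py

    def read_slab(path, dataset, rank, num_ranks):
        """Read this rank's contiguous slab of a 3D volume along axis 0."""
        with h5py.File(path, "r") as f:
            dset = f[dataset]             # e.g. shape (D, H, W)
            depth = dset.shape[0]
            slab = depth // num_ranks     # assume depth divides evenly
            start = rank * slab
            # h5py turns this slice into an HDF5 hyperslab selection, so
            # only the requested slab is actually read from disk.
            return dset[start:start + slab, :, :]

    # Hypothetical usage: rank 2 of 4 reads the third slab of "volume".
    # vol = read_slab("cosmoflow_sample.h5", "volume", rank=2, num_ranks=4)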

Datasets:
 - Added two graph datasets (MNIST and PROTEINS)

Build system and Dependent Libraries:
 - Hydrogen 1.4.0
 - Aluminum 0.4.0
 - Spack v0.15.4+ (Requires new format for environments)
 - cuDNN 8.0.2
 - Require C++14
 - Added Spack build support for the OLCF data center (Summit)

Bug fixes:
 - Properly reset data coordinator after each LTFB round
 - Fixed bug in weights proxy when weights buffer is reallocated
 - Fixed bounds checking in the SMILES data reader and simple LTFB data
   distribution
 - Eliminated a race condition observed in the ATOM VAE model with the
   SMILES data reader: added a barrier after each data store mini-batch
   exchange to avoid a race between non-blocking sends and receives and
   later GPU kernel communication (see the sketch after this list)
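
For context, a minimal mpi4py sketch of the barrier-after-exchange pattern (an
illustration, not LBANN's data store code): each rank completes its
non-blocking sends/receives and then synchronizes, so no rank moves on to
later work that touches the exchanged buffers while a peer is still
mid-exchange.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Hypothetical mini-batch exchange: send to the right neighbor,
    # receive from the left neighbor, both non-blocking.
    send_buf = bytearray(b"sample data from rank %d" % rank)
    recv_buf = bytearray(64)
    reqs = [comm.Isend(send_buf, dest=(rank + 1) % size, tag=0),
            comm.Irecv(recv_buf, source=(rank - 1) % size, tag=0)]
    MPI.Request.Waitall(reqs)

    # Barrier after the exchange: no rank proceeds to later communication
    # or GPU work until every rank has finished its exchange.
    comm.Barrier()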

Retired features:
bvanessen committed Sep 29, 2020
2 parents d0fbac3 + 6a0f8bf commit 13b5167
Showing 211 changed files with 9,989 additions and 1,268 deletions.
20 changes: 12 additions & 8 deletions CMakeLists.txt
@@ -26,9 +26,9 @@ if (NOT DEFINED BUILD_SHARED_LIBS)
   set(BUILD_SHARED_LIBS ON)
 endif ()
 
-# Build with at least C++11 standard; allow newer standards.
+# Build with at least C++14 standard; allow newer standards.
 if (NOT CMAKE_CXX_STANDARD OR CMAKE_CXX_STANDARD EQUAL 98)
-  set(CMAKE_CXX_STANDARD 11)
+  set(CMAKE_CXX_STANDARD 14)
   set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
 endif ()
 
@@ -48,7 +48,7 @@ endif ()
 #
 
 set(LBANN_VERSION_MAJOR 0)
-set(LBANN_VERSION_MINOR 100)
+set(LBANN_VERSION_MINOR 101)
 set(LBANN_VERSION_PATCH 0)
 
 set(LBANN_VERSION "${LBANN_VERSION_MAJOR}.${LBANN_VERSION_MINOR}.${LBANN_VERSION_PATCH}")
@@ -188,16 +188,20 @@ set(LBANN_HAS_CEREAL ${CEREAL_FOUND})
 # The imported target is just called "cereal". Super.
 
 # Setup the linear algebra library
-find_package(Hydrogen 1.3.3 NO_MODULE QUIET
+find_package(Hydrogen 1.4.0 NO_MODULE QUIET
   HINTS ${Hydrogen_DIR} ${HYDROGEN_DIR} $ENV{Hydrogen_DIR} $ENV{HYDROGEN_DIR}
   PATH_SUFFIXES lib/cmake/hydrogen
   NO_DEFAULT_PATH)
 if (NOT Hydrogen_FOUND)
-  find_package(Hydrogen 1.3.3 NO_MODULE QUIET REQUIRED)
+  find_package(Hydrogen 1.4.0 NO_MODULE QUIET REQUIRED)
 endif ()
 message(STATUS "Found Hydrogen: ${Hydrogen_DIR}")
 set(LBANN_HAS_HYDROGEN ${Hydrogen_FOUND})
 
+if (_HYDROGEN_HAVE_ROCM)
+  message(FATAL_ERROR "ROCm not yet supported in LBANN.")
+endif ()
+
 # DiHydrogen and Distconv
 if (LBANN_WITH_DISTCONV AND NOT LBANN_WITH_DIHYDROGEN)
   message(FATAL_ERROR "Distconv requires DiHydrogen. Enable DiHydrogen to use Distconv.")
@@ -260,7 +264,7 @@ if (LBANN_HAS_CUDA)
   enable_language(CUDA)
 
   if (NOT CMAKE_CUDA_STANDARD OR CMAKE_CUDA_STANDARD EQUAL 98)
-    set(CMAKE_CUDA_STANDARD 11)
+    set(CMAKE_CUDA_STANDARD 14)
   endif ()
 
   set(CMAKE_CUDA_STANDARD_REQUIRED TRUE)
@@ -271,13 +275,13 @@ if (LBANN_WITH_ALUMINUM)
   if (NOT Aluminum_FOUND)
     message(WARNING
       "Using Aluminum without Hydrogen support may not be well-supported.")
-    find_package(Aluminum 0.3.0 NO_MODULE QUIET
+    find_package(Aluminum 0.4.0 NO_MODULE QUIET
       HINTS ${Aluminum_DIR} ${ALUMINUM_DIR} ${AL_DIR}
         $ENV{Aluminum_DIR} $ENV{ALUMINUM_DIR} $ENV{AL_DIR}
       PATH_SUFFIXES lib64/cmake/aluminum lib/cmake/aluminum
      NO_DEFAULT_PATH)
    if (NOT Aluminum_FOUND)
-     find_package(Aluminum 0.3.0 NO_MODULE QUIET)
+     find_package(Aluminum 0.4.0 NO_MODULE QUIET)
    endif ()
  endif ()
  set(LBANN_HAS_ALUMINUM ${Aluminum_FOUND})
69 changes: 69 additions & 0 deletions ReleaseNotes.txt
@@ -21,6 +21,75 @@ Bug fixes:
 
 Retired features:
 
+============================== Release Notes: v0.101 ==============================
+
+Support for new training algorithms:
+
+Support for new network structures:
+ - ATOM VAE model
+ - Graph neural networks
+ - Graph Convolutional Networks (GCN)
+ - 3D U-Net Model
+
+Support for new layers:
+ - Implemented optimized GRU layer using cuDNN kernel
+ - Graph Layers: GCN, GIN, Graph, GatedGraph
+
+Python front-end:
+ - Support for Graph and Graph Convolutional Networks
+ - Added support for OCLF data center (Summit)
+
+Performance optimizations:
+ - Optimize CUDA kernel for tensor reordering in GRU layer
+ - Enabled TensorCore optimization for GRU layer
+ - GCN and Graph layers also have a faster Dense variant which only utilizes Matrix Multiplication
+
+Model portability & usability:
+ - Added Users Quickstart section to documentation including PyTorch
+   to LBANN mini-tutorial
+ - Added section on callbacks with detailed instructions on summarize
+   images callback
+
+Internal features:
+ - Support for double data type in distributed embedding layer
+ - Support for large number of channels in GPU batchnorm layer
+ - Modified LTFB so that NaNs lose tournaments
+ - Improved numerical stability of reconstruction loss in ATOM VAE
+   model
+ - Skip bad gradients in Adam
+
+I/O & data readers:
+ - Added support for ImageNet data reader to use sample lists
+ - Refactored sample list code to be more flexible and generalize
+   beyond JAG data reader
+ - Added support for slab-based I/O in HDF5 data reader required by
+   DistConv implementations of CosmoFlow 3D volumes
+ - Extended slab-based HDF5 data reader to support labels and
+   reconstruction modes for use with U-Net architecture
+
+Datasets:
+ - Added two graph datasets (MNIST, and PROTEINS)
+
+Build system and Dependent Libraries:
+ - Hydrogen 1.4.0
+ - Aluminum 0.4.0
+ - Spack v0.15.4+ (Requires new format for environments)
+ - cuDNN 8.0.2
+ - Require C++14
+ - Added Spack build support for OCLF data center (Summit)
+
+Bug fixes:
+ - Properly reset data coordinator after each LTFB round
+ - Fixed bug in weights proxy when weights buffer is reallocated
+ - Bugfix for smiles data reader bound checking and simple LTFB data
+   distribution
+ - Eliminated a race condition observed in VAE ATOM model with SMILES
+   data reader.  Added a barrier after each data store mini-batch
+   exchange -- avoid race between non-blocking sends and receives and
+   later GPU kernel communication.
+
+Retired features:
+
 ============================== Release Notes: v0.100 ==============================
 Support for new network structures:
  - 3D molecular generation models for Metal Organic Frameworks from the CoRE MOF Database.