Merge branch 'release-v0.99'
============================== Release Notes: v0.99 ==============================
Support for new training algorithms:
 - Improvements to LTFB infrastructure (including transfer of SGD and Adam hyperparameters)

Support for new network structures:
 - Support for Wide ResNets

Support for new layers:

Python front-end:
 - Python front-end for generating neural network architectures (lbann namespace):
   including layers, objective functions, callbacks, metrics, and optimizers.
 - Python interface for launching (SLURM or LSF) jobs on HPC systems
 - Support for running LBANN experiments and capturing experimental output
 - Network templates for AlexNet, LeNet, arbitrary ResNet models, and Wide ResNet models
 - Python scripts for LeNet, AlexNet, and (Wide) ResNets in model zoo.

Performance optimizations:
 - GPU implementation of RMSprop optimizer.
 - cuDNN convolution algorithms are determined by empirically measuring
   performance rather than using heuristics.
 - Avoid setting up unused bias weights.
 - Perform gradient accumulations in-place when possible.
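
The empirical selection of cuDNN convolution algorithms noted above can be illustrated with a generic autotuning loop. This is a minimal sketch under stated assumptions, not LBANN's implementation: plain Python callables stand in for cuDNN algorithms, and `pick_fastest`, `direct_sum`, and `slow_sum` are hypothetical names introduced only for illustration.

```python
import time


def _time_once(fn, args):
    """Return the wall-clock time of a single call to fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def pick_fastest(candidates, args, trials=3):
    """Return the name of the candidate with the lowest best-of-N runtime.

    `candidates` maps a name to a callable (illustrative stand-ins for
    convolution algorithms); every callable must accept `*args`.
    """
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        fn(*args)  # warm-up run, excluded from the measurement
        elapsed = min(_time_once(fn, args) for _ in range(trials))
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def direct_sum(values):
    return sum(values)


def slow_sum(values):
    total = 0
    for v in values:
        total += v
        time.sleep(0.0005)  # artificial delay so this candidate loses
    return total


candidates = {"direct": direct_sum, "slow": slow_sum}
chosen = pick_fastest(candidates, (list(range(20)),))
```

Measuring once per problem shape and caching the result is the usual way to amortize the cost of such a search; whether and how LBANN caches its measurements is not stated in these notes.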

Model portability & usability:

Internal features:
 - Weight gradient allreduces are in-place rather than on a staging buffer.
 - Fully connected and convolution layers only create bias weights when
   needed.
 - Optimizer exposes gradient buffers so they can be updated in-place.
 - Added callback support to explicitly save model
 - Min-max metric for reporting on multiple LTFB trainers
 - Cleanup of Hydrogen interface to match Hydrogen v1.2.0
 - Added type-erased matrix class for internal refactoring
 - Make CUB always log performance-critical events

I/O & data readers:
 - Python data reader that interacts with an embedded Python session.
 - Optimized data store to provide preload option
 - Extended data store to operate with Cosmoflow-numpy data reader

Build system:
 - Added documentation for how users can use Spack to install LBANN
   either directly or via environments.
 - Conduit is a required dependency.
 - Provided Spack environment for installing LBANN as a user
 - Improved documentation on lbann.readthedocs.io
 - CMake installs a module file in the installation directory that
   sets up PATH and PYTHONPATH variables appropriately

Bug fixes:
 - Models can now be copied or set up multiple times.
 - Fixed incorrect weight initialization with multiple trainers.
 - Updated I/O random number generators to be C++ thread safe (rather than OpenMP)
 - Added an I/O random number generator for preprocessing that is independent
   of the data sequence RNG.
 - Fixed initialization order of RNGs and multiple models / trainers.
 - General fixes for I/O and LTFB interaction.
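
The benefit of a preprocessing RNG that is independent of the data sequence RNG can be seen in a small sketch. It assumes nothing about LBANN's internals: Python's `random.Random` stands in for the C++ generators, and the function name, the seeds, and the `augment` flag are hypothetical.

```python
import random


def two_epoch_orders(num_samples, seed, augment):
    """Return the shuffled sample order for two consecutive epochs.

    Separate generators are used so that the amount of preprocessing
    randomness consumed in one epoch never changes the sample order
    of the next epoch.
    """
    sequence_rng = random.Random(seed)        # drives data ordering only
    preprocess_rng = random.Random(seed + 1)  # drives augmentation only
    orders = []
    for _ in range(2):
        order = list(range(num_samples))
        sequence_rng.shuffle(order)
        orders.append(order)
        if augment:
            for _ in order:
                preprocess_rng.random()  # e.g. a random crop offset
    return orders


plain = two_epoch_orders(8, seed=42, augment=False)
augmented = two_epoch_orders(8, seed=42, augment=True)
```

With a single shared generator, the draws made for augmentation in the first epoch would shift the shuffle of the second epoch, so toggling preprocessing would silently change the data sequence.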

Retired features:
 - "Zero" layer (hack for early GAN implementation).
 - Removed data reader specific implementations of data store (in favor of Conduit-based
   data store)
bvanessen committed May 15, 2019
2 parents 321c436 + a8e0635 commit 018018b
Showing 850 changed files with 25,225 additions and 18,065 deletions.
7 changes: 7 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,7 @@
# .readthedocs.yml

build:
  image: latest

python:
  version: 3.7
176 changes: 157 additions & 19 deletions CMakeLists.txt
@@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.8)
cmake_minimum_required(VERSION 3.12)

project(LBANN CXX)

@@ -48,8 +48,8 @@ endif ()
#

set(LBANN_VERSION_MAJOR 0)
set(LBANN_VERSION_MINOR 98)
set(LBANN_VERSION_PATCH 1)
set(LBANN_VERSION_MINOR 99)
set(LBANN_VERSION_PATCH 0)

set(LBANN_VERSION "${LBANN_VERSION_MAJOR}.${LBANN_VERSION_MINOR}.${LBANN_VERSION_PATCH}")

@@ -100,7 +100,7 @@ option(LBANN_WITH_ALUMINUM "Enable Aluminum all-reduce library" OFF)

option(LBANN_WITH_CNPY "Include cnpy" ON)

option(LBANN_WITH_CONDUIT "Enable Conduit library" OFF)
option(LBANN_WITH_CONDUIT "Enable Conduit library" ON)

option(LBANN_WITH_CUDNN "Include Nvidia cuDNN" ON)

@@ -110,12 +110,17 @@ option(LBANN_WITH_HWLOC
option(LBANN_WITH_NVPROF
"Enable NVTX-based instrumentation for nvprof" OFF)

option(LBANN_WITH_TBINF "Include Tensorboard interface" ON)
option(LBANN_WITH_PYTHON
"Install Python frontend and enable embedded Python" ON)

option(LBANN_WITH_TBINF "Include Tensorboard interface" ON)

option(LBANN_WITH_VTUNE
"Link the Intel VTune profiling library" OFF)

option(LBANN_WITH_UNIT_TESTING
"Enable the unit testing framework (requires Catch2)" OFF)

# Enable parallel random matrix generation, if possible
option(LBANN_DETERMINISTIC
"Use deterministic algorithms as much as possible." OFF)
@@ -167,12 +172,12 @@ set(LBANN_HAS_CEREAL ${CEREAL_FOUND})
# The imported target is just called "cereal". Super.

# Setup the linear algebra library
find_package(Hydrogen 1.1.0 NO_MODULE QUIET
find_package(Hydrogen 1.2.0 NO_MODULE QUIET
HINTS ${Hydrogen_DIR} ${HYDROGEN_DIR} $ENV{Hydrogen_DIR} $ENV{HYDROGEN_DIR}
PATH_SUFFIXES lib/cmake/hydrogen
NO_DEFAULT_PATH)
if (NOT Hydrogen_FOUND)
find_package(Hydrogen 1.1.0 NO_MODULE QUIET REQUIRED)
find_package(Hydrogen 1.2.0 NO_MODULE QUIET REQUIRED)
endif ()
message(STATUS "Found Hydrogen: ${Hydrogen_DIR}")
set(LBANN_HAS_HYDROGEN ${Hydrogen_FOUND})
@@ -209,13 +214,13 @@ endif ()
if (LBANN_WITH_ALUMINUM)
# Aluminum may have already been found by Hydrogen
if (NOT Aluminum_FOUND)
find_package(Aluminum NO_MODULE QUIET
find_package(Aluminum 0.2.0 NO_MODULE QUIET
HINTS ${Aluminum_DIR} ${ALUMINUM_DIR} ${AL_DIR}
$ENV{Aluminum_DIR} $ENV{ALUMINUM_DIR} $ENV{AL_DIR}
PATH_SUFFIXES lib64/cmake/aluminum lib/cmake/aluminum
NO_DEFAULT_PATH)
if (NOT Aluminum_FOUND)
find_package(Aluminum NO_MODULE QUIET)
find_package(Aluminum 0.2.0 NO_MODULE QUIET)
endif ()
endif ()
set(LBANN_HAS_ALUMINUM ${Aluminum_FOUND})
@@ -287,6 +292,29 @@ if (LBANN_WITH_TBINF)
add_subdirectory(external/TBinf)
endif ()

# Find Python
# Note: This uses the Python module in cmake/modules, not the module
# that comes included with CMake. See the file for a discussion of the
# differences.
if (LBANN_WITH_PYTHON)
find_package(Python REQUIRED)
set(LBANN_HAS_PYTHON "${Python_FOUND}")
if (NOT Python_VERSION_MAJOR EQUAL 3)
set(LBANN_HAS_PYTHON FALSE)
message(FATAL_ERROR "Python 2 is not supported.")
endif ()

# Setup the installation stuff
set(PYTHON_INSTALL_PREFIX "${CMAKE_INSTALL_PREFIX}"
CACHE PATH "The prefix for the python installation")

set(CMAKE_INSTALL_PYTHONDIR
"lib/python${Python_VERSION_MAJOR}.${Python_VERSION_MINOR}/site-packages"
CACHE PATH
"Relative path from PYTHON_INSTALL_PREFIX to the python package install")

endif (LBANN_WITH_PYTHON)

if (LBANN_WITH_VTUNE)
find_package(VTune MODULE)

@@ -305,7 +333,7 @@ if (LBANN_WITH_VTUNE)
endif (VTune_FOUND)
endif (LBANN_WITH_VTUNE)

if (LBANN_WITH_NVPROF)
if (LBANN_WITH_CUDA AND LBANN_WITH_NVPROF)
set(LBANN_NVPROF TRUE)
endif ()

@@ -336,15 +364,15 @@ if (LBANN_WITH_CONDUIT)
message(STATUS "Found HDF5: ${HDF5_DIR}")
endif ()

find_package(CONDUIT CONFIG QUIET
HINTS ${CONDUIT_DIR} $ENV{CONDUIT_DIR}
find_package(Conduit CONFIG QUIET
HINTS ${Conduit_DIR} $ENV{Conduit_DIR} ${CONDUIT_DIR} $ENV{CONDUIT_DIR}
PATH_SUFFIXES lib64/cmake lib/cmake
NO_DEFAULT_PATH)
if (NOT CONDUIT_FOUND)
find_package(CONDUIT CONFIG QUIET REQUIRED
if (NOT Conduit_FOUND)
find_package(Conduit CONFIG QUIET REQUIRED
PATH_SUFFIXES lib64/cmake lib/cmake)
endif ()
message(STATUS "Found CONDUIT: ${CONDUIT_DIR}")
message(STATUS "Found CONDUIT: ${Conduit_DIR}")

# Ugh. I don't like that this requires intimate knowledge of
# specific targets that CONDUIT exports. It should support
@@ -402,9 +430,28 @@ if (LBANN_WITH_CONDUIT)
"${_conduit_interface_link_libs}")

set(CONDUIT_LIBRARIES conduit::conduit)
set(LBANN_HAS_CONDUIT ${CONDUIT_FOUND})
set(LBANN_HAS_CONDUIT ${Conduit_FOUND})
endif (LBANN_WITH_CONDUIT)

if (LBANN_WITH_UNIT_TESTING)
find_package(Catch2 2.0.0 CONFIG QUIET
HINTS ${CATCH2_DIR} $ENV{CATCH2_DIR} ${CATCH_DIR} $ENV{CATCH_DIR}
PATH_SUFFIXES lib64/cmake/Catch2 lib/cmake/Catch2
NO_DEFAULT_PATH)
if (NOT Catch2_FOUND)
find_package(Catch2 2.0.0 CONFIG QUIET REQUIRED)
endif ()
message(STATUS "Found Catch2: ${Catch2_DIR}")

# Now that Catch2 has been found, start adding the unit tests
include(CTest)
include(Catch)
add_subdirectory(src/utils/unit_test)

# Add this one last
add_subdirectory(unit_test)
endif (LBANN_WITH_UNIT_TESTING)

# Handle the documentation
add_subdirectory(docs)

@@ -430,6 +477,10 @@ target_include_directories(lbann PUBLIC
$<BUILD_INTERFACE:${CMAKE_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_INCLUDEDIR}>)

if (LBANN_HAS_PYTHON)
target_include_directories(lbann PUBLIC ${Python_INCLUDE_DIRS})
endif ()

# Use the IMPORTED targets when possible.
target_link_libraries(lbann PUBLIC LbannProto)
target_link_libraries(lbann PUBLIC cereal)
@@ -460,6 +511,10 @@ if (LBANN_HAS_VTUNE)
target_link_libraries(lbann PUBLIC ${VTUNE_STATIC_LIB})
endif ()

if (LBANN_HAS_PYTHON)
target_link_libraries(lbann PUBLIC ${Python_LIBRARIES})
endif ()

if (TARGET LBANN_CXX_FLAGS_werror)
target_link_libraries(lbann PUBLIC LBANN_CXX_FLAGS_werror)
endif ()
@@ -516,8 +571,8 @@ export(EXPORT LBANNTargets NAMESPACE LBANN:: FILE LBANNTargets.cmake)

# Write the configure file for the install tree
set(INCLUDE_INSTALL_DIRS include)
set(LIB_INSTALL_DIR lib)
set(CMAKE_INSTALL_DIR lib/cmake/lbann)
set(LIB_INSTALL_DIR ${CMAKE_INSTALL_LIBDIR})
set(CMAKE_INSTALL_DIR ${LIB_INSTALL_DIR}/cmake/lbann)
set(EXTRA_CMAKE_MODULE_DIR)
configure_package_config_file(cmake/configure_files/LBANNConfig.cmake.in
"${CMAKE_BINARY_DIR}/LBANNConfig.cmake.install"
@@ -559,6 +614,64 @@ install(
FILES "${PROJECT_BINARY_DIR}/lbann_config.hpp"
DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}")

# Install Python frontend
# Note (tym): Python best practices are to put setup.py at the package
# root and setuptools only accepts relative paths. However, we need to
# insert a config file containing install-specific file paths and make
# sure setup.py can pick it up. I see three approaches for the build
# process:
# 1) Inject the config file into a known location in the source
# directory so that setup.py can pick it up.
# 2) Copy the Python source tree into the build directory and insert
# setup.py and the config file.
# 3) Create setup.py and the config file in the build directory and
# pass the source directory as a relative path.
# We go for option 3 since it's simple and lightweight, but it runs
# counter to the intent of setuptools. If we learn about any nicer
# approaches, we should use them.
if (LBANN_HAS_PYTHON)

# Construct config file
# NOTE (trb): python_config.ini is installed by setup.py
set(_PYTHON_CONFIG_INI ${CMAKE_BINARY_DIR}/python_config.ini)
set(_LBANN_PB2_PY ${PYTHON_INSTALL_PREFIX}/${CMAKE_INSTALL_PYTHONDIR}/lbann_pb2.py)
set(_LBANN_EXE ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/lbann)
configure_file(
"${CMAKE_SOURCE_DIR}/cmake/configure_files/python_config.ini.in"
"${_PYTHON_CONFIG_INI}"
@ONLY)

# Construct setup.py
set(_SETUP_PY ${CMAKE_BINARY_DIR}/setup.py)
set(_LBANN_PYTHON_DIR "${CMAKE_SOURCE_DIR}/python")
configure_file(
"${CMAKE_SOURCE_DIR}/cmake/configure_files/setup.py.in"
"${_SETUP_PY}"
@ONLY)

# Install Python package with setuptools
set(_PY_INSTALL_DIR "${PYTHON_INSTALL_PREFIX}/${CMAKE_INSTALL_PYTHONDIR}")
set(_SETUP_PY_ARGS
"${_SETUP_PY_ARGS} --root ${_PY_INSTALL_DIR} --install-lib . --install-data .")
install(CODE
"execute_process(COMMAND ${Python_EXECUTABLE} ${_SETUP_PY} install ${_SETUP_PY_ARGS})")

set(_PY_INSTALL_MSG
"
\n**********************************************************************
A Python package has been installed to ${_PY_INSTALL_DIR}. To use
this package, be sure to add this directory to your PYTHONPATH, e.g.:
export PYTHONPATH=${_PY_INSTALL_DIR}:\\$\{PYTHONPATH\}
**********************************************************************\n
")
install(CODE
"execute_process(COMMAND ${CMAKE_COMMAND} -E echo \"${_PY_INSTALL_MSG}\")")

endif (LBANN_HAS_PYTHON)

# Install contributor list, license, readme
install(
FILES "${PROJECT_SOURCE_DIR}/CONTRIBUTORS"
@@ -583,8 +696,10 @@ macro(append_str_tf STRING_VAR)
math(EXPR _num_spaces "${_max_length} - ${_var_length}")
lbann_get_space_string(_spaces ${_num_spaces})
if (${var})
set(${var} "TRUE")
string(APPEND ${STRING_VAR} " ${var}:" "${_spaces}" "TRUE\n")
else ()
set(${var} "FALSE")
string(APPEND ${STRING_VAR} " ${var}:" "${_spaces}" "FALSE\n")
endif ()
endforeach()
@@ -632,10 +747,33 @@ append_str_tf(_str
LBANN_HAS_DOXYGEN
LBANN_HAS_LBANN_PROTO
LBANN_HAS_ALUMINUM
LBANN_HAS_CONDUIT)
LBANN_HAS_CONDUIT
LBANN_HAS_PYTHON)
string(APPEND _str
"\n== End LBANN Configuration Summary ==\n")

# Output to stdout
execute_process(COMMAND ${CMAKE_COMMAND} -E echo "${_str}")
set(_str)

#
# Write a basic modulefile
#
set(LBANN_MODULEFILE_NAME "lbann-${LBANN_VERSION}.lua"
CACHE STRING
"The name of the LBANN modulefile to install. Must end in .lua.")

if (NOT (LBANN_MODULEFILE_NAME MATCHES ".+\.lua"))
message(WARNING
"LBANN_MODULEFILE_NAME must have extension \".lua\". Appending.")
set(LBANN_MODULEFILE_NAME "${LBANN_MODULEFILE_NAME}.lua"
CACHE STRING "" FORCE)
endif ()

configure_file(
"${CMAKE_SOURCE_DIR}/cmake/configure_files/lbann_module.lua.in"
"${CMAKE_BINARY_DIR}/lbann_module.lua.install"
@ONLY)
install(FILES "${CMAKE_BINARY_DIR}/lbann_module.lua.install"
RENAME "${LBANN_MODULEFILE_NAME}"
DESTINATION "${CMAKE_INSTALL_SYSCONFDIR}/modulefiles")
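
The modulefile naming rule in the hunk above (default to `lbann-<version>.lua`, and append `.lua` when a user-supplied name lacks it) can be restated as a short sketch. The helper `modulefile_name` is hypothetical, and it uses a strict suffix check, which is an assumption about the intent of the rule rather than a transcription of the CMake `MATCHES` expression.

```python
def modulefile_name(version, requested=None):
    """Return the modulefile name, appending '.lua' when it is missing.

    `version` is a dotted version string such as "0.99.0"; `requested`
    is an optional user override (hypothetical parameter name).
    """
    name = requested if requested else f"lbann-{version}.lua"
    if not name.endswith(".lua"):
        name += ".lua"
    return name


default_name = modulefile_name("0.99.0")
custom_name = modulefile_name("0.99.0", "my-lbann")
```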
7 changes: 3 additions & 4 deletions LICENSE
@@ -1,5 +1,5 @@
Copyright (c) 2014-2016, Lawrence Livermore National Security, LLC.
Produced at the Lawrence Livermore National Laboratory.
Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC.
Produced at the Lawrence Livermore National Laboratory.
Written by the LBANN Research Team (B. Van Essen, et al.) listed in
the CONTRIBUTORS file. <[email protected]>

@@ -8,7 +8,7 @@ All rights reserved.

This file is part of LBANN: Livermore Big Artificial Neural Network
Toolkit. For details, see http://software.llnl.gov/LBANN or
https://github.com/LLNL/LBANN.
https://github.com/LLNL/LBANN.

Licensed under the Apache License, Version 2.0 (the "Licensee"); you
may not use this file except in compliance with the License. You may
@@ -21,4 +21,3 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing
permissions and limitations under the license.

18 changes: 15 additions & 3 deletions README.md
@@ -21,9 +21,17 @@ methods.


## Building LBANN
A few options for building LBANN are documented
[here](docs/BuildingLBANN.md#top).
The preferred method for LBANN users to install LBANN is to use
[Spack](https://github.com/llnl/spack). After some system
configuration, this should be as straightforward as

```bash
spack install lbann
```

More detailed instructions for building and installing LBANN are
available at the [main LBANN
documentation](https://lbann.readthedocs.io/en/latest/index.html).

## Running LBANN
The basic template for running LBANN is
@@ -42,8 +50,12 @@ optimized for the case in which one assigns one GPU per MPI
the MPI launcher.

More details about running LBANN are documented
[here](docs/RunningLBANN.md#top).
[here](https://lbann.readthedocs.io/en/latest/running_lbann.html).

## Publications

A list of publications, presentations and posters are shown
[here](https://lbann.readthedocs.io/en/latest/publications.html).

## Reporting issues
Issues, questions, and bugs can be raised on the [Github issue