Updated the readme to be about Diogenes instead of its (now deprecated) parent project
bwelton committed Jul 29, 2019
1 parent 3ecba7a commit 5105453
Showing 1 changed file with 4 additions and 33 deletions.
37 changes: 4 additions & 33 deletions README.md
@@ -1,20 +1,8 @@
# cuda_deduplicator
Tool for detecting and correcting duplicate transfers in CUDA applications. This is done by wrapping the CUDA driver API function calls to intercept the data being transferred to/from the device. This project supports the following features:
# Diogenes

**Detection of Duplicate Transfers**
Diogenes is a performance tool that identifies unnecessary or misplaced synchronizations and memory transfers. Diogenes provides an estimate of the potential benefit of fixing each unnecessary or misplaced operation.

Collects statistical information about the presence of duplicate data transfers within an application.

- Counting of the number of data transfers containing duplicate data to/from the GPU
- Size of the duplicate transfers
- Amount of time spent transferring duplicate data (accuracy of this number is a work in progress)
- Supported transfer calls: cuMemcpyHtoDAsync_v2, cuMemcpyAtoD_v2, cuMemcpyAtoH_v2, cuMemcpyAtoHAsync_v2, cuMemcpyDtoA_v2, cuMemcpyDtoH_v2, cuMemcpyDtoHAsync_v2, and cuMemcpyHtoD_v2.
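
The statistics listed above can be illustrated with a small sketch. This is not the tool's implementation: it assumes duplicates are identified by hashing transfer contents, and the `DuplicateStats` class, its fields, and the choice of SHA-256 are hypothetical names picked for illustration. It models only the counting of duplicate transfers and their sizes, not timing or CUDA interception.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DuplicateStats:
    """Hypothetical counters for duplicate-transfer statistics (illustration only)."""
    seen: set = field(default_factory=set)  # digests of payloads already transferred
    transfers: int = 0                      # total transfers observed
    duplicate_count: int = 0                # transfers whose contents were seen before
    duplicate_bytes: int = 0                # total size of those duplicate transfers

    def record(self, payload: bytes) -> bool:
        """Record one transfer; return True if its contents duplicate an earlier one."""
        self.transfers += 1
        digest = hashlib.sha256(payload).digest()  # assumption: content hashing
        if digest in self.seen:
            self.duplicate_count += 1
            self.duplicate_bytes += len(payload)
            return True
        self.seen.add(digest)
        return False


# Three simulated transfers; the third repeats the contents of the first.
stats = DuplicateStats()
for buf in (b"a" * 1024, b"b" * 512, b"a" * 1024):
    stats.record(buf)
print(stats.transfers, stats.duplicate_count, stats.duplicate_bytes)
```

Keeping only digests rather than full payloads bounds the memory used for detection, at the cost of relying on the hash to distinguish contents.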

**Automatic Data Deduplication**

Works by creating a temporary cache in GPU memory for data transfers. When a duplicate host-to-device transfer is detected (via hashing), a device-to-device transfer from this cache replaces the host-to-device transfer.

- Supported transfer calls: cuMemcpyHtoDAsync_v2, cuMemcpyAtoD_v2, and cuMemcpyHtoD_v2
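
The caching idea can be sketched in plain Python as follows. This is a simulation under stated assumptions, not the project's code: device memory is modeled as a dictionary, and `DedupCache`, `memcpy_htod`, and the counters are hypothetical names. A real implementation would intercept the CUDA driver calls listed above and would also need a policy for evicting cache entries, which is omitted here.

```python
import hashlib


class DedupCache:
    """Simulated host-to-device copy path with a content-addressed cache (illustration only)."""

    def __init__(self):
        self.device = {}  # simulated device memory: destination address -> bytes
        self.cache = {}   # content digest -> copy held in the simulated GPU cache
        self.htod = 0     # host-to-device transfers actually performed
        self.dtod = 0     # device-to-device transfers served from the cache

    def memcpy_htod(self, dst: int, payload: bytes) -> None:
        """Copy payload to dst, reusing the cached copy when the contents repeat."""
        digest = hashlib.sha256(payload).digest()  # assumption: SHA-256 as the hash
        cached = self.cache.get(digest)
        if cached is not None:
            # Duplicate contents: stand-in for a device-to-device copy from the cache.
            self.device[dst] = cached
            self.dtod += 1
        else:
            # First time these contents are seen: do the real transfer and cache it.
            self.device[dst] = payload
            self.cache[digest] = payload
            self.htod += 1


gpu = DedupCache()
gpu.memcpy_htod(0x1000, b"weights" * 100)
gpu.memcpy_htod(0x2000, b"weights" * 100)  # same contents, different destination
gpu.memcpy_htod(0x3000, b"biases" * 100)
print(gpu.htod, gpu.dtod)
```

The destination still receives the full data on a cache hit; only the source of the copy changes, which is why the benefit depends on device-to-device copies being cheaper than host-to-device ones.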
This project is very much a work in progress and in a research state (lots of dead code, difficult to build, lots of bugs, no polish, etc.).

**Building and Installation**

@@ -42,27 +30,10 @@ The following parameters are not required but can reduce compilation time if set
- -DDYNINST_ROOT=*Path to the installation directory of dyninst* (containing /lib and /include)
- -DPYTHON_ROOT=*Path to the installation directory of python 2.7*
- -DCMAKE_INSTALL_PREFIX=*Installation Directory*
- -DENABLE_STRACE=1 *Enables stack trace support. When a duplicate is detected, a stack trace showing the location of the duplicate transfer is printed*

**Usage**

To gather statistical information on duplicate transfers (detection but no correction), perform the following steps from the root install directory (the CMAKE_INSTALL_PREFIX directory):

1. bash ./bin/inst_app.sh */path/to/LIBCUDA.so*/libcuda.so.1 ./defs/DetectDuplicates.def.in */path/to/save/rewritten/libcuda*/libcuda.so.1 (WARNING: Do not attempt to overwrite the original libcuda.so.1)
2. export LD_LIBRARY_PATH=*/path/to/save/rewritten/libcuda*:*/CMAKE/INSTALL/PREFIX/PATH*/lib:$LD_LIBRARY_PATH
3. Run your application; statistical information is printed to STDERR at the end of execution.

If you are interested in automatic correction of duplicate transfers, perform the following:

1. bash ./bin/inst_app.sh */path/to/LIBCUDA.so*/libcuda.so.1 ./defs/Deduplicate.def */path/to/save/rewritten/libcuda*/libcuda.so.1 (WARNING: Do not attempt to overwrite the original libcuda.so.1)
2. export LD_LIBRARY_PATH=*/path/to/save/rewritten/libcuda*:*/CMAKE/INSTALL/PREFIX/PATH*/lib:$LD_LIBRARY_PATH
3. Run your application; statistical information (number of duplicates caught and corrected) is printed to STDERR at the end of execution.

**Known Issues/Works In Progress**

- ORNL Titan has issues with certain versions of Dyninst. For now, it is recommended to leave DYNINST_ROOT unset so that a compatible version (git rev a8252fd) can be built. These Dyninst issues likely affect other Cray platforms as well; if you run into problems, try building with git rev a8252fd of Dyninst.
- Deduplication correction is highly experimental and may not deliver the performance benefits you expect.
- Automated correction of duplicates may require CUDA MPS support to be enabled to function correctly (specifically in cases where multiple threads on the same node are using CUDA in different CUDA contexts).
To be filled in....

**Disclaimer**
