Device Agnostic Deep Learning Benchmark


This repo contains device-agnostic code for a framework-wise benchmark (adapted from u93kun) and for layer-wise and model-wise benchmarks (adapted from avik-pal). A few additional results come from my own tests.

The layer-wise and model-wise scripts are PyTorch-based; the framework-wise benchmark covers PyTorch, Caffe2, and TensorFlow. CPU and GPU performance is compared, including the effect of adjusting the floating-point precision (the Volta architecture allows a performance boost through half/mixed-precision calculations).


How to run

By default, the benchmarks run on the GPU. They run on the CPU either when no GPU is detected, or when you manually remove 'cuda:0' if torch.cuda.is_available() else from the following line:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
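The same selection can be made switchable without editing the line by hand. The sketch below is only an illustration: pick_device_name and its force_cpu flag are hypothetical names that do not appear in the benchmark scripts, and the torch import is wrapped so the snippet also runs on a machine without PyTorch.

```python
def pick_device_name(cuda_available, force_cpu=False):
    """Return 'cuda:0' when a GPU is usable and not overridden, else 'cpu'.

    Hypothetical helper for illustration; not part of the repo's scripts.
    """
    return "cuda:0" if cuda_available and not force_cpu else "cpu"


try:
    import torch
    cuda_available = torch.cuda.is_available()
except ImportError:
    # Without PyTorch installed there is no GPU to select.
    cuda_available = False

device_name = pick_device_name(cuda_available)
print(device_name)
# With PyTorch installed, the name can be passed on as in the line above:
#   device = torch.device(device_name)
```

Calling pick_device_name(cuda_available, force_cpu=True) has the same effect as removing the conditional from the original line.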

Layer-wise benchmark

python3 layer_benchmark.py

Tested layers:

  • Conv3x3, stride 1, padding 1
  • Conv5x5, stride 1, padding 2
  • Conv3x3, stride 2, padding 1
  • Conv5x5, stride 2, padding 2
  • Maxpool, stride 2, padding 1
  • Meanpool, stride 2, padding 1
  • Batchnorm
  • Dense
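The stride and padding values of the four convolution layers above fix their output size through the standard formula floor((n + 2p - k) / s) + 1. The sketch below evaluates it for each configuration; the 224-pixel input is an assumption chosen for illustration, not a value taken from layer_benchmark.py.

```python
def conv_out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for the four convolution layers listed above.
CONV_CONFIGS = {
    "Conv3x3, stride 1, padding 1": (3, 1, 1),
    "Conv5x5, stride 1, padding 2": (5, 1, 2),
    "Conv3x3, stride 2, padding 1": (3, 2, 1),
    "Conv5x5, stride 2, padding 2": (5, 2, 2),
}

INPUT_SIZE = 224  # assumed input size, for illustration only

for name, (kernel, stride, padding) in CONV_CONFIGS.items():
    print(name, "->", conv_out_size(INPUT_SIZE, kernel, stride, padding))
```

With these settings the stride-1 layers preserve the input size and the stride-2 layers halve it, so the 3x3 and 5x5 variants can be compared on equal output shapes.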

Model-wise benchmark

python3 model_benchmark.py

The following models were tested with TensorFlow:

  • vgg16
  • resnet50 *NEWLY ADDED!
  • resnet152

The following models are available to test with PyTorch:

  • vgg16
  • vgg16_bn
  • vgg19
  • vgg19_bn
  • resnet18
  • resnet34
  • resnet50
  • resnet101
  • resnet152
  • densenet161
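Timing a model usually involves a few untimed warm-up runs followed by an average over several timed runs. The sketch below shows that pattern in plain Python; time_callable is a hypothetical helper, not the code used in model_benchmark.py. When timing on a GPU, torch.cuda.synchronize() should be called before each clock read, because CUDA kernels are launched asynchronously.

```python
import time
from statistics import mean


def time_callable(fn, warmup=2, iters=10):
    """Return the mean wall-clock time of fn() in milliseconds.

    Hypothetical helper for illustration. The first `warmup` calls are
    executed but not timed, so one-off setup costs do not skew the mean.
    """
    for _ in range(warmup):
        fn()
    durations_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(durations_ms)


if __name__ == "__main__":
    # Stand-in workload; a real run would call the model's forward pass here.
    print(f"{time_callable(lambda: sum(range(100_000))):.3f} ms")
```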

Framework-wise benchmark

python3 framework_benchmark.py -f <framework_name>

Available frameworks to test:

  • pytorch
  • tensorFlow (GPU only)
  • caffe2 (GPU only)
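A command-line option such as -f can be restricted to the frameworks listed above with argparse. This is a sketch under assumptions: the long option name --framework, the help text, and the case-insensitive matching are illustrative choices, and framework_benchmark.py may parse its arguments differently.

```python
import argparse

# Framework names from the list above, normalized to lowercase.
FRAMEWORKS = ("pytorch", "tensorflow", "caffe2")


def build_parser():
    """Build a parser that accepts only the listed framework names."""
    parser = argparse.ArgumentParser(description="Framework-wise benchmark (sketch)")
    parser.add_argument(
        "-f", "--framework",   # the long form is an assumption
        type=str.lower,         # accept 'tensorFlow' as well as 'tensorflow'
        choices=FRAMEWORKS,
        required=True,
        help="framework to benchmark",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Selected framework: {args.framework}")
```

Because argparse applies the type conversion before checking choices, mixed-case input is normalized first and any other name is rejected with a usage message.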

P.S.:

  • For some reason, with the Zotac 1080Ti, caffe2 hits an "out of memory" error in the fp16 benchmark. This was not the case with the GV100 and P5000 from NVIDIA. UPDATE: the problem turned out to be related to pFP16Initializer, which is now PseudoFP16Initializer.
  • The Caffe2 container is not officially supported on the TITAN RTX, and since Caffe2 is designed with a strong emphasis on industrial mobile applications, it makes no sense to benchmark it on GPGPUs. Therefore, from now on, only pytorch and tensorflow will be considered for comparison.

Docker support

The following command will create a result subdirectory (if it doesn't exist), and run all specified benchmarks by default.

./run_all_benchmark_docker.sh <device_name>

Visualized Results

The results are now visualized here.