# PyTorch Benchmark Score V1

This file describes how we generate the PyTorch Benchmark Score Version 1. The goal is to help users and developers understand the score and be able to reproduce it.

V1 uses the same hardware environment as V0, but it covers far more models and test configurations.

## Requirements

The V1 benchmark suite uses an experimental JIT feature, optimize_for_inference, introduced on May 22, 2021. Therefore, it can't run on earlier versions of PyTorch.
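Since the suite depends on this API, a run can verify it is available before benchmarking. The snippet below is an illustrative sketch (the `TinyModel` module is a made-up stand-in, not part of the suite) showing the capability check and the `torch.jit.optimize_for_inference` call:

```python
import torch

# Hypothetical stand-in model; the real suite uses the 47 models listed below.
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

# The V1 suite requires torch.jit.optimize_for_inference (added May 22, 2021);
# fail fast on PyTorch builds that predate it.
if not hasattr(torch.jit, "optimize_for_inference"):
    raise RuntimeError("This PyTorch build is too old for the V1 benchmark suite")

# optimize_for_inference freezes the scripted module (eval mode required)
# and applies inference-only graph optimizations before timing it.
scripted = torch.jit.script(TinyModel().eval())
optimized = torch.jit.optimize_for_inference(scripted)
out = optimized(torch.ones(3))
```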

## Coverage

The V1 suite covers 47 models from popular machine learning domains. The complete list of models is as follows:

| Model name | Category |
| --- | --- |
| BERT_pytorch | NLP |
| Background_Matting | COMPUTER VISION |
| LearningToPaint | REINFORCEMENT LEARNING |
| alexnet | COMPUTER VISION |
| attention_is_all_you_need_pytorch | NLP |
| demucs | OTHER |
| densenet121 | COMPUTER VISION |
| dlrm | RECOMMENDATION |
| drq | REINFORCEMENT LEARNING |
| fastNLP | NLP |
| hf_Albert | NLP |
| hf_Bert | NLP |
| hf_BigBird | NLP |
| hf_DistilBert | NLP |
| hf_GPT2 | NLP |
| hf_Longformer | NLP |
| hf_T5 | NLP |
| maml | OTHER |
| maml_omniglot | OTHER |
| mnasnet1_0 | COMPUTER VISION |
| mobilenet_v2 | COMPUTER VISION |
| mobilenet_v3_large | COMPUTER VISION |
| moco | OTHER |
| nvidia_deeprecommender | RECOMMENDATION |
| opacus_cifar10 | OTHER |
| pyhpc_equation_of_state | OTHER |
| pyhpc_isoneutral_mixing | OTHER |
| pytorch_CycleGAN_and_pix2pix | COMPUTER VISION |
| pytorch_stargan | COMPUTER VISION |
| pytorch_struct | OTHER |
| resnet18 | COMPUTER VISION |
| resnet50 | COMPUTER VISION |
| resnet50_quantized_qat | COMPUTER VISION |
| resnext50_32x4d | COMPUTER VISION |
| shufflenet_v2_x1_0 | COMPUTER VISION |
| soft_actor_critic | REINFORCEMENT LEARNING |
| speech_transformer | SPEECH |
| squeezenet1_1 | COMPUTER VISION |
| timm_efficientnet | COMPUTER VISION |
| timm_nfnet | COMPUTER VISION |
| timm_regnet | COMPUTER VISION |
| timm_resnest | COMPUTER VISION |
| timm_vision_transformer | COMPUTER VISION |
| timm_vovnet | COMPUTER VISION |
| tts_angular | SPEECH |
| vgg16 | COMPUTER VISION |
| yolov3 | COMPUTER VISION |

## Reference Config YAML

The reference config YAML file is stored here. It is generated by repeated runs of the same benchmark setting on pytorch v1.10.0.dev20210612, torchtext 0.10.0.dev20210612, and torchvision 0.11.0.dev20210612. We chose the earliest PyTorch nightly version that has a stable implementation of the optimize_for_inference feature. We then picked one of the repeated V1 benchmark runs at random as the reference execution and summarized its performance metrics in the reference config YAML.

We have also manually verified that the maximum variance of any single test in the V1 suite is below 7%. The V1 nightly CI job raises a signal if any test's performance metric changes by more than the 7% threshold, or if the overall score changes by more than the 1% threshold.
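The CI rule above can be sketched as a simple comparison against the reference metrics. This is an illustrative sketch, not the actual CI code: the dictionary layout (`"tests"` and `"score"` keys) is an assumption made for the example.

```python
def collect_signals(reference, current, test_threshold=0.07, score_threshold=0.01):
    """Return the names of tests whose metric drifts past the per-test
    threshold (7%), plus a marker if the overall score moves past the
    score threshold (1%). Sketch of the V1 nightly CI rule; the data
    layout here is assumed, not the real schema."""
    signals = []
    for name, ref_val in reference["tests"].items():
        cur_val = current["tests"][name]
        if abs(cur_val - ref_val) / ref_val > test_threshold:
            signals.append(name)
    ref_score = reference["score"]
    if abs(current["score"] - ref_score) / ref_score > score_threshold:
        signals.append("overall_score")
    return signals
```

For example, a test whose latency moves from 1.00 to 1.10 (a 10% change) trips the 7% per-test threshold, while a score drift from 1000 to 1005 (0.5%) stays under the 1% score threshold.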

We define the V1 score of the reference execution to be 1000. All other V1 scores are relative to the performance of the reference execution. For example, if another V1 benchmark execution scores 900, its performance is 10% slower than the reference execution.
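One common way to realize such a relative score is a geometric mean of per-test speedups against the reference, scaled so the reference execution itself scores 1000. The sketch below illustrates that idea; it is an assumption for exposition, not the documented V1 formula, whose exact weighting lives in the reference config YAML.

```python
import math

def relative_score(reference_latencies, measured_latencies, base=1000.0):
    """Sketch of a relative benchmark score: the geometric mean of
    per-test speedups (reference latency / measured latency), scaled
    so that the reference execution scores exactly `base`.
    Illustrative only; the real V1 weighting may differ."""
    ratios = [ref / measured_latencies[name]
              for name, ref in reference_latencies.items()]
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return base * geomean
```

Under this sketch, an execution whose tests all run at half the reference speed (double the latency) scores 500, matching the "relative to the reference" reading above.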