Release Notes 3.5
Version Compatibility
Vitis™ AI v3.5 and the DPU IP released with the v3.5 branch of this repository are verified as compatible with Vitis, Vivado™, and PetaLinux version 2023.1. If you are using a previous release of Vitis AI, you should review the version compatibility matrix for that release.
Documentation and Github Repository
- Merged UG1313 into UG1414
- Streamlined UG1414 to remove redundant content
- Streamlined UG1414 to focus exclusively on core tool usage. Core tools such as the Optimizer, Quantizer and Compiler are now being utilized across multiple targets (ie Ryzen™ AI, EPYC™) and this change seeks to make UG1414 more portable to these targets
- Migrated Adaptable SoC and Alveo specific content from UG1414 to Github.IO
- New Github.IO Toctree structure
- Integrated VART Runtime APIs in Doxygen format
Docker Containers and GPU Support
- Removed Anaconda dependency from TensorFlow 2 and PyTorch containers in order to address Anaconda commercial license requirements
- Updated Docker container to disable Ubuntu 18.04 support (which was available in Vitis AI but not officially supported). This was done to address CVE-2021-3493.
Model Zoo
- Add more classic models without modification such as YOLO series and 2D Unet
- Provide model info card for each model and Jupyter Notebook tutorials for new models
- New copyleft repo for GPL license models
ONNX CNN Quantizer
- Initial release
- This is a new quantizer that supports the direct PTQ quantization of ONNX models for DPU. It is a plugin built for the ONNXRuntime native quantizer.
- Supports power-of-two quantization with both QDQ and QOP format.
- Supports Non-overflow and Min-MSE quantization methods.
- Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.
- Supports signed and unsigned configurations.
- Supports symmetry and asymmetry configurations.
- Supports per-tensor and per-channel configurations.
- Supports ONNX models in excess of 2GB.
- Supports the use of the CUDAExecutionProvider for calibration in quantization.
PyTorch CNN Quantizer
- Pytorch 1.13 and 2.0 support
- Mixed precision quantization support, supporting float32/float16/bfloat16/intx mixed quantization
- Support of bit-wise accuracy cross check between quantizer and ONNX-runtime
- Split and chunk operators were automatically converted to slicing
- Dict input/output support for model forward function
- Keywords argument support for model forward function
- Matmul subroutine support
- Add support for BFP data type quantization
- QAT supports training on mutiple GPUs
- QAT supports operations with multiple inputs or outputs
TensorFlow 2 CNN Quantizer
- Updated to support Tensorflow 2.12 and Python 3.8.
- Adds support for quantizing subclass models.
- Adds support for mix precision, supports layer-wise data type configuration, supports float32, float16, bfloat16, and int quantization.
- Adds support for BFP datatypes, and add a new quantize strategy called 'bfp'.
- Adds support to quantize Keras nested models.
- Adds experimental support for quantizing the frozen pb format model in TensorFlow 2.x.
- Adds a new 'gpu' quantize strategy which uses float scale quantization and is used in GPU deployment scenarios.
- Adds support to exporting the quantized model to frozen pb format or onnx format.
- Adds support to exporting the quantized model with power-of-two scales to frozen pb format with "FixNeuron" inside, to be compatible with some compilers with pb format input.
- Adds support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.
Bug Fixes:
- Fixes a gradient bug in the 'pof2s_tqt' quantize strategy.
- Fixes a bug of quantization position change introduced by the fast fine-tuning process after the PTQ.
- Fixes a graph transformation bug when a TFOpLambda op has multiple inputs.
TensorFlow 1 CNN Quantizer
- Adds support for fast fine-tuning that improves PTQ accuracy.
- Adds support for folding Reshape and ResizeNearestNeighbor operators.
- Adds support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.
- Adds support for quantizing Sum, StridedSlice, and Maximum operators.
- Adds support for setting the input shape of the model, which is useful in the deployment of models with undefined input shapes.
- Adds support for setting the opset version in exporting onnx format.
Bug Fixes:
- Fixes a bug where the AddV2 operation is misunderstood as a BiasAdd.
Compiler
- New operators supported: Broadcast add/mul, Bilinear downsample, Trilinear downsample, Group conv2d, Strided-slice
- Performance improved on XV2DPU
- Error message improved
- Compilation time speed up
PyTorch Optimizer
- Removed requirement for license purchase
- Migrated to Github open-source
- Supports PyTorch 1.11, 1.12 and 1.13
- Supports pruning of grouped convolution
- Supports setting the number of channels to be a multiple of the specified number after pruning
TensorFlow 2 Optimizer
- Removed requirement for license purchase
- Migrated to Github open-source
- Supports TensorFlow 2.11 and 2.12
- Supports pruning of tf.keras.layers.SeparableConv2D
- Fixed tf.keras.layers.Conv2DTranspose pruning bug
- Supports setting the number of channels to be a multiple of the specified number after pruning
Runtime
- Supports Versal AI Edge VEK280 evalustion kit
- Buffer optimization for multi-batches to improve performance
- Add new tensor buffer interface to enhance zero copy
Vitis ONNX Runtime Execution Provider (VOE)
- Supports ONNX Opset version 18, ONNX Runtime 1.16.0 and ONNX version 1.13
- Supports both C++ and Python APIs(Python version 3)
- Supports VitisAI EP and other EPs to work together to deploy the model
- Provide Onnx examples based on C++ and Python APIs
- VitisAI EP is open source and upstreamed to ONNX public repo on Github
Library
- Added three new model libraries and support for five additional models
Model Inspector:
Support inspection for new DPU IPs
Profiler
- Added Profiler support for DPUCV2DX8G
DPU IP - Versal AIE-ML Targets DPUCV2DX8G (Versal AI Edge / Core)
- First general access release
- Configurable from C20B1 to C20B14
- Support most 2D operators required to deploy models found in the Model Zoo
- General support for the VE2802/VC2802 and V70
- Early access support for the VE2302 via this lounge
DPU IP - Zynq Ultrascale+ DPUCZDX8G
- IP has reached maturity
- No updates for this release
- No updated reference design (DPU TRD) will be published for minor (ie x.5) releases
- No updated pre-built board image will be published for minor (ie x.5) releases
DPU IP - Versal AIE Targets DPUCVDX8H
- IP has reached maturity
- No updates for this release
- No updated reference design (DPU TRD) will be published for minor (ie x.5) releases
- No updated pre-built board image will be published for minor (ie x.5) releases
DPU IP - CNN - Alveo Data Center DPUCVDX8G
- IP has reached maturity
- No updates for this release
- No updated reference design (DPU TRD) will be published for minor (ie x.5) releases
- No updated pre-built board image will be published for minor (ie x.5) releases
WeGO
- Enhanced WeGO to support V70 DPU GA release.
- Upgraded WeGO to provide support for PyTorch 1.13.1 and TensorFlow r2.12.
- Enhanced WeGO-Torch to support PyTorch 2.0 as a preview feature.
- Introduced new C++ API support for WeGO-Torch in addition to Python APIs.
- Implemented WeGO-TF1 and WeGO-TF2 as out-of-tree plugins.
Known Issues
- Engineering to add comments
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc.