Releases: ROCm/rocWMMA
rocWMMA 0.9 for ROCm 5.4.1
rocWMMA code for ROCm 5.4.1 did not change. The library was rebuilt for the updated ROCm 5.4.1 stack.
rocWMMA 0.9 for ROCm 5.4.0
Added
- Added gemm driver APIs for flow control builtins
- Added benchmark logging systems
- Restructured tests to follow naming convention. Added macros for test generation
Changed
- Changed CMake to accommodate the modified test infrastructure
- Fine-tuned the multi-block kernels with and without LDS
- Adjusted Maximum Vector Width to dWordx4 Width
- Updated Efficiencies to display as whole number percentages
- Updated throughput from GFlops/s to TFlops/s
- Reset the ad-hoc tests to use smaller sizes
- Modified the output validation to compare rocWMMA results against a CPU-based implementation
- Modified the extended vector test to return error codes for memory allocation failures
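The reporting changes above (throughput in TFlops/s, efficiency as a whole-number percentage) come down to simple arithmetic. The following is a minimal sketch of that arithmetic, not the library's benchmark code: the helper names `gemmFlops`, `teraFlopsPerSec` and `efficiencyPercent` are hypothetical, and the 2·M·N·K operation count is the usual GEMM convention, assumed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: floating-point operations in an M x N x K GEMM,
// counting one multiply and one add per inner-product term.
inline double gemmFlops(std::uint64_t m, std::uint64_t n, std::uint64_t k)
{
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
}

// Hypothetical helper: throughput in TFlops/s for a run that took elapsedMs milliseconds.
inline double teraFlopsPerSec(double flops, double elapsedMs)
{
    assert(elapsedMs > 0.0);
    const double seconds = elapsedMs * 1.0e-3;
    return flops / seconds / 1.0e12;
}

// Hypothetical helper: efficiency relative to a peak rate, rounded to a whole percentage.
inline int efficiencyPercent(double measuredTFlops, double peakTFlops)
{
    assert(peakTFlops > 0.0);
    return static_cast<int>(std::lround(100.0 * measuredTFlops / peakTFlops));
}
```

For example, a 4096 x 4096 x 4096 GEMM that completes in 10 ms works out to roughly 13.74 TFlops/s; against an assumed peak of 45.3 TFlops/s that would be reported as 30%.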
rocWMMA 0.8 for ROCm 5.3.3
rocWMMA code for ROCm 5.3.3 did not change. The library was rebuilt for the updated ROCm 5.3.3 stack.
rocWMMA 0.8 for ROCm 5.3.2
rocWMMA code for ROCm 5.3.2 did not change. The library was rebuilt for the updated ROCm 5.3.2 stack.
rocWMMA 0.8 for ROCm 5.3.1
rocWMMA code for ROCm 5.3.1 did not change. The library was rebuilt for the updated ROCm 5.3.1 stack.
rocWMMA 0.8 for ROCm 5.3.0
Added
- Added runtime checks to disable tests on non-target GPUs
- Added workgroup aware gemm kernels
- Added workgroup aware validation and benchmark test suite
- Added warmup run to existing tests
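A warmup run, as mentioned in the list above, typically means executing the workload once or more without timing it, so that one-time costs do not skew the measurement. The sketch below illustrates the general pattern with a host-side timer; `averageMsWithWarmup` is a hypothetical name, and this is not how the rocWMMA tests are implemented. A real GPU benchmark would also need to synchronize with the device before reading the clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: run `kernel` warmupRuns times untimed, then return the
// average wall-clock time in milliseconds over timedRuns timed invocations.
template <typename Kernel>
double averageMsWithWarmup(Kernel&& kernel, std::size_t warmupRuns, std::size_t timedRuns)
{
    assert(timedRuns > 0);

    // Untimed warmup iterations absorb one-time setup costs.
    for(std::size_t i = 0; i < warmupRuns; ++i)
    {
        kernel();
    }

    const auto start = std::chrono::steady_clock::now();
    for(std::size_t i = 0; i < timedRuns; ++i)
    {
        kernel();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(timedRuns);
}
```

Calling it with one warmup run and five timed runs invokes the workload six times in total, but only the last five contribute to the reported average.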
Changed
- Refactored lds_mapping_util into gemm global, local mapping, gemm driver, gemm config and scheduling classes
- Modified resource allocation and tracking of gemm and dlrm buffers
- Improved low-level data loading patterns
- Reduced branching on cooperative load and store
- Updated gemv sample
- Updated gemm sample
rocWMMA 0.7 for ROCm 5.2.3
rocWMMA code for ROCm 5.2.3 did not change. The library was rebuilt for the updated ROCm 5.2.3 stack.
rocWMMA 0.7 for ROCm 5.2.1
rocWMMA code for ROCm 5.2.1 did not change. The library was rebuilt for the updated ROCm 5.2.1 stack.
rocWMMA 0.7 for ROCm 5.2.0
Added
- Added unit tests for DLRM kernels
- Added GEMM sample
- Added DLRM sample
- Added SGEMV sample
- Added unit tests for cooperative wmma load and stores
- Added unit tests for IOBarrier.h
- Added wmma load/store tests for different matrix types (A, B and Accumulator)
- Added more block sizes 1, 2, 4, 8 to test MmaSyncMultiTest
- Added block sizes 4, 8 to test MmaSyncMultiLdsTest
- Added support for wmma load/store layouts with block dimension greater than 64
- Added IOShape structure to define the attributes of mapping and layouts for all wmma matrix types
- Added CI testing for rocWMMA
Changed
- Renamed wmma to rocwmma in cmake, header files and documentation
- Renamed library files
- Modified Layout.h to use different matrix offset calculations (base offset, incremental offset and cumulative offset)
  - Opaque load/store continue to use incremental offsets as they fill the entire block
  - Cooperative load/store use cumulative offsets as they fill only small portions of the entire block
- Increased Max split counts to 64 for cooperative load/store
- Moved all the wmma definitions, API headers to rocwmma namespace
- Modified wmma fill unit tests to validate all matrix types (A, B, Accumulator)
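The distinction between base, incremental and cumulative offsets noted above can be illustrated with a small host-side sketch. This is one reading of the changelog entry, not the code in Layout.h: `OffsetSketch`, `visitAll` and `visitSlice` are hypothetical names, and the even split of iterations among participants is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a block traversal: a starting (base) offset and a
// fixed step (incremental offset) between consecutive iterations.
struct OffsetSketch
{
    std::uint32_t base;
    std::uint32_t increment;

    // Cumulative offset: position of iteration i measured from the block start.
    constexpr std::uint32_t cumulative(std::uint32_t i) const
    {
        return base + i * increment;
    }
};

// Opaque-style traversal: every iteration is visited in order, so it is enough
// to keep adding the incremental offset to a running position.
template <typename Visit>
void visitAll(const OffsetSketch& o, std::uint32_t iterations, Visit&& visit)
{
    std::uint32_t offset = o.base;
    for(std::uint32_t i = 0; i < iterations; ++i)
    {
        visit(offset);
        offset += o.increment;
    }
}

// Cooperative-style traversal: participant `id` out of `count` handles only its
// own slice, so it jumps straight to the cumulative offset of that slice.
template <typename Visit>
void visitSlice(const OffsetSketch& o,
                std::uint32_t       iterations,
                std::uint32_t       id,
                std::uint32_t       count,
                Visit&&             visit)
{
    assert(count > 0 && id < count && iterations % count == 0);
    const std::uint32_t perParticipant = iterations / count;
    const std::uint32_t first          = id * perParticipant;
    for(std::uint32_t i = 0; i < perParticipant; ++i)
    {
        visit(o.cumulative(first + i));
    }
}
```

In this sketch, the slices visited by all participants together cover exactly the offsets that the full traversal visits, which is the property a cooperative load/store would rely on.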