Releases: ROCm/rocWMMA

rocWMMA 0.9 for ROCm 5.4.1

15 Dec 18:41
849a36c

rocWMMA code for ROCm 5.4.1 did not change. The library was rebuilt for the updated ROCm 5.4.1 stack.

rocWMMA 0.9 for ROCm 5.4.0

30 Nov 17:41
849a36c

Added

  • Added gemm driver APIs for flow control builtins
  • Added benchmark logging systems
  • Restructured tests to follow a naming convention and added macros for test generation

Changed

  • Changed CMake to accommodate the modified test infrastructure
  • Fine-tuned the multi-block kernels with and without LDS
  • Adjusted Maximum Vector Width to dWordx4 Width
  • Updated Efficiencies to display as whole number percentages
  • Updated throughput from GFlops/s to TFlops/s
  • Reset the ad-hoc tests to use smaller sizes
  • Modified the output validation to compare rocWMMA results against a CPU-based implementation
  • Modified the extended vector test to return error codes for memory allocation failures
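
The reporting changes above (TFlops/s instead of GFlops/s, and efficiencies as whole-number percentages) can be sketched as follows. This is a minimal illustration, not the benchmark code shipped with rocWMMA: the helper names `gemmFlops`, `teraFlopsPerSec` and `efficiencyPercent` are hypothetical, and the peak throughput is a value the caller is assumed to supply.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper names; none of these come from the rocWMMA sources.

// Floating-point operations in one M x N x K GEMM:
// one multiply and one add per inner-product term.
double gemmFlops(std::uint64_t m, std::uint64_t n, std::uint64_t k)
{
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
}

// Throughput in TFlops/s for a kernel that took elapsedMs milliseconds.
// Dividing by 1e12 rather than 1e9 is the only difference from GFlops/s.
double teraFlopsPerSec(double flops, double elapsedMs)
{
    assert(elapsedMs > 0.0);
    const double seconds = elapsedMs * 1.0e-3;
    return flops / seconds / 1.0e12;
}

// Efficiency as a whole-number percentage of an assumed peak throughput.
int efficiencyPercent(double measuredTFlops, double peakTFlops)
{
    assert(peakTFlops > 0.0);
    return static_cast<int>(std::lround(100.0 * measuredTFlops / peakTFlops));
}
```

For example, a 4096 x 4096 x 4096 GEMM that finishes in 1 ms works out to roughly 137 TFlops/s, which would have been printed as roughly 137439 GFlops/s under the old unit.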

rocWMMA 0.8 for ROCm 5.3.3

17 Nov 19:21

rocWMMA code for ROCm 5.3.3 did not change. The library was rebuilt for the updated ROCm 5.3.3 stack.

rocWMMA 0.8 for ROCm 5.3.2

10 Nov 01:09

rocWMMA code for ROCm 5.3.2 did not change. The library was rebuilt for the updated ROCm 5.3.2 stack.

rocWMMA 0.8 for ROCm 5.3.1

28 Oct 17:02

rocWMMA code for ROCm 5.3.1 did not change. The library was rebuilt for the updated ROCm 5.3.1 stack.

rocWMMA 0.8 for ROCm 5.3.0

30 Sep 19:27

Added

  • Added runtime checks to disable tests on non-target GPUs
  • Added workgroup-aware gemm kernels
  • Added workgroup-aware validation and benchmark test suite
  • Added warmup run to existing tests
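
The idea behind a warmup run is that the first execution often pays one-time costs, so it is run and discarded before timing starts. The sketch below shows the pattern on the host only. `timeWithWarmup` is a hypothetical helper, not part of the rocWMMA test suite, and a real GPU benchmark would also need to wait for the device to finish before reading the clock, which this sketch leaves out.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: returns the average wall-clock time in milliseconds
// of `timedRuns` executions of `run`, after `warmupRuns` untimed executions.
double timeWithWarmup(const std::function<void()>& run, int warmupRuns, int timedRuns)
{
    assert(warmupRuns >= 0 && timedRuns > 0);

    // Untimed warm-up executions absorb one-time start-up costs.
    for(int i = 0; i < warmupRuns; ++i)
    {
        run();
    }

    const auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < timedRuns; ++i)
    {
        run();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / timedRuns;
}
```

Calling `timeWithWarmup(work, 1, 10)` would execute `work` eleven times and report the average over the last ten.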

Changed

  • Refactored lds_mapping_util into gemm global, local mapping, gemm driver, gemm config and scheduling classes
  • Modified resource allocation and tracking of gemm and dlrm buffers
  • Improved low-level data loading patterns
  • Reduced branching on cooperative load and store
  • Updated gemv sample
  • Updated gemm sample

rocWMMA 0.7 for ROCm 5.2.3

18 Aug 16:59
1c4614a

rocWMMA code for ROCm 5.2.3 did not change. The library was rebuilt for the updated ROCm 5.2.3 stack.

rocWMMA 0.7 for ROCm 5.2.1

21 Jul 20:25
1c4614a

rocWMMA code for ROCm 5.2.1 did not change. The library was rebuilt for the updated ROCm 5.2.1 stack.

rocWMMA 0.7 for ROCm 5.2.0

28 Jun 18:46
1c4614a

Added

  • Added unit tests for DLRM kernels
  • Added GEMM sample
  • Added DLRM sample
  • Added SGEMV sample
  • Added unit tests for cooperative wmma load and stores
  • Added unit tests for IOBarrier.h
  • Added wmma load/store tests for different matrix types (A, B and Accumulator)
  • Added more block sizes 1, 2, 4, 8 to test MmaSyncMultiTest
  • Added block sizes 4, 8 to test MmaSyncMultiLdsTest
  • Added support for wmma load/store layouts with block dimension greater than 64
  • Added IOShape structure to define the attributes of mapping and layouts for all wmma matrix types
  • Added CI testing for rocWMMA

Changed

  • Renamed wmma to rocwmma in cmake, header files and documentation
  • Renamed library files
  • Modified Layout.h to use different matrix offset calculations (base offset, incremental offset and cumulative offset)
  • Opaque load/store continues to use incremental offsets, as it fills the entire block
  • Cooperative load/store uses cumulative offsets, as it fills only small portions of the entire block
  • Increased Max split counts to 64 for cooperative load/store
  • Moved all the wmma definitions, API headers to rocwmma namespace
  • Modified wmma fill unit tests to validate all matrix types (A, B, Accumulator)
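
One way to picture the three offset kinds named in the Layout.h change is sketched below. This is an interpretation under stated assumptions, not the Layout.h implementation: the `OffsetPlan` type and the function names are hypothetical, and the sketch assumes a single fixed step between accesses. A walk over the whole block can keep adding the incremental offset, while a worker that handles only a portion can jump straight to its first access with a cumulative offset.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types and names for illustration only; not taken from Layout.h.
struct OffsetPlan
{
    std::size_t base;      // base offset: position of the first access
    std::size_t increment; // incremental offset: step from one access to the next
};

// Cumulative offset: absolute position of access number `index`.
std::size_t cumulativeOffset(const OffsetPlan& plan, std::size_t index)
{
    return plan.base + index * plan.increment;
}

// Visit every access in order by adding the increment at each step,
// as a routine that fills the whole block could do.
std::vector<std::size_t> walkWholeBlock(const OffsetPlan& plan, std::size_t count)
{
    std::vector<std::size_t> offsets;
    offsets.reserve(count);
    std::size_t current = plan.base;
    for(std::size_t i = 0; i < count; ++i)
    {
        offsets.push_back(current);
        current += plan.increment;
    }
    return offsets;
}

// Visit only accesses [first, first + count), computing each position
// directly from its cumulative offset instead of walking from the start.
std::vector<std::size_t> walkPortion(const OffsetPlan& plan, std::size_t first, std::size_t count)
{
    std::vector<std::size_t> offsets;
    offsets.reserve(count);
    for(std::size_t i = 0; i < count; ++i)
    {
        offsets.push_back(cumulativeOffset(plan, first + i));
    }
    return offsets;
}
```

In this sketch, splitting eight accesses between two workers with `walkPortion(plan, 0, 4)` and `walkPortion(plan, 4, 4)` covers the same positions as one `walkWholeBlock(plan, 8)` call.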