Releases · ROCm/rocBLAS

Released 15 Dec 18:30 by rocm-ci (tag rocm-6.0.0, commit 88df972)

rocBLAS 4.0.0 for ROCm 6.0.0

Added

Added beta APIs rocblas_gemm_batched_ex3 and rocblas_gemm_strided_batched_ex3
Added support for input/output types f16_r/bf16_r with execution type f32_r to Level 2 gemv_batched and gemv_strided_batched
Added rocblas_status_excluded_from_build, returned when calling a function that requires Tensile in a rocBLAS build without Tensile
Added a system for asynchronous kernel launches that sets a failure rocblas_status based on a hipPeekAtLastError discrepancy
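The ex3 functions above are beta APIs. According to the rocBLAS 2.47.0 notes further down, beta APIs are exposed only when ROCBLAS_BETA_FEATURES_API is defined, so opting in presumably looks like the fragment below. The <rocblas/rocblas.h> include path assumes a standard ROCm installation; check the installed header for exactly which functions the define guards.

```c
/* Opt in to rocBLAS beta APIs (for example, the ex3 GEMM functions).
   The define must precede the rocBLAS header; passing
   -DROCBLAS_BETA_FEATURES_API on the compiler command line is equivalent. */
#define ROCBLAS_BETA_FEATURES_API
#include <rocblas/rocblas.h>
```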

Optimized

Improved trsm performance for small sizes (m < 32 and n < 32)

Deprecated

In a future release, atomic operations will be disabled by default so that results are repeatable. Atomic operations can always be enabled or disabled using the function rocblas_set_atomics_mode. Enabling atomic operations can improve performance.
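The repeatability concern comes from floating-point addition not being associative: when atomic operations let parallel work items add partial results in whatever order they finish, rounding can differ from run to run. The plain C sketch below makes no rocBLAS calls; the helper sum_in_order is hypothetical and simply replays two possible completion orders of the same three values, which produce different float results. In rocBLAS itself, rocblas_set_atomics_mode(handle, rocblas_atomics_not_allowed) requests the non-atomic, repeatable path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums v[order[0]], v[order[1]], ... into a float
   accumulator, mimicking one possible completion order of parallel adds. */
static float sum_in_order(const float *v, const size_t *order, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        assert(order[i] < n);
        acc += v[order[i]];
    }
    return acc;
}

/* Example data: the spacing between floats near 1e8 is 8, so adding 1.0f
   to 1e8f first rounds the 1.0f away entirely. */
static const float values[3] = {1.0e8f, 1.0f, -1.0e8f};
static const size_t order_a[3] = {0, 1, 2}; /* (1e8 + 1) - 1e8 gives 0 */
static const size_t order_b[3] = {0, 2, 1}; /* (1e8 - 1e8) + 1 gives 1 */
```

Any reduction whose accumulation order depends on scheduling can show this kind of variation, which is why disabling atomics trades some speed for bitwise-repeatable results.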

Removed

Removed the rocblas_gemm_ext2 API function
Removed the in-place trmm API from Legacy BLAS. It is replaced by an API that supports both in-place and out-of-place trmm
Removed int8x4 support. int8 support is unchanged
The #define __STDC_WANT_IEC_60559_TYPES_EXT__ has been removed from rocblas-types.h. Users who want ISO/IEC TS 18661-3:2015 functionality must define __STDC_WANT_IEC_60559_TYPES_EXT__ before including float.h, math.h, and rocblas.h
The default build no longer includes device code for the gfx803 architecture in the fat binary
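Because rocblas-types.h no longer defines the __STDC_WANT_IEC_60559_TYPES_EXT__ feature-test macro, code that relies on the ISO/IEC TS 18661-3:2015 interchange types has to define it itself, ahead of every one of the headers listed above. A minimal fragment, assuming the standard <rocblas/rocblas.h> include path:

```c
/* Must appear before the first inclusion of any of these headers;
   once a header has been included, defining the macro later may not
   expose the TS 18661-3 declarations. */
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <float.h>
#include <math.h>
#include <rocblas/rocblas.h>
```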

Fixed

Made offset calculations for rocBLAS functions 64-bit safe, fixing potential overflow with very large leading dimensions or increments:
- Level 2: gbmv, gemv, hbmv, sbmv, spmv, tbmv, tpmv, tbsv, tpsv
Fixed lazy loading to support heterogeneous architecture setups and to load the appropriate Tensile library files based on the device's architecture
Guarded against no-op kernel launches that could trigger an error reported by hipGetLastError

Changed

Reduced the default verbosity of rocblas-test. To see all tests, set the environment variable GTEST_LISTENER=PASS_LINE_IN_LOG

Released 13 Oct 18:57 by rocm-ci (tag rocm-5.7.1, commit b80e422)

rocBLAS 3.1.0 for ROCm 5.7.1

rocBLAS code for ROCm 5.7.1 did not change. The library was rebuilt for the updated ROCm 5.7.1 stack.

Released 15 Sep 17:29 by rocm-ci (tag rocm-5.7.0, commit b80e422)

rocBLAS 3.1.0 for ROCm 5.7.0

Added

Added YAML lock-step argument scanning for the rocblas-bench and rocblas-test clients. See the Programmer's Guide for details.
Added rocblas-gemm-tune, which finds the best-performing GEMM kernel for each of a given set of GEMM problems.

Fixed

Made offset calculations for rocBLAS functions 64-bit safe, fixing potential overflow with very large leading dimensions or increments:
- Level 1: axpy, copy, rot, rotm, scal, swap, asum, dot, iamax, iamin, nrm2
- Level 2: gemv, symv, hemv, trmv, ger, syr, her, syr2, her2, trsv
- Level 3: gemm, symm, hemm, trmm, syrk, herk, syr2k, her2k, syrkx, herkx, trsm, trtri, dgmm, geam
- General: set_vector, get_vector, set_matrix, get_matrix
- Related fixes: internal scalar loads with offsets larger than 32 bits
- Fixed in-place functionality for all trtri sizes
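To see why these offsets needed fixing: element (i, j) of a column-major matrix lives at i + j * lda, and batch b of a strided-batched problem starts at b * stride. In 32-bit arithmetic, j * lda already exceeds 2^32 when, for example, j and lda are both 70000. The C sketch below compares a 32-bit computation (unsigned, so it wraps instead of invoking undefined behavior) with one that widens to 64 bits before multiplying. All helper names are hypothetical and are not rocBLAS internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit offset computation. Unsigned arithmetic
   wraps modulo 2^32, silently producing a wrong offset for large inputs. */
static uint32_t offset32(uint32_t i, uint32_t j, uint32_t lda)
{
    return i + j * lda;
}

/* Hypothetical helper: 64-bit safe version. Operands are widened to
   int64_t before the multiplication, so the product cannot overflow
   for any non-negative 32-bit i, j, lda. */
static int64_t offset64(int32_t i, int32_t j, int32_t lda)
{
    assert(i >= 0 && j >= 0 && lda >= 0);
    return (int64_t)i + (int64_t)j * (int64_t)lda;
}

/* Hypothetical helper: start of batch b in a strided-batched array,
   also computed in 64 bits. */
static int64_t batch_offset64(int32_t b, int64_t stride)
{
    return (int64_t)b * stride;
}
```

For j = lda = 70000 the true offset is 4,900,000,000, while the wrapped 32-bit result is 605,032,704. In rocBLAS, rocblas_stride is already a 64-bit type, whereas rocblas_int is typically 32 bits, so products of two rocblas_int values such as j * lda are the usual hazard.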

Changed

dot with rocblas_pointer_mode_host is now synchronous, matching legacy BLAS, because it stores its result in host memory
Enhanced reporting of installation issues caused by runtime libraries (Tensile)
Standardized the internal rocBLAS C++ interface across most functions

Deprecated

The __STDC_WANT_IEC_60559_TYPES_EXT__ define will be removed in a future release

Dependencies

Optional use of AOCL BLIS 4.0 on Linux for clients
Optional build-tool-only dependency on Python psutil

Released 29 Aug 20:12 by rocm-ci (tag rocm-5.6.1, commit 4b0751e)

rocBLAS 3.0.0 for ROCm 5.6.1

rocBLAS code for ROCm 5.6.1 did not change. The library was rebuilt for the updated ROCm 5.6.1 stack.

Released 28 Jun 23:17 by rocm-ci (tag rocm-5.6.0, commit 4b0751e)

rocBLAS 3.0.0 for ROCm 5.6.0

Optimizations

Improved performance of Level 2 rocBLAS GEMV on gfx90a GPUs for non-transposed problems with small matrices and large batch counts, specifically when m and n <= 32 and batch_count >= 256.
Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.

Added

Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.
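bf16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an IEEE-754 single-precision value, so "bf16 inputs with f32 compute" amounts to widening each input to float, doing the arithmetic in float, and rounding the result back. The host-side C sketch below illustrates that idea for axpy (y := alpha * x + y). The helper names are hypothetical, alpha is assumed to be a float, and this is not the rocBLAS implementation of axpy_ex.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bf16 stored as the upper 16 bits of an IEEE-754 binary32 value. */
typedef uint16_t bf16_bits;

/* Widening is exact: append 16 zero bits to the mantissa. */
static float bf16_to_f32(bf16_bits h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Round-to-nearest-even narrowing; NaN inputs are not handled here. */
static bf16_bits f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (bf16_bits)(bits >> 16);
}

/* Hypothetical host reference: y := alpha * x + y, bf16 storage, f32 compute. */
static void axpy_bf16_f32(int64_t n, float alpha, const bf16_bits *x, bf16_bits *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (int64_t i = 0; i < n; ++i) {
        float acc = alpha * bf16_to_f32(x[i]) + bf16_to_f32(y[i]);
        y[i] = f32_to_bf16(acc);
    }
}
```

Keeping the intermediate in float means only one rounding to bf16 per output element, instead of one per arithmetic operation.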

Deprecated

In-place trmm is deprecated. It will be replaced by a trmm that supports both in-place and out-of-place operation
rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release
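In-place trmm overwrites B with alpha * op(A) * B, while the out-of-place form writes the product to a separate C and leaves B unchanged. The host reference below is a sketch of only one case (left side, upper-triangular A, no transpose, non-unit diagonal, column-major storage); the name trmm_left_upper_ref is hypothetical and its argument order is not meant to mirror the rocBLAS signature. Because row i of the result reads only rows k >= i of B, computing rows top to bottom lets the same routine run in place by passing B as C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host reference, not the rocBLAS API:
   C := alpha * A * B, where A is m x m upper triangular (only entries with
   k >= i are read), B and C are m x n, all stored column-major.
   Rows are produced top to bottom, so C may alias B (with ldc == ldb):
   row i reads only rows k >= i of B, which have not been overwritten yet. */
static void trmm_left_upper_ref(int64_t m, int64_t n, float alpha,
                                const float *A, int64_t lda,
                                const float *B, int64_t ldb,
                                float *C, int64_t ldc)
{
    assert(m >= 0 && n >= 0 && lda >= m && ldb >= m && ldc >= m);
    for (int64_t j = 0; j < n; ++j)
        for (int64_t i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int64_t k = i; k < m; ++k)
                sum += A[i + k * lda] * B[k + j * ldb];
            C[i + j * ldc] = alpha * sum;
        }
}
```

With A = [[1, 2], [0, 3]] and B = [1, 2]^T, both the out-of-place call and the in-place call (C == B) yield [5, 6]^T; only the out-of-place call leaves B intact.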

Removed

The is_complex helper was deprecated and is now removed. Use rocblas_is_complex instead.
The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
rocblas_get_int8_type_for_hipblas was deprecated and is now removed.

Dependencies

Added a build-only dependency on Python joblib, which is used by the Tensile build
Fixed CMake install on some operating systems when performed by install.sh -d --cmake_install

Fixed

Made trsm offset calculations 64-bit safe

Changed

Refactored rotg test code

Released 24 May 19:06 by rocm-ci (tag rocm-5.5.1, commit cdd561f)

rocBLAS 2.47.0 for ROCm 5.5.1

rocBLAS code for ROCm 5.5.1 did not change. The library was rebuilt for the updated ROCm 5.5.1 stack.

Released 01 May 21:04 by rocm-ci (tag rocm-5.5.0, commit cdd561f)

rocBLAS 2.47.0 for ROCm 5.5.0

Added

Added rocblas_geam_ex for matrix-matrix minimum operations
Added HIP Graph support as a beta feature for rocBLAS Level 1, Level 2, and Level 3 (pointer mode host) functions
Added a beta features API, exposed using the compiler define ROCBLAS_BETA_FEATURES_API
Added support for vector initialization with negative increments in the rocBLAS test framework
Added Windows build documentation for forthcoming support using the ROCm HIP SDK
Added scripts to plot performance for multiple functions
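With a negative increment, the reference BLAS convention, which rocBLAS is assumed here to follow, is that the vector is traversed backwards: logical element 0 sits at the far end of the storage. The C sketch below shows that indexing and uses it to initialize such a vector; blas_index and init_strided are hypothetical names, not rocBLAS test-framework functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: storage index of logical element i (0-based) of an
   n-element vector with increment inc. For inc < 0 the walk starts at
   (1 - n) * inc, the highest storage index used, and moves backwards. */
static int64_t blas_index(int64_t i, int64_t n, int64_t inc)
{
    assert(inc != 0 && i >= 0 && i < n);
    int64_t start = inc > 0 ? 0 : (1 - n) * inc;
    return start + i * inc;
}

/* Hypothetical helper: writes 1, 2, ..., n into logical elements 0..n-1. */
static void init_strided(float *x, int64_t n, int64_t inc)
{
    for (int64_t i = 0; i < n; ++i)
        x[blas_index(i, n, inc)] = (float)(i + 1);
}
```

For n = 3 and inc = -2, logical elements 0, 1, 2 land at storage indices 4, 2, 0, so the same buffer read with inc = 2 appears reversed.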

Optimizations

Improved performance of Level 2 rocBLAS GEMV for float and double precision, by 150-200% for certain problem sizes with m == n, measured on a gfx90a GPU.
Improved performance of Level 2 rocBLAS GER for float, double, and complex float precision, by 5-7% for certain problem sizes, measured on a gfx90a GPU.
Improved performance of Level 2 rocBLAS SYMV for float and double precision, by 120-150% for certain problem sizes, measured on both gfx908 and gfx90a GPUs.

Fixed

Fixed setting of the executable mode on the client script rocblas_gentest.py to avoid potential permission errors with the clients rocblas-test and rocblas-bench
Fixed deprecated API compatibility with the Visual Studio compiler
Fixed test framework memory exception handling for Level 2 functions when the host memory allocation exceeds the available memory

Changed

install.sh internally runs rmake.py (also used on Windows), and developers on Linux may run rmake.py directly (use --help for options)
All rocBLAS client executables now begin with the rocblas- prefix

Removed

Removed the install.sh options -o/--cov, because Tensile now uses the default code object version (COV) format, set by the CMake define Tensile_CODE_OBJECT_VERSION=default

Released 22 Mar 20:47 by rocm-ci (tag rocm-5.4.4, commit 24f3891)

rocBLAS 2.46.0 for ROCm 5.4.4

rocBLAS code for ROCm 5.4.4 did not change. The library was rebuilt for the updated ROCm 5.4.4 stack.

Released 07 Feb 17:39 by rocm-ci (tag rocm-5.4.3, commit 24f3891)

rocBLAS 2.46.0 for ROCm 5.4.3

rocBLAS code for ROCm 5.4.3 did not change. The library was rebuilt for the updated ROCm 5.4.3 stack.

Released 13 Jan 16:42 by rocm-ci (tag rocm-5.4.2, commit ef7a9bb)

rocBLAS 2.46.0 for ROCm 5.4.2

rocBLAS code for ROCm 5.4.2 did not change. The library was rebuilt for the updated ROCm 5.4.2 stack.
