fix(tensile_host): fix solutions for gfx103x not able to load #1455

Open. wfjsw wants to merge 1 commit into base: develop

wfjsw commented Jul 23, 2024

This patch may fix a problem where the added map broke gfx1031-gfx1035: Tensile solutions for these architectures cannot be loaded, which forces them to drop to the fallback.

Related log:

ProblemMap Searching for Contraction_l_Alik_Bljk_Cijk_Dijk found Problem library (1 rows)
Object key: 768, 77, 768
Key: 768, 77, 768
Starting point: 17179869184, 1, 2937652110784
Rightward search...
Leftward search...

129, 129, 65: 905234 < 1.79769e+308 <-- Best distance, but no matching solution
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 64: 906641 > 905234
129, 129, 64: 906641 > 905234

......

Considered 100% of entries.
Solution index selected: 69
Running kernel: Cijk_Alik_Bljk_HHS_BH_MT64x32x8_SN_AF0EM1_AMAS2_ASEM1_BL1_BS1_EPS0_FL0_GLVWA2_GLVWB1_GRVW2_GSU1_GSUASB_ISA000_IU1_K1_KLS_LPB0_LDL1_LRVW2_MMFSC_NLCA1_NLCB1_PGR0_PLR1_RK0_SIA1_SU32_SUM0_SUS256_SVW4_TT4_2_USFGROn1_VAW1_VSn1_VW2_VWB2_WS64_WG16_16_1_WGM8
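
The distances in the log above are consistent with a squared Euclidean distance between the problem key and each table entry: for key (768, 77, 768) and entry (129, 129, 65), 639^2 + 52^2 + 703^2 = 905234. The following is a minimal sketch that reproduces this arithmetic and illustrates how a search might pass over the closest entry when its solution is unavailable. The function names, the table layout, and the solution indices 10 and 11 are hypothetical and are not taken from Tensile's code.

```python
def squared_distance(key, entry):
    """Sum of squared per-dimension differences between two size tuples."""
    return sum((k - e) ** 2 for k, e in zip(key, entry))


def pick_solution(key, table, loadable):
    """Return (solution_index, distance) for the closest usable entry, or None.

    table    -- list of (entry_key, solution_index) pairs (hypothetical layout)
    loadable -- set of solution indices whose kernels could be loaded
    """
    best = None
    for entry, index in table:
        if index not in loadable:
            # Closest entry may be skipped: "best distance, but no matching solution".
            continue
        d = squared_distance(key, entry)
        if best is None or d < best[1]:
            best = (index, d)
    return best


key = (768, 77, 768)
print(squared_distance(key, (129, 129, 65)))  # 905234, as in the log
print(squared_distance(key, (129, 129, 64)))  # 906641, as in the log

# Hypothetical table: solution 10 is closest but not loadable, so 11 is chosen.
table = [((129, 129, 65), 10), ((129, 129, 64), 11)]
print(pick_solution(key, table, loadable={11}))  # (11, 906641)
```

If no entry has a loadable solution, `pick_solution` returns None, which would correspond to dropping to a fallback in this simplified model.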

Could you please backport this to HIP SDK 6.1.2 for Windows if possible?

amcamd (Contributor) commented Jul 25, 2024

@wfjsw My understanding is that your change will cause lazy loading for gfx1031, gfx1032, gfx1034, gfx1035 assembly kernels listed in .yaml files in the directory https://github.com/ROCm/rocBLAS/tree/develop/library/src/blas3/Tensile/Logic/asm_full . If you search for the strings gfx1031, gfx1032, gfx1034, gfx1035 in this directory you will not find any matches, so these strings are not in getLazyLoadingArch. When assembly kernels are added for an architecture, the architecture is added to getLazyLoadingArch.
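
As a rough illustration of the behaviour described above, the sketch below models an allow-list of architectures: an architecture that is not listed gets no lazy library, while a listed one does. This is only a simplified model under stated assumptions. The set contents are an illustrative subset rather than a copy of the real list, and the function name and file-name pattern are hypothetical.

```python
# Illustrative subset only; not a copy of the list behind getLazyLoadingArch.
LAZY_LOADING_ARCHS = frozenset({"gfx906", "gfx1030"})


def lazy_library_for(arch, known=LAZY_LOADING_ARCHS):
    """Return a hypothetical lazy-library file name, or None if arch is unlisted."""
    if arch not in known:
        return None
    # Hypothetical naming pattern, used here only for illustration.
    return f"TensileLibrary_lazy_{arch}.dat"


print(lazy_library_for("gfx1030"))  # listed: a file name is returned
print(lazy_library_for("gfx1031"))  # unlisted: None

# Adding the architecture to the list changes the outcome.
extended = LAZY_LOADING_ARCHS | {"gfx1031"}
print(lazy_library_for("gfx1031", known=extended))
```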

Can you let us know the intention of your PR:

  1. Are you trying to lazy load assembly kernels for gfx1031, gfx1032, gfx1034, gfx1035?
  2. Are you trying to build rocBLAS for gfx1031, gfx1032, gfx1034, gfx1035?

wfjsw (Author) commented Jul 26, 2024

I already have assembly kernels for these cards, but the stock rocblas.dll refuses to load them when they are placed in the search path as they were in 5.7.1, because of this list, which was added in 6.0.

This also seems to affect non-lazy loading. In testing, the non-lazy libraries do not appear to be applied either.

lamikr commented Dec 18, 2024

ROCm SDK Builder bug 180 does not seem to be related to this bug.

rocBLAS 6.1.2 libraries built by ROCm SDK Builder did not work with every app on the gfx906 target, while they worked fine on RDNA1/2/3 GPUs. The problem is somehow related to code object version V5 on gfx906 and may be some kind of mishandling of the xnack feature on those cards. Some apps failed and complained about missing kernel symbols, even though those symbols could be grepped from the rocBLAS co-files. When I use DTensile_CODE_OBJECT_VERSION=default instead of DTensile_CODE_OBJECT_VERSION=V5, the problem goes away.
