Support gfx950 layouts #692

Draft
wants to merge 10 commits into
base: main_perf
zhanglx13

No description provided.

API change:
- For blocked layout, use -tensorShape, which only takes two dims as dim0,dim1
- For dot layout, use -dotShape, which takes three dims as M,N,K
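A minimal sketch of how two options of this kind could be declared with `argparse`. This is not the PR's actual parser: the function name `build_parser`, the default values, and the assumption that the dims are passed as space-separated integers are all hypothetical. Only the option names `-tensorShape` and `-dotShape` and their dim counts come from the commit message.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the two option names come from the PR text.
    parser = argparse.ArgumentParser(description="layout plot options (sketch)")
    # Blocked layout: exactly two dims, dim0 and dim1.
    parser.add_argument("-tensorShape", nargs=2, type=int,
                        metavar=("DIM0", "DIM1"),
                        default=[128, 64],  # assumed default, not from the PR
                        help="tensor shape for the blocked layout")
    # Dot layout: exactly three dims, M, N and K.
    parser.add_argument("-dotShape", nargs=3, type=int,
                        metavar=("M", "N", "K"),
                        default=[32, 128, 64],  # assumed default, not from the PR
                        help="dot shape for the dot layout")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["-dotShape", "32", "128", "64"])
M, N, K = args.dotShape
print(M, N, K)
```

Keeping the two shapes in separate options means each one can enforce its own dim count, so a blocked-layout plot cannot be given three dims by mistake.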
Separate each layout's code into its own file
- When kWidth is large, use a smaller elemSize horizontally to save
space
- Improve the labels, such as
  - change vec to kWidth for operands
  - change opA/opB to inA/inB and include operand dims
  - remove group dims in the operands so that they don't overlap with
  operand block dims
- Better alignment: dot op and mfma zoomed-in pics are bottom aligned
kGroup is defined as (total elements per thread) / kWidth for one mfma
instruction.
kGroup = 2 is needed only for the newly added mfma_f32_16x16x128_f8f6f4
and mfma_f32_32x32x64_f8f6f4 with f8 input type on MI350.
Print the mfma instruction name accordingly.
Mixed-precision mfma between 8-bit and 4- or 6-bit inputs is not
supported yet.
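The kGroup definition above can be worked through numerically. The sketch below makes three assumptions that are not stated in the PR: a wavefront has 64 lanes, one operand tile of size nonKDim x K is spread evenly across those lanes, and kWidth is 16 for f8 inputs and 32 for f4 inputs. The helper names are hypothetical.

```python
WAVE_SIZE = 64  # assumption: 64 lanes per wavefront


def elems_per_thread(non_k_dim, k_dim, wave_size=WAVE_SIZE):
    """Operand elements held by one lane for a single mfma instruction."""
    total = non_k_dim * k_dim
    if total % wave_size != 0:
        raise ValueError("operand tile does not divide evenly across lanes")
    return total // wave_size


def k_group(non_k_dim, k_dim, k_width, wave_size=WAVE_SIZE):
    """kGroup = (total elements per thread) / kWidth, per the definition above."""
    per_thread = elems_per_thread(non_k_dim, k_dim, wave_size)
    if per_thread % k_width != 0:
        raise ValueError("kWidth does not divide the per-thread element count")
    return per_thread // k_width


# mfma_f32_16x16x128_f8f6f4 with f8 input, assumed kWidth = 16
print(k_group(16, 128, k_width=16))
# mfma_f32_32x32x64_f8f6f4 with f8 input, assumed kWidth = 16
print(k_group(32, 64, k_width=16))
# same 16x16x128 shape with f4 input, assumed kWidth = 32
print(k_group(16, 128, k_width=32))
```

Under these assumptions both f8 cases give 32 elements per thread and kGroup = 2, while the f4 case gives kGroup = 1, which matches the statement that kGroup = 2 is needed only for f8 input.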