[ROCm][PerfOptimization] Enable layernorm using ck_tile APIs #1806

Draft: wants to merge 8 commits into base: 2.5_perf_fix

lcskrishna

This PR does the following:

  • Cherry-pick the latest LayerNorm (LN) optimizations.
  • Add composable_kernels as a third-party submodule in PyTorch.
  • Integrate LayerNorm using the ck_tile APIs.
  • Update the existing flow to switch between the ck_tile and HIP code paths based on specific conditions.
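
The last item describes a backend switch without listing the conditions. The sketch below only illustrates the shape such a dispatch could take. The `LayerNormCall` record, the `select_backend` helper, and every condition in it (dtype set, affine parameters, divisibility by 8) are hypothetical placeholders and are not taken from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerNormCall:
    """Hypothetical summary of one layer-norm invocation."""
    dtype: str
    normalized_size: int
    has_affine: bool


# Assumption for illustration only: the PR does not state which dtypes qualify.
CK_TILE_DTYPES = {"float16", "bfloat16"}


def select_backend(call, ck_tile_available=True):
    """Return "ck_tile" or "hip" for a call; HIP is always the fallback."""
    if not ck_tile_available:
        return "hip"
    # Placeholder conditions; the real ones live in the PR's source changes.
    if (call.dtype in CK_TILE_DTYPES
            and call.has_affine
            and call.normalized_size % 8 == 0):
        return "ck_tile"
    return "hip"
```

Keeping the HIP path as the unconditional fallback means a call that fails any check still runs through the existing code.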

mhalk and others added 7 commits December 11, 2024 13:53
… of 1.f/x (ROCm#1688)

Replace the (more) exact calculation with a hardware approximation.

Benefits:
Reduced code size.
Improved performance in certain scenarios.

Experiments show only a small reduction in precision.
Experiments show no significant performance regressions.
bfloat16 and float16 calculations in particular may benefit
substantially from this change.
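
To get a feel for why a slightly less exact reciprocal costs little precision in layer norm, here is a rough CPU-side sketch. It assumes the reciprocal in question feeds the `1 / sqrt(var + eps)` scaling step, which the commit message does not spell out. `approx_reciprocal` is a hypothetical stand-in that rounds to float32 and drops low mantissa bits; it does not model the actual hardware instruction.

```python
import math
import struct


def approx_reciprocal(x, dropped_bits=1):
    """Hypothetical stand-in for a hardware reciprocal.

    Computes 1/x, rounds it to float32, then zeroes the lowest mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", 1.0 / x))[0]
    bits &= ~((1 << dropped_bits) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def layer_norm(row, reciprocal, eps=1e-5):
    """Plain layer norm over one row, without affine weight or bias."""
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row) / n
    rstd = reciprocal(math.sqrt(var + eps))  # the 1.f/x step
    return [(v - mean) * rstd for v in row]


row = [math.sin(i) for i in range(1024)]
exact = layer_norm(row, lambda x: 1.0 / x)
approx = layer_norm(row, approx_reciprocal)
max_abs_err = max(abs(a - b) for a, b in zip(exact, approx))
print(f"max abs difference: {max_abs_err:.3e}")
```

Because the reciprocal is computed once per row and then only scales the centered values, its relative error carries over to each output element without being amplified.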

vectorized_layer_norm_kernel:
Performance improves especially for the tensor shapes listed below.
Lower values of dim1 do not change performance significantly.
dim1 = 8k-65k may gain considerable performance, with the gain
declining gradually as size increases.

```
dim0    dim1
----    ----
1024    8192
1024    16384
1024    32768
1024    65536
1024    131072
1024    262144
1024    524288
```

Co-authored-by: Hashem Hashemi <[email protected]>
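
For anyone who wants to re-check the shapes from the commit message locally, a minimal timing sketch could look like the following. The `time_layer_norm` helper, the warm-up count, and the iteration count are assumptions for illustration and are not the benchmark the authors ran. It uses the public `torch.nn.functional.layer_norm` entry point and returns `None` when no GPU build of PyTorch is available.

```python
import time

# (dim0, dim1) shapes from the commit message: dim1 doubles from 8192 to 524288.
SHAPES = [(1024, 8192 * 2 ** k) for k in range(7)]


def time_layer_norm(dim0, dim1, dtype_name="bfloat16", iters=20):
    """Return mean milliseconds per layer_norm call, or None without a GPU."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():  # ROCm builds also report via torch.cuda
        return None
    x = torch.randn(dim0, dim1, device="cuda", dtype=getattr(torch, dtype_name))
    for _ in range(3):  # warm-up so one-time setup cost is not timed
        torch.nn.functional.layer_norm(x, (dim1,))
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        torch.nn.functional.layer_norm(x, (dim1,))
    torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) * 1e3 / iters


if __name__ == "__main__":
    for dim0, dim1 in SHAPES:
        print(dim0, dim1, time_layer_norm(dim0, dim1))
```

Running it on builds before and after the change gives a rough comparison per shape; the largest bfloat16 input alone occupies 1 GiB of device memory.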

rocm-repo-management-api bot commented Dec 23, 2024

Jenkins build for 1c8edefea13fda2915cd4bde42fac3d0fd961513 commit finished as FAILURE

rocm-repo-management-api bot commented Dec 23, 2024

Jenkins build for 4c2dcdc2fa414f2d223fcd7f27cc72fe5352a6e6 commit finished as FAILURE