[CK Tile][Critical][Performance] Slow CK Tile GEMM compared to universal_gemm in the old CK #1727

Open
zjing14 opened this issue Dec 6, 2024 · 2 comments

Comments

@zjing14
Contributor

zjing14 commented Dec 6, 2024

I tried the CK Tile GEMM with the V3 pipeline (https://github.com/ROCm/composable_kernel/blob/develop/example/ck_tile/03_gemm/universal_gemm.cpp) on a compute-bound case (M = 4096, N = 4096, K = 4096), but it performs much worse than the old CK GEMM V3 example (https://github.com/ROCm/composable_kernel/blob/develop/example/01_gemm/gemm_xdl_bf16_v3.cpp) with the same 256x256x64 tile size.

CK Tile V3 (359.483 TFlops)

./bin/tile_example_universal_gemm -m=4096 -n=4096 -k=4096 -v=0
Launching kernel with args: grid: {16, 16, 1}, blocks: {256, 1, 1}
Run Gemm kernel with M =4096 N =4096 K =4096 StrideA =4096 StrideB =4096 StrideC =4096 : 0.382324 ms, 359.483 TFlops, 263.293 GB/s,

vs. Old CK GEMM V3 (615.46 TFlops)

./bin/example_gemm_xdl_bf16_v3 0 2 1 4096 4096 4096 4096 4096 4096 1
a_m_k: dim 2, lengths {4096, 4096}, strides {4096, 1}
b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
c_m_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
RotatingMemWrapper: { size_a: 33554432, size_b: 33554432, rotating_count: 4}
Perf: 0.223311 ms, 615.46 TFlops, 450.776 GB/s, DeviceGemmXdlUniversal<Default, RCR> BlkSize: 256, BlkTile: 256x256x64, WaveTile: 32x32, WaveMap: 4x4, VmemReadVec: 8x8, BlkGemmPipelineScheduler: Intrawave, BlkGemmPipelineVersion: v3, BlkGemmPipelinePrefetchStages: 2
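
For reference, the TFlops and GB/s figures in both logs can be reproduced from the reported kernel times. The sketch below assumes the usual GEMM accounting of 2*M*N*K floating-point operations and one read of A, one read of B, and one write of C, with 2-byte elements (bf16 or fp16). The element size is an assumption, not something either log states, but it is consistent with the GB/s values printed above. The helper name `gemm_metrics` is hypothetical and is not part of CK.

```python
def gemm_metrics(m, n, k, time_ms, bytes_per_elem=2):
    """Return (TFlops, GB/s) for one GEMM run.

    Assumes 2*M*N*K flops and a single pass over A (MxK), B (KxN) and C (MxN).
    bytes_per_elem=2 is an assumption (bf16/fp16), not taken from the logs.
    """
    seconds = time_ms * 1e-3
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / seconds / 1e12, bytes_moved / seconds / 1e9


# Kernel times copied from the two logs above.
tile_tflops, tile_gbps = gemm_metrics(4096, 4096, 4096, 0.382324)
old_tflops, old_gbps = gemm_metrics(4096, 4096, 4096, 0.223311)
slowdown = 0.382324 / 0.223311  # CK Tile time divided by old CK time

print(f"CK Tile V3: {tile_tflops:.3f} TFlops, {tile_gbps:.3f} GB/s")
print(f"Old CK V3:  {old_tflops:.3f} TFlops, {old_gbps:.3f} GB/s")
print(f"CK Tile takes {slowdown:.2f}x as long as old CK")
```

Both examples therefore appear to use the same accounting, so the gap comes from the measured kernel time (0.382324 ms vs. 0.223311 ms) rather than from a difference in how the numbers are reported.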
@zjing14
Contributor Author

zjing14 commented Dec 6, 2024

@carlushuang @aosewski

@ppanchad-amd

Hi @zjing14. An internal ticket has been created to investigate your issue. Thanks!
