This repository has been archived by the owner on Dec 1, 2024. It is now read-only.

How do I match the results of profiling with the parameters of the cost model? #131

Open
xvanQ opened this issue Jan 31, 2024 · 1 comment


@xvanQ

xvanQ commented Jan 31, 2024

The output of the bandwidth profiling is as follows:
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s

size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
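To compare these numbers side by side, it can help to parse the printed lines into a table keyed by direction and transfer size. The data shows that both directions level off at large sizes (about 13 GB/s GPU-to-CPU and about 12 GB/s CPU-to-GPU), while 0.25 MB transfers are much slower. The sketch below is a minimal helper that assumes the output format shown above. The names `parse_bandwidth` and `plateau_bandwidth` are hypothetical, and the sketch does not answer which cost-model parameter each value feeds.

```python
import re
from collections import defaultdict

# The profiling output from above, pasted verbatim.
PROFILE_OUTPUT = """
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s
size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
"""

# Matches lines like "size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s".
# Assumption: the profiler always prints exactly this format.
BW_LINE = re.compile(
    r"size:\s*([\d.]+)\s*MB,\s*(gpu-to-cpu|cpu-to-gpu) bandwidth:\s*([\d.]+)\s*GB/s"
)


def parse_bandwidth(text):
    """Return {direction: {size_mb: bandwidth_gb_per_s}} for every matching line."""
    table = defaultdict(dict)
    for match in BW_LINE.finditer(text):
        size_mb, direction, gb_per_s = match.groups()
        table[direction][float(size_mb)] = float(gb_per_s)
    return dict(table)


def plateau_bandwidth(table, direction):
    """Bandwidth measured at the largest transfer size for one direction."""
    sizes = table[direction]
    return sizes[max(sizes)]


if __name__ == "__main__":
    bw = parse_bandwidth(PROFILE_OUTPUT)
    print(plateau_bandwidth(bw, "gpu-to-cpu"))  # 13.324
    print(plateau_bandwidth(bw, "cpu-to-gpu"))  # 12.251
```

Taking the largest-size value is only one reasonable choice for bulk transfers. Whether a given cost-model parameter should instead use a small-transfer value is exactly what this issue is asking.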

Which of these values correspond to ctog_bdw, gtoc_bdw_cache, and gtoc_bdw_hidden?

The output of the matmul profiling is as follows:
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026

device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924
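The matmul output has one row per device and matrix size N. The sketch below collects the reported TFLOPS per device and reads that column directly, because the printed latencies are rounded to 0.01 ms. The helper names `parse_matmul` and `best_tflops` are hypothetical, and the sketch assumes the exact output format shown above. It does not decide which of mm_flops_p, mm_flops_g, bmm_flops_p, bmm_flops_g, or cpu_flops each number belongs to.

```python
import re

# The profiling output from above, pasted verbatim.
MATMUL_OUTPUT = """
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026
device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924
"""

# Matches lines like "device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186".
MM_LINE = re.compile(
    r"device:\s*(\w+),\s*N:\s*(\d+),\s*latency:\s*([\d.]+)\s*ms,\s*TFLOPS:\s*([\d.]+)"
)


def parse_matmul(text):
    """Return {device: {N: tflops}} for every matching line."""
    table = {}
    for device, n, _latency_ms, tflops in MM_LINE.findall(text):
        table.setdefault(device, {})[int(n)] = float(tflops)
    return table


def best_tflops(table, device):
    """Highest TFLOPS reported for a device across all tested sizes."""
    return max(table[device].values())


if __name__ == "__main__":
    mm = parse_matmul(MATMUL_OUTPUT)
    print(best_tflops(mm, "cuda"))  # 97.026
    print(best_tflops(mm, "cpu"))   # 3.488
```

Using the peak is an assumption. On the CPU the smaller size is faster here, so the peak and the largest-size value differ.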

Which of these values correspond to mm_flops_p, mm_flops_g, bmm_flops_p, bmm_flops_g, and cpu_flops?
Thanks

@nustart0720

Have you figured this out? I have the same question.
