Profile inside CUDA kernel #38

Open
bernhardmgruber opened this issue Jul 1, 2022 · 3 comments

@bernhardmgruber
Member

I recently came across the need to profile sections inside a single CUDA kernel. We wanted to figure out how much time each subpart of the kernel consumes. In bactria terms, that would mean being able to enter and leave phases and sectors inside CUDA device code.

Is such a feature planned or in scope for bactria?

@j-stephan
Member

I am not aware of any profiling / tracing API for in-kernel usage. Even the NVIDIA tools (which are built upon the CUPTI API) will only give you an analysis of the entire kernel, AFAIK. So while this fits bactria's scope in theory, I don't see a way to achieve this in practice.

@bernhardmgruber
Member Author

Here is a publication from some of our friends, who implemented this: https://ieeexplore.ieee.org/abstract/document/6337509. I think this could be very useful.

@j-stephan
Member

j-stephan commented Jul 6, 2022

Interesting read, I wasn't aware of this paper. The ideas there match my own shower thoughts (which are admittedly not as detailed). The same holds for the problems: in-kernel profiling / tracing is only possible by modifying the kernel signature. This is very intrusive and something I tried to avoid with bactria. Maybe some CUDA features added since 2012 would make this easier, though.
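
To illustrate why the signature change is intrusive, here is a minimal sketch of hand-rolled in-kernel timing. It is not bactria's API and not necessarily the approach of the linked paper. The names `scale_kernel`, `profile_scale`, `Section` and the `cycles` buffer are made up for this example. It only relies on the CUDA built-ins `clock64()` and `atomicAdd`, and it omits error checking for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical section IDs for this sketch; not part of bactria.
enum Section { LOAD = 0, COMPUTE = 1, NUM_SECTIONS = 2 };

// The extra `cycles` parameter is the intrusive part: the kernel signature
// has to change so that the timings can leave the device.
__global__ void scale_kernel(const float* in, float* out, int n,
                             unsigned long long* cycles)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;

    const long long t0 = clock64();          // per-SM cycle counter
    float v = (i < n) ? in[i] : 0.0f;        // section LOAD
    const long long t1 = clock64();
    for (int k = 0; k < 64; ++k)             // section COMPUTE
        v = v * 1.0001f + 0.5f;
    if (i < n) out[i] = v;
    const long long t2 = clock64();

    // Only one thread per block reports, to keep the overhead small.
    if (threadIdx.x == 0) {
        atomicAdd(&cycles[LOAD],    static_cast<unsigned long long>(t1 - t0));
        atomicAdd(&cycles[COMPUTE], static_cast<unsigned long long>(t2 - t1));
    }
}

// Host helper: launches the kernel and returns the cycle counts per section,
// summed over all blocks.
std::vector<unsigned long long> profile_scale(int n)
{
    float* in = nullptr;
    float* out = nullptr;
    unsigned long long* cycles = nullptr;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMalloc(&cycles, NUM_SECTIONS * sizeof(unsigned long long));
    cudaMemset(in, 0, n * sizeof(float));
    cudaMemset(cycles, 0, NUM_SECTIONS * sizeof(unsigned long long));

    const int block = 256;
    scale_kernel<<<(n + block - 1) / block, block>>>(in, out, n, cycles);

    // The blocking copy on the default stream waits for the kernel to finish.
    std::vector<unsigned long long> host(NUM_SECTIONS);
    cudaMemcpy(host.data(), cycles,
               NUM_SECTIONS * sizeof(unsigned long long),
               cudaMemcpyDeviceToHost);

    cudaFree(in);
    cudaFree(out);
    cudaFree(cycles);
    return host;
}
```

Calling `profile_scale(1 << 20)` would return one accumulated cycle count per section. The numbers are approximate at best: the compiler may move instructions across the `clock64()` reads, the counter is per multiprocessor, and the extra reads and atomics perturb the kernel being measured.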
