I recently came across the need to profile sections inside a single CUDA kernel. We wanted to figure out how much time each subpart of the kernel consumes. In bactria terms, that would mean being able to enter and leave phases and sectors inside CUDA device code.
Is such a feature planned or in scope of bactria?
I am not aware of any profiling / tracing API for in-kernel usage. Even the NVIDIA tools (which are built upon the CUPTI API) will only give you an analysis of the entire kernel, AFAIK. So while this fits bactria's scope in theory, I don't see a way to achieve this in practice.
Interesting read, I wasn't aware of this paper. The ideas there match my own shower thoughts (which are admittedly not as detailed). The same goes for the problems: in-kernel profiling / tracing is only possible by modifying the kernel signature. This is very intrusive and something I tried to avoid with bactria. Maybe some CUDA features added since 2012 would make this easier, though.
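To make the intrusiveness concrete, here is a minimal sketch of what manual in-kernel timing could look like in plain CUDA. It is neither bactria code nor the approach from the paper: the kernel `scaled_square`, the `Section` enum, the `cycles` buffer and the host helper `run_profiled` are all made-up names for illustration. It assumes a device with managed memory support and uses the `clock64()` device function to read the per-multiprocessor cycle counter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical section identifiers for this sketch; not part of bactria.
enum Section { SECTION_LOAD = 0, SECTION_COMPUTE = 1, SECTION_COUNT = 2 };

// The extra `cycles` parameter is the intrusive part: the kernel signature
// has to change so that the timing data can leave the device.
__global__ void scaled_square(const float* in, float* out, int n,
                              long long* cycles)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    long long t0 = clock64();            // enter "load" section
    float x = (i < n) ? in[i] : 0.0f;
    long long t1 = clock64();            // leave "load", enter "compute"
    if (i < n) out[i] = 2.0f * x * x;
    long long t2 = clock64();            // leave "compute"

    // Only thread 0 of each block records its elapsed cycles.
    if (threadIdx.x == 0) {
        cycles[blockIdx.x * SECTION_COUNT + SECTION_LOAD]    = t1 - t0;
        cycles[blockIdx.x * SECTION_COUNT + SECTION_COMPUTE] = t2 - t1;
    }
}

// Runs the kernel on in[i] = i, prints per-block cycle counts and returns
// out[probe], or -1.0f if any CUDA call fails.
float run_profiled(int n, int probe)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float* in = nullptr;
    float* out = nullptr;
    long long* cycles = nullptr;

    if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&cycles, blocks * SECTION_COUNT * sizeof(long long))
            != cudaSuccess)
        return -1.0f;

    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

    scaled_square<<<blocks, threads>>>(in, out, n, cycles);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0f;

    for (int b = 0; b < blocks; ++b)
        std::printf("block %d: load %lld cycles, compute %lld cycles\n", b,
                    cycles[b * SECTION_COUNT + SECTION_LOAD],
                    cycles[b * SECTION_COUNT + SECTION_COMPUTE]);

    float result = out[probe];
    cudaFree(in);
    cudaFree(out);
    cudaFree(cycles);
    return result;
}

int main()
{
    return run_profiled(1024, 3) < 0.0f ? 1 : 0;
}
```

Besides the changed signature, this has further drawbacks. The values are clock cycles of one sampled thread per block, not wall-clock time, and the compiler is free to move instructions across the `clock64()` reads, so the numbers for small sections should be treated as rough indications at best.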