-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function for recursively printing parameter memory #2560
base: master
Are you sure you want to change the base?
Conversation
I'm open to changing the format somehow, but I do think that this format is somewhat simple and explicit. |
This feels like a very niche feature that I'm not sure is worth putting in CUDA.jl. We already report on the size of each argument; can't you keep the specific functionality to analyze within individual arguments in your package, or even a dedicated package for this purpose? I'd rather expose some way for you to perform that analysis on the actual kernel arguments (e.g., by saving them in the error that's thrown by |
Whatever feature is implemented to help solve parameter space problems, a key criterion should be that it does not require launching a kernel to do the debugging. It's much more efficient to design kernel arguments by direct inspection, rather than by trial-and-error kernel launching + digging through stacktraces which is the main issue with the current workflow. |
I fail to see what's more convenient about doing Can you elaborate on the workflow you want? I'm proposing here that you would be able to |
The workflow is:
Executing on 2a doesn't really require any new features; we can simply do the arithmetic to figure out whether the kernels will succeed. For 2b, we may have to change Here's an example conversation where we are trying to deduce how to solve a parameter space problem: |
I see, so this is strictly a development utility that doesn't actually need any support in CUDA.jl? FWIW, you should be able to make this all type-based, by just inspecting the device-side types that CUDA.jl already reports:
With a recursive size printer that operates on |
Correct, I think the only point of putting in CUDA is to make it more visible / keep it up to date with CUDA development. It could easily find a home elsewhere. Also it doesn't even really need to be packaged since I don't really see much scope for further development --- offering it in a package is merely trying to be friendly to other developers I guess.
I'm not sure I understand though @charleskawczynski might... the point of |
Yes, but in the error message that's reported by CUDA.jl you already get to see the types of the converted arguments: julia> @cuda Returns(nothing)((ntuple(_->UInt64(1), 2^13), CUDA.rand(1)))
ERROR: Kernel invocation uses too much parameter memory.
64.047 KiB exceeds the 31.996 KiB limit imposed by sm_89 / PTX v8.5.
Relevant parameters:
[1] args::Tuple{NTuple{8192, UInt64}, CuDeviceVector{Float32, 1}} uses 64.031 KiB So you helper could ingest the |
5d585c4
to
c850163
Compare
As Clima has developed increasingly complex models and fused increasingly complex broadcast expressions, we've been running into parameter memory issues more frequently.
One issue I have with the existing printed message is that it does not provide granularity for large objects.
This PR implements a recursive print function/macro
@rprint_parameter_memory(some_object)
that users can use (and build options around) to print parameter memory usage with high granularity. For example (which I've tentatively implemented in MultiBroadcastFusion):I'm cc-ing some people who may also be interested in this: @glwagner @simonbyrne @simone-silvestri