Use KernelAbstractions.jl #109
Conversation
The overhead from KernelAbstractions.jl for the CPU backend seems minimal when run with …. I'm going to look into whether the ….

I'm going to add support for Metal.jl as an explicit extension. Further, if UF RC has AMD GPUs available, I'll explicitly add those bindings as well. This consists of simply calling the appropriate array constructor on the terms to be passed to the kernel, à la `(CuArray.(wedge_cache[1:end-1])..., wedge_cache[end])`. Of course, explicitly supporting any extensions is not strictly necessary anymore, as these functions can simply be passed (the equivalent of) ….

This current PR has not looked at writing more-performant kernels themselves. The nested-indexing pattern used in the current kernels (e.g. …) …
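A minimal sketch of the pattern in question, assuming a trivial kernel body; `wedge_kernel!`, `run_wedge!`, and the workgroup size are illustrative and not the PR's actual kernels:

```julia
using KernelAbstractions

# Illustrative kernel; the PR's real kernels index into precomputed caches.
@kernel function wedge_kernel!(out, @Const(coeffs), @Const(vals))
    i = @index(Global, Linear)
    out[i] = coeffs[i] * vals[i]
end

function run_wedge!(out, coeffs, vals)
    backend = get_backend(out)            # CPU() for Arrays, CUDABackend() for CuArrays, etc.
    kernel! = wedge_kernel!(backend, 64)  # 64 is an arbitrary workgroup size
    kernel!(out, coeffs, vals; ndrange = length(out))
    KernelAbstractions.synchronize(backend)
    return out
end
```

The same kernel then runs on whichever backend the inputs live on, so converting the cached terms with the appropriate array constructor (as in the `CuArray.` broadcast above) is all a GPU extension needs to do.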
Commit 453f5d5, which explicitly added support for the Apple Metal backend, works locally. But it looks like adding Metal.jl as a weak dep, which should work, errors on Windows and Linux at package precompile time. (And …)
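For reference, a sketch of the weak-dependency layout in question, assuming the extension is named `CombinatorialSpacesMetalExt`; the UUID should be checked against Metal.jl's own Project.toml:

```toml
# Project.toml (sketch)
[weakdeps]
Metal = "dde4c033-4e86-420c-a63e-0dd931031962"  # verify against Metal.jl's Project.toml

[extensions]
CombinatorialSpacesMetalExt = "Metal"
```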
I think that Decapodes will want to have high-level functions for each backend that are like "convert all my initial conditions to CuArray, call simulate" and "convert the results back to CPU Arrays for analysis", so keeping the CUDA/Metal/AMD support as extensions on that package would be reasonable, even if we can remove it from CombinatorialSpaces.
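A hedged sketch of such a helper; `simulate_on` and `simulate` are placeholder names, not an existing Decapodes API:

```julia
# Placeholder API: convert the initial conditions to the backend's array type,
# run the simulation, and bring the results back to CPU Arrays for analysis.
function simulate_on(ArrayT, u0s::NamedTuple; kwargs...)
    u0s_dev = map(ArrayT, u0s)        # e.g. ArrayT = CuArray
    results = simulate(u0s_dev; kwargs...)
    return map(Array, results)        # back to CPU Arrays
end
```

Usage would look like `simulate_on(CuArray, (U = U0, V = V0))`, with `simulate_on(Array, ...)` as the CPU path.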
To circumvent issues relating to the Metal.jl extension throwing errors at precompile time, I'll just refactor the operators such that they can take an array constructor (e.g. `CuArray`).
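A minimal sketch of that refactor, reusing the illustrative `run_wedge!` from the earlier sketch; `dec_wedge_product` is used here with an assumed signature, and `wedge_coefficients` is a hypothetical helper:

```julia
# The caller supplies the array constructor, defaulting to plain Array (CPU),
# so no package extension is required for GPU support.
function dec_wedge_product(sd, ArrayT = Array)
    coeffs = ArrayT(wedge_coefficients(sd))   # hypothetical precomputed cache, moved once to the device
    return (out, α, β) -> run_wedge!(out, coeffs, α .* β)
end
```

With CUDA.jl loaded, `dec_wedge_product(sd, CuArray)` would then run the same kernel on the GPU without any precompile-time dependence on the GPU package.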
Demonstrate the use of KernelAbstractions.jl to reduce code surface while supporting more architectures