Use KernelAbstractions.jl #109

Merged
merged 14 commits on Oct 12, 2024

Conversation

lukem12345
Member

Demonstrate the use of KernelAbstractions.jl to reduce code surface while supporting more architectures

@lukem12345 added the enhancement label on Oct 4, 2024
@lukem12345 self-assigned this on Oct 5, 2024
@lukem12345 marked this pull request as ready for review on Oct 7, 2024
@lukem12345
Member Author

The overhead from KernelAbstractions.jl for the CPU backend seems minimal when run with @btime at the command line. The differing results reported by the “benchmark group” interface do not appear when run in this fashion. I’m going to look further into whether this discrepancy has any effect on TTFP, but it is likely an artifact of synchronization during benchmarking.
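
For reference, here is roughly how I’m timing things at the REPL. The kernel below is a throwaway stand-in, not one of the dec_* kernels; the point is the explicit synchronize call, so @btime measures execution rather than just the kernel launch:

using BenchmarkTools, KernelAbstractions

@kernel function scale!(y, @Const(x), α)
    i = @index(Global)
    y[i] = α * x[i]
end

x = rand(10_000); y = similar(x)
backend = get_backend(x)      # CPU() here; a GPU backend when given device arrays
kernel! = scale!(backend)

@btime begin
    kernel!(y, x, 2.0; ndrange=length(y))
    KernelAbstractions.synchronize(backend)
end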

I’m going to look into whether the kernel ought to be created at the dec_wedge_product level and passed to dec_c_wedge_product and children explicitly, although initial spot-checking seems to indicate that the current organization is fine (i.e. the Julia compiler + KA.jl are sufficiently advanced to not re-compile the kernel each time).
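
Roughly, the alternative organization would look like this. The names dec_wedge_product and dec_c_wedge_product are the real entry points, but the kernel and function bodies here are simplified stand-ins for illustration:

using KernelAbstractions

@kernel function wedge_kernel!(out, @Const(f), @Const(g))
    i = @index(Global)
    out[i] = f[i] * g[i]
end

# Build the kernel once at the top level...
function dec_wedge_product(f, g)
    kernel! = wedge_kernel!(get_backend(f))
    dec_c_wedge_product(kernel!, f, g)
end

# ...and pass the callable down explicitly instead of re-requesting it here.
function dec_c_wedge_product(kernel!, f, g)
    out = similar(f)
    kernel!(out, f, g; ndrange=length(out))
    KernelAbstractions.synchronize(get_backend(out))
    out
end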

I’m going to add support for Metal.jl as an explicit extension. Further, if UF RC has AMD GPUs available I’ll explicitly add those bindings as well. This consists of simply calling the appropriate array constructor on the terms to be passed to the kernel a la:

(CuArray.(wedge_cache[1:end-1])..., wedge_cache[end])
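
In context, that conversion is a one-liner per backend extension; something like the following helper (to_backend and the wedge_cache contents are made up for illustration):

using CUDA

# Convert every cached array except the trailing host-side entry to the
# backend's array type before handing the tuple to the kernel.
to_backend(Arr, cache) = (Arr.(cache[1:end-1])..., cache[end])

wedge_cache = (rand(100), rand(Int32, 2, 100), Int32(100))   # illustrative
cu_cache = to_backend(CuArray, wedge_cache)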

Of course, explicitly supporting any extensions is no longer strictly necessary, as these functions can simply be passed (the equivalent of) CuArray and CuSparseMatrixCSC. This refactor can hold for the time being, since eliminating the backend extensions would be better managed in the scope of a follow-up PR. That follow-up would make this library not only multi-backend, but truly backend-agnostic. Supporting explicit extensions may still be good to have for purposes of e.g. snoop-precompilation.

This PR has not looked at writing more-performant kernels themselves. The nested-indexing pattern used in the current kernels (e.g. f[p[Int32(1), i]]) looks like low-hanging fruit to eliminate, but the implementation details of any particular kernel are of course abstracted away by the KernelAbstractions interface, which is what this PR is concerned with. A follow-up PR should optimize the kernels.
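
For concreteness, the pattern in question looks like this in KA form (an illustrative kernel, not verbatim from this PR): f is read through the lookup table p, and that extra indirection is what a follow-up could try to remove.

using KernelAbstractions

@kernel function wedge_01_kernel!(out, @Const(f), @Const(g), @Const(p))
    i = @index(Global)
    # Nested indexing: read the 0-form f through the simplex lookup table p.
    out[i] = Float32(0.5) * (f[p[Int32(1), i]] + f[p[Int32(2), i]]) * g[i]
end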

@lukem12345
Member Author

Commit 453f5d5, which explicitly added support for the Apple Metal backend, works locally. But it looks like adding Metal.jl as a weak dependency causes errors on Windows and Linux at package precompile time, even though precompilation should still work there. (And the Metal.functional check needs to be swapped out for a different one.)

@jpfairbanks
Member

I think that Decapodes will want to have high-level functions for each backend, along the lines of “convert all my initial conditions to CuArray, then call simulate” and “convert the results back to CPU arrays for analysis”, so keeping the CUDA/Metal/AMD support as extensions on that package would be reasonable, even if we can remove it from CombinatorialSpaces.
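
Something along these lines, with hypothetical names (simulate here stands for whatever solver function Decapodes hands back, and to_gpu/to_cpu are the thin wrappers I have in mind):

using CUDA

to_gpu(u0) = map(CuArray, u0)     # move initial conditions to the device
to_cpu(sol) = map(Array, sol)     # bring results back for analysis

function simulate_on_cuda(simulate, u0, args...)
    sol = simulate(to_gpu(u0), args...)
    to_cpu(sol)
end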

@lukem12345
Member Author

To circumvent the issues with the Metal.jl extension throwing errors at precompile time, I’ll just refactor the operators so that they can take an array constructor (e.g. MtlArray), as mentioned in a previous comment. Instead of dispatching off of Val{:Metal}, they can take this array constructor as an argument directly, essentially keeping similar “dispatch” semantics. Apple Metal does not support sparse matrices directly. (Apple Accelerate does, but AppleAccelerate.jl does not support them yet.) For the time being, I’ll test the remaining operators with dense MtlMatrix matrices.
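
Sketched out, the refactor looks roughly like this, reusing the illustrative wedge_kernel! from my earlier comment (the real operators take more arguments):

using KernelAbstractions

# The array constructor (Array, CuArray, MtlArray, ...) is an ordinary argument,
# so no Val{:Metal}-style dispatch is needed and every backend hits the same method.
function dec_wedge_product(f, g, Arr=Array)
    fd, gd = Arr(f), Arr(g)
    backend = get_backend(fd)
    kernel! = wedge_kernel!(backend)
    out = KernelAbstractions.allocate(backend, eltype(fd), length(fd))
    kernel!(out, fd, gd; ndrange=length(out))
    KernelAbstractions.synchronize(backend)
    out
end

# e.g. dec_wedge_product(f, g, MtlArray) on Apple silicon,
#      dec_wedge_product(f, g, CuArray)  on an NVIDIA GPU.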

@lukem12345 lukem12345 requested a review from GeorgeR227 October 11, 2024 20:42