Add disk cache infrastructure for Julia 1.11 #557
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #557      +/-   ##
==========================================
- Coverage   74.75%   71.80%   -2.95%
==========================================
  Files          24       24
  Lines        3414     3469      +55
==========================================
- Hits         2552     2491      -61
- Misses        862      978     +116
Okay, the big caveat is that this will only work with code that has been precompiled, so we will need to make sure our inference caches are hot.
Using this small code:

module GemmDenseCUDA

using PrecompileTools
import CUDA
import CUDA: i32

BLOCK_SIZE = 32

function gemm!(A, B, C)
    row = (CUDA.blockIdx().x - Int32(1)) * CUDA.blockDim().x + CUDA.threadIdx().x
    col = (CUDA.blockIdx().y - Int32(1)) * CUDA.blockDim().y + CUDA.threadIdx().y
    sum = zero(eltype(C))
    if row <= size(A, 1) && col <= size(B, 2)
        for i in 1:size(A, 2)
            @inbounds sum += A[row, i] * B[i, col]
        end
        @inbounds C[row, col] = sum
    end
    return
end

@setup_workload let
    if CUDA.functional()
        A16 = CUDA.CuArray{Float16,2}(undef, 0, 0)
        A32 = CUDA.CuArray{Float32,2}(undef, 0, 0)
        A64 = CUDA.CuArray{Float64,2}(undef, 0, 0)
        @compile_workload begin
            CUDA.@cuda launch=false gemm!(A16, A16, A16)
            CUDA.@cuda launch=false gemm!(A32, A32, A32)
            CUDA.@cuda launch=false gemm!(A64, A64, A32)
        end
    end
end

end # module

Testing in REPL
First compilation
Second compilation
Without disk-cache (just inference caching)
First compilation
Second compilation
Would be interesting to see the impact of JuliaGPU/CUDA.jl#2325.
With disk-cache
First compilation
Second compilation
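The measured timings did not survive in this thread. As a rough, CPU-only sketch of how a first versus second compilation can be compared (this is not the command used above; `dot_kernel` is a hypothetical stand-in and no GPU or CUDA.jl is involved), wrapping two identical calls in `@elapsed` in a fresh session shows the gap that caching is meant to close:

```julia
# Hypothetical stand-in for a kernel, chosen only so the example runs without a GPU.
function dot_kernel(a, b)
    acc = zero(eltype(a))
    for i in eachindex(a, b)
        @inbounds acc += a[i] * b[i]
    end
    return acc
end

a = rand(Float32, 16)
b = rand(Float32, 16)

# First call: includes inference and code generation for this argument signature.
t_first = @elapsed dot_kernel(a, b)
# Second call: reuses the code compiled by the first call.
t_second = @elapsed dot_kernel(a, b)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

For the GPU case, the analogous measurement would time the `CUDA.@cuda launch=false` line in a fresh Julia session, once with and once without the disk cache enabled.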
So precompiling with disk_cache = true leads to this interesting hang.
sigh
The issue is that our
The issue is that during image creation we sometimes broaden our horizon and then try to compile based on methods
WaterLily TGV example
ClimaOcean OMIP
Note that I tried to make the model as "small" as possible,
Couple of minor naming nits. I don't have the time right now to test or review closely, but if it's working fine feel free to merge so that it gets some exposure on the master branch.
Yay!
Ah, we might still need to add an interface for backends to opt out, e.g. Enzyme.
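One possible shape for such an opt-out is a trait function that defaults to `true` and that a backend overrides for its own params type. The sketch below is only an illustration: every name in it (`AbstractParams`, `NativeParams`, `EnzymeLikeParams`, `supports_disk_cache`, `maybe_cached`) is hypothetical and none of it is GPUCompiler API.

```julia
# All names here are hypothetical; this is not GPUCompiler's interface.
abstract type AbstractParams end

struct NativeParams <: AbstractParams end
struct EnzymeLikeParams <: AbstractParams end

# Default: a backend participates in the cache.
supports_disk_cache(::AbstractParams) = true
# A backend opts out by adding a more specific method.
supports_disk_cache(::EnzymeLikeParams) = false

# `compile` is a zero-argument function producing the artifact.
# An in-memory Dict stands in for the on-disk store to keep the sketch short.
function maybe_cached(compile, params::AbstractParams, cache::Dict{UInt,Any}, key::UInt)
    supports_disk_cache(params) || return compile()  # opted out: never touch the cache
    return get!(compile, cache, key)                 # compile only on a cache miss
end

cache = Dict{UInt,Any}()
artifact = maybe_cached(NativeParams(), cache, UInt(1)) do
    "compiled artifact"
end
```

Dispatching on the params type keeps the decision with the backend, so the cache code itself needs no list of excluded packages.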
Replaces #351
Using JuliaLang/julia#52233 for ensuring inference correctness.
Note the CodeInstances should be precompiled, which we don't yet have a nice-ish infrastructure for.
Might need JuliaLang/julia#53943 so that we can encode the dependencies of the CodeInstance, instead of just GPUCompiler.
TODO:
- precompile function
- CompilerConfig to be robust against compiler params changes
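To illustrate the last TODO item, here is a minimal sketch of an on-disk cache whose key covers the compiler configuration, so that changing a parameter produces a new entry instead of reusing a stale one. It is not the implementation in this PR (which works with CodeInstances); `ToyConfig`, `cache_key` and `cached_compile` are hypothetical names, and only the `Serialization` and `SHA` standard libraries are assumed.

```julia
using Serialization, SHA  # both are standard libraries

# Hypothetical config type; GPUCompiler's real CompilerConfig has different fields.
struct ToyConfig
    target::String
    opt_level::Int
end

# The key covers the Julia version, the function name, the argument types and
# every config field, so any change to the config changes the key.
cache_key(fname, argtypes, cfg::ToyConfig) =
    bytes2hex(sha256(repr((VERSION, fname, argtypes, cfg))))

# `compile` is a zero-argument function that produces the artifact on a miss.
function cached_compile(compile, dir, fname, argtypes, cfg)
    path = joinpath(dir, cache_key(fname, argtypes, cfg) * ".jls")
    isfile(path) && return deserialize(path)
    artifact = compile()
    # Write to a temporary file first, then move it into place, so a reader
    # never observes a partially written entry.
    tmp, io = mktemp(dir)
    serialize(io, artifact)
    close(io)
    mv(tmp, path; force=true)
    return artifact
end

dir = mktempdir()
cfg = ToyConfig("sm_80", 2)
obj = cached_compile(dir, :gemm!, Tuple{Float32}, cfg) do
    UInt8[0x01, 0x02, 0x03]  # placeholder bytes standing in for compiled code
end
```

Hashing a textual representation rather than calling `hash` on the objects is a deliberate choice in this sketch: the key has to be identical across Julia sessions, which a string digest makes easy to reason about.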