Use PrecompileTools to warmup CUDA.jl #2325

Draft. vchuravy wants to merge 5 commits into master.

vchuravy
Member

No description provided.

@vchuravy vchuravy requested a review from maleadt April 15, 2024 17:25
@maleadt
Member

maleadt commented Apr 15, 2024

So IIUC it isn't worth using the actual PTX ISA or device capability here because the inference caches are shared between CUDA subtargets, and this will prime them.

I considered whether we need a mechanism to ensure this doesn't actively use the CUDA toolkit, which would prevent use on a system without a GPU, but I think CI should already cover that:

  - group: ":eyes: Special"
    depends_on: "cuda"
    steps:
      - label: "GPU-less environment"
        plugins:
          - JuliaCI/julia#v1:
              version: "1.10"
          - JuliaCI/julia-coverage#v1:
              dirs:
                - src
                - lib
                - examples
          - JuliaCI/julia-test#v1:
              run_tests: false
        command: |
          julia --project -e '
            using CUDA
            @assert !CUDA.functional()
            @assert !isdefined(CUDA, :libcudart)
            CUDA.set_runtime_version!(v"11.6")'
          julia --project -e '
            using CUDA
            @assert !CUDA.functional()
            @assert isdefined(CUDA, :libcudart)'
        agents:
          queue: "juliagpu"
          intel: "*"
        if: build.message !~ /\[skip tests\]/ && build.message !~ /\[skip special\]/ && !build.pull_request.draft
        timeout_in_minutes: 5
We should check whether that actually works, e.g., by using a precompile workload that does initialize CUDA and ensuring that the job fails.

@maleadt maleadt marked this pull request as draft April 16, 2024 10:46
@maleadt maleadt added the enhancement New feature or request label Apr 16, 2024
@vchuravy
Member Author

> So IIUC it isn't worth using the actual PTX ISA or device capability here because the inference caches are shared between CUDA subtargets, and this will prime them.

Correct!

Using JuliaGPU/GPUCompiler.jl#557 (comment), this improved TTFK (time to first kernel) from 12s to 4s.
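
For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of a PrecompileTools workload. It is not the contents of this PR's src/precompile.jl: the module `ToyKernels` and the function `saxpy` are hypothetical and only stand in for whatever compiler entry points CUDA.jl would exercise. The only real API used is the pair of PrecompileTools macros `@setup_workload` and `@compile_workload`.

```julia
module ToyKernels  # hypothetical package, for illustration only

using PrecompileTools: @setup_workload, @compile_workload

# Hypothetical function whose compiled code we would like cached.
saxpy(a, x, y) = a .* x .+ y

@setup_workload begin
    # Setup code: needed by the workload, but not itself a precompilation target.
    x = rand(Float32, 16)
    y = rand(Float32, 16)
    @compile_workload begin
        # Calls made here are compiled and cached when the package is precompiled.
        saxpy(2.0f0, x, y)
    end
end

end # module
```

When a module like this lives in a package, the workload runs at package precompilation time, so a fresh session should not have to compile `saxpy` for these argument types again. That is the same effect the PR is after for the GPU compiler pipeline.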

@vchuravy vchuravy force-pushed the vc/precompile_tools branch from 80ec869 to c7f880c Compare April 19, 2024 14:49
@vchuravy vchuravy marked this pull request as ready for review April 19, 2024 14:49
@vchuravy vchuravy marked this pull request as draft April 19, 2024 14:50

codecov bot commented Apr 19, 2024

Codecov Report

Attention: Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 59.96%. Comparing base (14de009) to head (c7f880c).

Current head c7f880c differs from pull request most recent head 03530f0

Please upload reports for the commit 03530f0 to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| src/precompile.jl | 12.50% | 7 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2325       +/-   ##
===========================================
- Coverage   73.37%   59.96%   -13.42%     
===========================================
  Files         157      156        -1     
  Lines       15197    14989      -208     
===========================================
- Hits        11151     8988     -2163     
- Misses       4046     6001     +1955     


@vchuravy vchuravy marked this pull request as ready for review April 19, 2024 15:58
@vchuravy vchuravy force-pushed the vc/precompile_tools branch from 51520a1 to 03530f0 Compare June 24, 2024 13:56
@maleadt maleadt force-pushed the vc/precompile_tools branch from 03530f0 to bfe2eb9 Compare September 18, 2024 08:34
@maleadt
Member

maleadt commented Sep 18, 2024

Fails on 1.11:

2024-09-18 10:44:13 CEST	ERROR: The following 1 direct dependency failed to precompile:
2024-09-18 10:44:13 CEST	
2024-09-18 10:44:13 CEST	CUDA --code-coverage=@/var/lib/buildkite-agent/builds/gpuci-7/julialang/cuda-dot-jl --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none
2024-09-18 10:44:13 CEST	
2024-09-18 10:44:13 CEST	Failed to precompile CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] to "/root/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/compiled/v1.11/CUDA/jl_aa67nH".
2024-09-18 10:44:13 CEST	LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.membar.sys

@maleadt maleadt added the needs changes Changes are needed. label Sep 18, 2024
@maleadt maleadt marked this pull request as draft September 18, 2024 09:32
@maleadt maleadt force-pushed the master branch 8 times, most recently from 2274085 to 7ec9719 Compare December 19, 2024 17:51
@maleadt maleadt force-pushed the master branch 7 times, most recently from 5d585c4 to c850163 Compare December 20, 2024 08:18