CUDA Graph Compute Function Refactor (precursor for performance improvements) #11042

Draft
wants to merge 5 commits into
base: master
from


aendk
Contributor

@aendk aendk commented Jan 2, 2025

Hi All,
I am working on improving llama.cpp's CUDA graph performance on behalf of NVIDIA.
In preliminary testing on a high-end system, we are seeing performance gains of up to 3% from overlapping CPU and GPU work and from improving CPU -> GPU copy scheduling. The changes are likely to be even more impactful on less capable hardware.

To pave the way for these changes (and to provide readable diffs), I first isolated the cosmetic changes in this PR.
This PR does not change any logic. It merely slims down ggml_backend_cuda_graph_compute() by moving certain loops and other subtasks of the original function into 5 new functions.
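To illustrate the kind of logic-preserving split described above, here is a minimal sketch in plain C++. It is not the code from this PR: `Node`, `compute_monolithic`, `compute_refactored`, and both helpers are hypothetical names invented for illustration, and the "launch count" is only a stand-in for real work.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a graph node; this is NOT a ggml type.
struct Node {
    int  op;        // operation identifier
    bool supported; // whether the node may be captured into a graph
};

// Before: one function performs the support check and the launch
// accounting inline.
int compute_monolithic(const std::vector<Node> & nodes) {
    bool use_graph = true;
    for (const Node & n : nodes) {
        if (!n.supported) {
            use_graph = false;
        }
    }
    int launches = 0;
    if (use_graph) {
        launches = 1; // the whole graph is launched at once
    } else {
        for (const Node & n : nodes) {
            (void) n;
            launches++; // one launch per node
        }
    }
    return launches;
}

// After: the same two loops, each moved into its own helper.
static bool all_nodes_supported(const std::vector<Node> & nodes) {
    for (const Node & n : nodes) {
        if (!n.supported) {
            return false;
        }
    }
    return true;
}

static int count_eager_launches(const std::vector<Node> & nodes) {
    return static_cast<int>(nodes.size());
}

// The top-level function now only wires the helpers together.
int compute_refactored(const std::vector<Node> & nodes) {
    return all_nodes_supported(nodes) ? 1 : count_eager_launches(nodes);
}
```

Calling both versions on the same input should give the same result, which is the property a cosmetic refactor has to keep.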

These changes considerably improve the readability and future maintainability of this part of the CUDA backend.

Should I add prefixes to the new function names, and if so, what do you suggest?

@agray3 @mtavenrath

@github-actions github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Jan 2, 2025
@aendk aendk marked this pull request as draft January 2, 2025 15:47
@aendk
Contributor Author

aendk commented Jan 2, 2025

FYI: setting Status to Draft whilst I investigate the failed tests.

@ggerganov
Owner

> FYI: setting Status to Draft whilst I investigate the failed tests.

I think you just need to make the functions static.
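The thread does not say why the tests failed. One common cause, assumed here, is a CI build that enables warnings such as GCC's `-Wmissing-declarations` or Clang's `-Wmissing-prototypes` together with `-Werror`, which rejects non-static functions defined without a prior declaration. The sketch below shows the suggested pattern with hypothetical names (`is_capturable`, `count_capturable`, `graph_compute`); none of them are taken from the PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers used only inside this translation unit.
// `static` gives them internal linkage: they need no declaration in a
// header and cannot clash with same-named symbols in other files.
static bool is_capturable(int op) {
    return op >= 0; // assumption for illustration: negative ops are unsupported
}

static int count_capturable(const std::vector<int> & ops) {
    int count = 0;
    for (int op : ops) {
        if (is_capturable(op)) {
            count++;
        }
    }
    return count;
}

// Public entry point with external linkage. In a real project this
// declaration would live in a header; it is repeated here so that the
// definition below has a prior declaration.
int graph_compute(const std::vector<int> & ops);

int graph_compute(const std::vector<int> & ops) {
    return count_capturable(ops);
}
```

Only `graph_compute` is visible to other translation units; the helpers stay private to the file, which is usually what is wanted for functions that were split out of a single larger function.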

Labels
ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs

2 participants