[GGML][RPC] Support for models with non-512-aligned tensors over RPC. #11047

Merged: 7 commits, Jan 4, 2025

Conversation

matt23654
Contributor

This PR adds support for quantized tensors with sizes not divisible by 512, such as those found in quantized versions of the Qwen2.5 72B model. Please also see discussion in #10943.

Changes

  • Implemented forwarding to CUDA backend on server for init_tensor and get_alloc_size calls
  • Forwarding every call would add unacceptable latency (in testing, throughput dropped to 0.02 tok/s), so at the moment only calls for misaligned tensors are forwarded. There may be a better way of handling this in the future.
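
The selective forwarding described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `TensorInfo`, `needs_server_call`, `count_forwarded`, and `kAlignment` are hypothetical names, and the sketch assumes that "misaligned" means a quantized tensor whose row length in elements is not a multiple of 512.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the tensor metadata available on the RPC client.
// This is NOT the real ggml_tensor struct.
struct TensorInfo {
    bool    quantized;  // true for quantized types such as Q4_K
    int64_t row_elems;  // elements per row (assumed to be the "size" the PR refers to)
};

// Alignment taken from the PR title; treated here as an assumption.
constexpr int64_t kAlignment = 512;

// Returns true when the client should pay for a network round trip to the
// server (e.g. for init_tensor or get_alloc_size) instead of handling the
// call locally. Only quantized, misaligned tensors qualify.
inline bool needs_server_call(const TensorInfo & t, int64_t alignment = kAlignment) {
    assert(alignment > 0);
    return t.quantized && (t.row_elems % alignment != 0);
}

// Counts how many tensors in a list would trigger a forwarded call, which
// gives a rough idea of the extra round trips a model would incur.
inline int count_forwarded(const TensorInfo * tensors, int n) {
    int forwarded = 0;
    for (int i = 0; i < n; ++i) {
        if (needs_server_call(tensors[i])) {
            ++forwarded;
        }
    }
    return forwarded;
}
```

With a predicate like this, models whose quantized tensors are all 512-aligned never take the forwarded path, which would be consistent with the unchanged throughput reported for existing models.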

Performance Impact

  • Qwen2.5 72B Q4_K_M: Coherent at ~4 tokens/s over GbE with a mix of GPU/RPC/CPU (previously produced garbage output)
  • Existing models (e.g. LLaMA 3.3 70B Q4_K_M): Unaffected (~7 tokens/s over GbE with GPU/RPC)

Testing

  • Perplexity validation with Tulu-3 8B:
    • Without RPC: 6.9362 ± 0.04526
    • With RPC: 6.9362 ± 0.04526 (identical results)
  • test-backend-ops: Passed for supported backends
  • Note: Unable to run full CI due to bandwidth limitations with model downloads

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jan 2, 2025
Collaborator

rgerganov left a comment

looks fine to me, some minor comments inline

ggml/src/ggml-rpc/ggml-rpc.cpp (7 inline comments, outdated and resolved)
slaren merged commit f922a9c into ggerganov:master on Jan 4, 2025
48 checks passed
3 participants