
[core][compiled graphs] Meta-issue: Support collective communication ops #47983

Open
2 of 10 tasks
stephanie-wang opened this issue Oct 10, 2024 · 3 comments
Labels
compiled-graphs enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

Comments

stephanie-wang commented Oct 10, 2024

Description

This is a meta-issue to track progress for tasks related to collective communication. See RFC for more details.

Roadmap:

Use case

No response

@stephanie-wang stephanie-wang added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) compiled-graphs P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Oct 10, 2024
stephanie-wang pushed a commit that referenced this issue Oct 21, 2024
aDAG currently does not support collective APIs. We would like to add
support for them, starting with allreduce.

This PR supports allreduce by introducing syntactic sugar,
`ray.experimental.collective.allreduce.bind`. The `bind` call accepts the
arguments `input_nodes`, `op`, and `transport`. It returns a list of
`CollectiveOutputNode`s as the allreduce results, with the same size as
`input_nodes`. The allreduce results are written to newly allocated tensors.
In the `COMPUTE` operation of `CollectiveOutputNode`, the corresponding
NCCL collective API is called. No changes are required for the input and
output channels of `CollectiveOutputNode`.
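Conceptually, allreduce replaces each worker's tensor with the elementwise reduction of all workers' tensors, and every worker receives the same result in a freshly allocated output. A minimal, NCCL-free sketch of these semantics (plain Python lists stand in for GPU tensors; the function name is illustrative, not part of the Ray API):

```python
# Illustrative sketch of allreduce(SUM) semantics, not the Ray implementation.
# Each worker contributes one tensor; every worker receives the same
# elementwise sum, written to a newly allocated output.

def allreduce_sum(input_tensors):
    """Return one freshly allocated reduced tensor per input tensor."""
    if not input_tensors:
        return []
    length = len(input_tensors[0])
    # Mirrors API requirement 4: all tensors must have the same shape.
    assert all(len(t) == length for t in input_tensors), "shapes must match"
    reduced = [sum(vals) for vals in zip(*input_tensors)]
    # Each worker gets its own copy, mirroring the new-tensor allocation.
    return [list(reduced) for _ in input_tensors]

outputs = allreduce_sum([[1, 2], [3, 4], [5, 6]])
# Every worker observes the same reduced values: [9, 12]
```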

Proposed new API:

```python
import ray.experimental.collective as collective

with InputNode() as inp:
    dag = [worker.return_tensor.bind(inp) for worker in workers]
    dag = collective.allreduce.bind(dag, ReduceOp.SUM)
    dag = MultiOutputNode(dag)
```

API Requirements:
1. Input nodes are unique.
2. Actor handles are unique.
3. Actor handles match the custom NCCL group if specified.
4. All tensors have the same shape.

Requirements 1-3 are checked in the `_CollectiveOperation` constructor.
Requirement 4 is checked at runtime via a timeout.
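As a rough sketch, the constructor-time checks for requirements 1-3 could look like the following. The helper name and signature are hypothetical; the actual logic lives in Ray's `_CollectiveOperation` and may differ.

```python
# Hypothetical sketch of the constructor-time validation; not Ray's
# actual `_CollectiveOperation` code.

def validate_collective(input_nodes, actor_handles, nccl_group_actors=None):
    # Requirement 1: input nodes are unique (compared by identity).
    if len(set(map(id, input_nodes))) != len(input_nodes):
        raise ValueError("Input nodes must be unique")
    # Requirement 2: actor handles are unique.
    if len(set(actor_handles)) != len(actor_handles):
        raise ValueError("Actor handles must be unique")
    # Requirement 3: actors match the custom NCCL group, if one is given.
    if nccl_group_actors is not None and set(actor_handles) != set(nccl_group_actors):
        raise ValueError("Actor handles must match the custom NCCL group")
```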

The operation scheduling is also updated to account for NCCL collective
operations: when an NCCL collective node is selected, all the other
collective nodes in its collective group must be selected as well.
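A toy sketch of that scheduling rule follows. All names are illustrative; the real compiled-graphs scheduler is considerably more involved, but the core idea is that a collective group is scheduled atomically so that no rank blocks waiting for a peer that was never selected.

```python
# Toy sketch: when one node of an NCCL collective group is picked,
# schedule the whole group together to avoid deadlock between ranks.
# Names are illustrative, not Ray's actual scheduler API.

def schedule(ready_nodes, collective_group_of):
    scheduled, seen = [], set()
    for node in ready_nodes:
        if node in seen:
            continue
        # Non-collective nodes form a singleton "group".
        group = collective_group_of.get(node, [node])
        # All members of a collective group are selected together.
        for member in group:
            if member not in seen:
                scheduled.append(member)
                seen.add(member)
    return scheduled

groups = {"ar0": ["ar0", "ar1"], "ar1": ["ar0", "ar1"]}
order = schedule(["a", "ar0", "b", "ar1"], groups)
# Selecting "ar0" pulls in "ar1" immediately: ['a', 'ar0', 'ar1', 'b']
```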

Meta-issue: #47983


---------

Signed-off-by: Weixin Deng <[email protected]>
Signed-off-by: Yuhan Ruan <[email protected]>
Co-authored-by: Yuhan Ruan <[email protected]>
Jay-ju pushed a commit to Jay-ju/ray that referenced this issue Nov 5, 2024
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this issue Nov 14, 2024
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
@jeffreyjeffreywang
Contributor

Hey @stephanie-wang, just wanted to make sure this meta-task is still on the roadmap before I start the implementation. I'm particularly interested in #47938, but I think it's better to start with supporting all-to-all, all-to-one, and one-to-all collectives. I took a look at the RFC, and here is my understanding:

  • All-to-one: pass the reader worker handle to the collective call.

```python
workers = [Worker.options(num_gpus=1).remote() for _ in range(3)]
nccl_group_handle = ray.collective.NcclGroup(workers)
with InputNode() as inp:
    results = [worker.fwd.bind(inp) for worker in workers]
    # Pass the worker handle to the collective call.
    dag = ray.collective.gather.bind(
        results, workers[0],
        transport=nccl_group_handle)
    dag = workers[0].sync.bind(dag)

# Errors if the `gather` reader is not part of the group.
dag = dag.experimental_compile()
```

  • One-to-all: pass the sender worker handle to the collective call.

```python
workers = [Worker.options(num_gpus=1).remote() for _ in range(3)]
nccl_group_handle = ray.collective.NcclGroup(workers)
# One-to-all pattern.
with InputNode() as inp:
    result = workers[0].fwd.bind(inp)
    results = ray.collective.broadcast.bind(
        result, workers,
        transport=nccl_group_handle)
    dag = MultiOutputNode(results)

# Errors if the `broadcast` sender is not part of the group.
dag = dag.experimental_compile()
```

Is this the design that we agreed upon? If this looks good to you, I'll create issues to track the support for all-to-all, all-to-one, and one-to-all patterns.

@stephanie-wang
Contributor Author

> Is this the design that we agreed upon? If this looks good to you, I'll create issues to track the support for all-to-all, all-to-one, and one-to-all patterns.

Yes, everything here is still on the roadmap and open to contributions!


jeffreyjeffreywang commented Dec 18, 2024

Thanks Stephanie, here are the links to the issues for all-to-one and one-to-all.
