Use torchcompat to work on other devices #384

Draft · wants to merge 1 commit into base: main

Conversation

@Delaunay commented May 21, 2024

The idea is to replace all mentions of torch.cuda with torchcompat.core, which mirrors torch.cuda for many devices (CUDA, XPU, HPU).
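
For concreteness, a minimal sketch of that substitution, assuming only the torch.cuda-mirroring behaviour described above; the specific calls shown are illustrative, not an exhaustive listing of the torchcompat.core API:

```python
import torchcompat.core as accelerator  # drop-in stand-in for torch.cuda

# Before (CUDA-only):
#   torch.cuda.manual_seed_all(1234)
#   n = torch.cuda.device_count()
#   torch.cuda.synchronize()

# After (device-agnostic: resolves to CUDA, XPU, or HPU at runtime):
accelerator.manual_seed_all(1234)  # mirrors torch.cuda.manual_seed_all
n = accelerator.device_count()     # mirrors torch.cuda.device_count
accelerator.synchronize()          # mirrors torch.cuda.synchronize
```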

@facebook-github-bot added the CLA Signed label May 21, 2024
@Delaunay (Author) commented May 21, 2024

Seems some primitives are not implemented for HPUs:

dlrm.0 AttributeError: module 'torch._C' has no attribute '_broadcast_coalesced'
dlrm.0 [stderr] Traceback (most recent call last):
dlrm.0 [stderr]   File "/home/sdp/results/venv/torch/bin/voir", line 8, in <module>
dlrm.0 [stderr]     sys.exit(main())
dlrm.0 [stderr]   File "/home/sdp/voir/voir/cli.py", line 124, in main
dlrm.0 [stderr]     ov(sys.argv[1:] if argv is None else argv)
dlrm.0 [stderr]   File "/home/sdp/voir/voir/phase.py", line 331, in __call__
dlrm.0 [stderr]     self._run(*args, **kwargs)
dlrm.0 [stderr]   File "/home/sdp/voir/voir/overseer.py", line 242, in _run
dlrm.0 [stderr]     set_value(func())
dlrm.0 [stderr]   File "/home/sdp/voir/voir/scriptutils.py", line 37, in <lambda>
dlrm.0 [stderr]     return lambda: exec(mainsection, glb, glb)
dlrm.0 [stderr]   File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 1911, in <module>
dlrm.0 [stderr]     run()
dlrm.0 [stderr]   File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 1579, in run
dlrm.0 [stderr]     Z = dlrm_wrap(
dlrm.0 [stderr]   File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 146, in dlrm_wrap
dlrm.0 [stderr]     return dlrm(X.to(device), lS_o, lS_i)
dlrm.0 [stderr]   File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1514, in 
_wrapped_call_impl
dlrm.0 [stderr]     return self._call_impl(*args, **kwargs)
dlrm.0 [stderr]   File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1523, in _call_impl
dlrm.0 [stderr]     return forward_call(*args, **kwargs)
dlrm.0 [stderr]   File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 530, in forward
dlrm.0 [stderr]     return self.parallel_forward(dense_x, lS_o, lS_i)
dlrm.0 [stderr]   File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 631, in parallel_forward
dlrm.0 [stderr]     self.bot_l_replicas = replicate(self.bot_l, device_ids)
dlrm.0 [stderr]   File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/parallel/replicate.py", line 110, in replicate
dlrm.0 [stderr]     param_copies = _broadcast_coalesced_reshape(params, devices, detach)
dlrm.0 [stderr]   File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/parallel/replicate.py", line 83, in 
_broadcast_coalesced_reshape
dlrm.0 [stderr]     tensor_copies = Broadcast.apply(devices, *tensors)
dlrm.0 [stderr]   File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
dlrm.0 [stderr]     return super().apply(*args, **kwargs)  # type: ignore[misc]
dlrm.0 [stderr]   File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/parallel/_functions.py", line 23, in forward
dlrm.0 [stderr]     outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
dlrm.0 [stderr]   File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/parallel/comm.py", line 57, in 
broadcast_coalesced
dlrm.0 [stderr]     return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
dlrm.0 [stderr] AttributeError: module 'torch._C' has no attribute '_broadcast_coalesced'
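
For reference, the failing call sits under torch.nn.parallel.replicate, which bottoms out in torch._C._broadcast_coalesced, a primitive this HPU build does not ship. A hedged sketch of a backend-agnostic fallback (one plain copy per device instead of a coalesced broadcast; an illustration of the missing piece, not the fix this PR ships):

```python
import copy
import torch

def replicate_naive(module: torch.nn.Module, devices):
    """Copy `module` onto every device in `devices`.

    Slower than torch._C._broadcast_coalesced (one transfer per device,
    no tensor coalescing), but it relies only on Tensor.to(), which every
    backend implements, so it also works on HPU/XPU.
    """
    return [copy.deepcopy(module).to(d) for d in devices]
```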
