Initializing all ranks to the same value to avoid failure of UT AllR… #1459

Open
wants to merge 2 commits into
base: develop

mberenjk
Contributor

…educe for FP8 type

Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: "Internal", or link to GitHub issue (if applicable).

What were the changes?
Initialize all ranks to the same value so that the AllReduce unit test no longer fails for FP8 types.

Why were the changes made?
The unit test previously failed for FP8 types.

How was the outcome achieved?
Floating-point addition is not associative, so the order in which the ranks' values are added affects the rounded result. Initializing all ranks to the same value avoids this dependence on rank order.
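
To illustrate the ordering effect, here is a minimal Python sketch. It is not the RCCL test code and does not model a real FP8 format: `quantize`, `sequential_sum`, and `distinct_results` are hypothetical helpers that round every intermediate sum to 4 significant bits, an assumed stand-in for a low-precision type. It only shows that a left-to-right sum of mixed values can depend on the order of the inputs, while identical inputs give one result for every ordering.

```python
import itertools
import math


def quantize(x, sig_bits=4):
    """Round x to sig_bits significant bits (toy model, not a real FP8 format)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << sig_bits
    return math.ldexp(round(m * scale) / scale, e)


def sequential_sum(values, sig_bits=4):
    """Add values left to right, rounding after every addition."""
    acc = 0.0
    for v in values:
        acc = quantize(acc + quantize(v, sig_bits), sig_bits)
    return acc


def distinct_results(values, sig_bits=4):
    """Set of sums obtained over every ordering of the inputs."""
    return {sequential_sum(p, sig_bits) for p in itertools.permutations(values)}


# Hypothetical per-rank inputs: different values vs. the same value on all ranks.
mixed_ranks = [16.0, 1.0, 1.0, 1.0]
same_ranks = [1.0, 1.0, 1.0, 1.0]

print(distinct_results(mixed_ranks))  # more than one value: order matters
print(distinct_results(same_ranks))   # a single value: order does not matter
```

In this toy model a test that compares against one expected sum can only be reliable in the second case, which is the situation the change sets up.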

Additional Documentation:
What else should the reviewer know?

Approval Checklist

Do not approve until these items are satisfied.

  • Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

4 participants