Multi-gpu fails on SLURM clusters #146

Open
ahmadrezarm opened this issue Dec 20, 2023 · 1 comment
Comments

@ahmadrezarm

Hi. I tried to run the training on a SLURM cluster with multiple GPUs. On clusters like this, the scheduler decides which GPUs are assigned to your job, so you don't know in advance which ones you will get. The current code requires the names of the GPUs to be passed as an argument, which doesn't work in this scenario.
I edited the code to accept whichever GPUs are available, regardless of their names. I think it would be nicer to have it work this way. If you're interested, I can send you the quick fix.
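
For illustration only, here is a minimal sketch of that idea. It is not the actual fix, which isn't shown in this thread. It assumes the scheduler exposes the allocated GPUs through the `CUDA_VISIBLE_DEVICES` environment variable, which SLURM typically does for GPU allocations depending on cluster configuration. The helper name `visible_gpu_indices` is hypothetical. Because CUDA renumbers the visible devices from 0 inside the process, the code only needs to know how many were exposed, not their physical names.

```python
import os


def visible_gpu_indices(env=None):
    """Return local CUDA device indices (0..n-1) for the GPUs exposed to this job.

    Hypothetical helper: relies on CUDA_VISIBLE_DEVICES being set by the scheduler.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")

    if raw is None:
        # Variable not set: fall back to asking PyTorch, if it is installed.
        try:
            import torch
            return list(range(torch.cuda.device_count()))
        except ImportError:
            return []

    # Entries may be integer IDs or UUID strings; only the count matters here,
    # since CUDA maps the visible devices to indices 0..n-1 in this process.
    entries = [e.strip() for e in raw.split(",") if e.strip()]
    return list(range(len(entries)))


if __name__ == "__main__":
    print("Using GPUs:", visible_gpu_indices())
```

The resulting list could then be used wherever the training script currently expects user-supplied device IDs, for example as the `device_ids` argument of `torch.nn.DataParallel`. This sketch does not handle special values such as `-1` or invalid entries in `CUDA_VISIBLE_DEVICES`.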

@WuJunde
Collaborator

WuJunde commented Dec 20, 2023

A pull request would be more than welcome.
