There is currently no way to distribute GPUs among fireworks when running small jobs in parallel on one system.
An example: on NERSC, you get exclusive access to one Perlmutter node with 4 A100 GPUs. If you were to run 4 fireworks that each require 1 GPU, using `rlaunch multi 4`, each firework would be responsible for determining which GPUs to run on. Most Python code defaults to checking `CUDA_VISIBLE_DEVICES` and taking either the first GPU or all of them, which oversubscribes the devices and leads to poor performance or an error.

I don't believe this implementation would work for systems with non-NVIDIA/CUDA GPUs. I believe AMD devices require setting the `HIP_VISIBLE_DEVICES` variable instead, but I don't have access to any system with multiple AMD GPUs to test that.

This might not be the best way to implement this, but it does raise the question of whether there is a need for a more general way to distribute non-CPU devices (GPUs and TPUs) among sub-jobs.
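For illustration only, here is a minimal sketch of the kind of split I have in mind: each parallel sub-job gets a disjoint slice of the node's GPUs via its own `CUDA_VISIBLE_DEVICES`. This is not what the PR currently does, and the `rlaunch singleshot` command per sub-process and the even one-slice-per-job split are assumptions on my part.

```python
import os
import subprocess

def launch_with_gpus(n_jobs, gpu_ids, cmd=("rlaunch", "singleshot")):
    """Run n_jobs sub-processes, pinning each to a disjoint slice of gpu_ids.

    Sketch only: the launcher command and the even split are assumptions,
    not the behavior implemented in this PR.
    """
    per_job = len(gpu_ids) // n_jobs
    procs = []
    for i in range(n_jobs):
        env = os.environ.copy()
        # Each sub-job only "sees" its own GPUs, e.g. "0" or "2,3".
        my_gpus = gpu_ids[i * per_job:(i + 1) * per_job]
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in my_gpus)
        procs.append(subprocess.Popen(list(cmd), env=env))
    for p in procs:
        p.wait()

# One Perlmutter node, 4 A100s, 4 fireworks needing 1 GPU each:
launch_with_gpus(4, [0, 1, 2, 3])
```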
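If a more general mechanism were wanted, the vendor-specific variable could be hidden behind a single helper. A rough sketch follows; the vendor-to-variable mapping is my assumption, and I have only been able to verify the NVIDIA case myself.

```python
def device_env(vendor, device_ids):
    """Map a vendor name to the environment variable that restricts visible devices.

    Assumed mapping: NVIDIA -> CUDA_VISIBLE_DEVICES, AMD/ROCm -> HIP_VISIBLE_DEVICES.
    Other accelerators (e.g. TPUs) would need their own entries.
    """
    var = {"nvidia": "CUDA_VISIBLE_DEVICES", "amd": "HIP_VISIBLE_DEVICES"}[vendor]
    return {var: ",".join(str(d) for d in device_ids)}

# e.g. merge into a sub-process environment before Popen:
# env.update(device_env("amd", [0]))
```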