Merge pull request #845 from Premas/fix-array-job
Add section Throttling in Slurm Array Job
wwarriner authored Dec 3, 2024
2 parents 77efcc3 + fab0d82 commit 95384f3
Showing 3 changed files with 26 additions and 3 deletions.
11 changes: 8 additions & 3 deletions docs/cheaha/slurm/practical_sbatch.md
@@ -157,6 +157,11 @@ simulate $SEED $INPUT_FILE $OUTPUT_FILE

The `main` shell script will determine the upper bound `$N` of the `--array` flag, and then call `sbatch --array=1-$N job.sh`. It will be up to `job.sh` to determine how to use `$SLURM_ARRAY_TASK_ID`. Before we go too much further, it may be helpful to think of `sbatch --array=1-$N job.sh` as creating an indexed loop, from 1 to `$N`, and running `job.sh` on each of those index values. The important point is that the loop indices are run in parallel, so whatever happens in each call to `job.sh` must be independent. The `main.sh` file is the same for all languages and is shown in the code block below. The comments describe what each segment of code is doing.
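
Before looking at `main.sh`, it may help to see what each task could do with its index. The following is a hedged sketch only; the `#SBATCH` options, file paths, and the `simulate` call are illustrative assumptions, not the tutorial's actual `job.sh`:

```bash title="job.sh (illustrative sketch)"
#!/bin/bash
#SBATCH --job-name=array-example
#SBATCH --ntasks=1

# Each task receives a distinct $SLURM_ARRAY_TASK_ID and must work independently
SEED=$SLURM_ARRAY_TASK_ID
INPUT_FILE="../inputs/input_${SLURM_ARRAY_TASK_ID}.csv"
OUTPUT_FILE="../outputs/output_${SLURM_ARRAY_TASK_ID}.csv"

simulate "$SEED" "$INPUT_FILE" "$OUTPUT_FILE"
```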

<!-- markdownlint-disable MD046 -->
!!! important
    To manage resource usage effectively, it's essential to implement [throttling](./submitting_jobs.md#throttling-in-slurm-array-jobs) by limiting the number of array tasks that run at the same time. This prevents overloading computing resources. For example, you can limit the number of simultaneously running array tasks to 4 with the percent `%` symbol in your submission command: `sbatch --array=1-$N%4 job.sh`.
<!-- markdownlint-enable MD046 -->

```bash title="main.sh"
#! /bin/bash

@@ -167,7 +172,7 @@ input_files=(../inputs/**/dice.csv)
FILE_COUNT=${#input_files[@]}
FILE_COUNT=$(( $FILE_COUNT - 1 ))

-sbatch --array=0-$FILE_COUNT job.sh
+sbatch --array=0-$FILE_COUNT%4 job.sh
```

1. The line `#! /bin/bash` instructs the operating system what interpreter to use if called without an explicit interpreter, like `./main.sh`. It is best practice to have this line for scripts running in `bash`. Other lines are possible for other interpreters.
@@ -192,7 +197,7 @@ sbatch --array=0-$FILE_COUNT job.sh
    Double parentheses with a leading dollar sign, like `$((...))`, evaluate integer arithmetic and assign the result to a variable.
<!-- markdownlint-enable MD046 -->

-1. The line `sbatch --array=0-$FILE_COUNT job.sh` puts the array tasks in the Slurm queue using the `job.sh` script. The number of tasks runs from `0` to `$FILE_COUNT` as computed above.
+1. The line `sbatch --array=0-$FILE_COUNT%4 job.sh` puts the array tasks in the Slurm queue using the `job.sh` script. The array of tasks runs from `0` to `$FILE_COUNT` as computed above, where `%4` limits the number of simultaneous tasks to 4.

To use the script, enter the command `bash main.sh` at the terminal.
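
After submitting, you can verify the throttle is in effect by checking the queue; a minimal check (column layout varies by site configuration):

```bash
# With %4 in effect, at most 4 array tasks should show state R at any time
squeue -u "$USER"
```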

@@ -262,5 +267,5 @@ input_files=(../inputs/**/dice.csv)
FILE_COUNT=${#input_files[@]}
FILE_COUNT=$(( $FILE_COUNT - 1 ))

-sbatch --array=0-$FILE_COUNT job.sh
+sbatch --array=0-$FILE_COUNT%4 job.sh
```
6 changes: 6 additions & 0 deletions docs/cheaha/slurm/slurm_tutorial.md
@@ -241,6 +241,12 @@ Array jobs are more effective when you have a larger number of similar tasks to
The following Slurm script is an example of how you might convert the previous `multijob` script to an array job. To start, copy the below script to a file named `slurm_array.job`. The script requires the input file `python_script_new.py` and the `conda` environment `pytools-env`, similar to those used in [example 2](../slurm/slurm_tutorial.md#example-2-sequential-job) and [example 3](../slurm/slurm_tutorial.md#example-3-parallel-jobs). Line 11 specifies the script as an array job, treating each task within the array as an independent job. For each task, lines 18-19 calculate the input range. `SLURM_ARRAY_TASK_ID` identifies each task by index, and is automatically set for array jobs. The Python script (line 22) runs each array task concurrently on its respective input range. The command `awk` is used to prepend each output line with the unique task identifier and then append the results to the file `output_all_tasks.txt`. For more details on the parameters of array jobs, please refer to [Batch Array Jobs](../slurm/submitting_jobs.md#batch-array-jobs-with-known-indices) and [Practical Batch Array Jobs](../slurm/practical_sbatch.md).
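
The middle of the script is elided in this view, so the following is a hedged sketch of the steps described above; the per-task range size of 2500 and the variable names are assumptions, not the tutorial's actual values:

```bash
# Each task derives its own input range from its array index (range size assumed)
start=$(( (SLURM_ARRAY_TASK_ID - 1) * 2500 + 1 ))
end=$(( SLURM_ARRAY_TASK_ID * 2500 ))

# Run the worker on that range, tagging each output line with its task ID
python python_script_new.py "$start" "$end" | \
    awk -v id="$SLURM_ARRAY_TASK_ID" '{print "task", id, $0}' >> output_all_tasks.txt
```
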
<!-- markdownlint-disable MD046 -->
!!! important
    For large array jobs, implementing [throttling](./submitting_jobs.md#throttling-in-slurm-array-jobs) helps control the number of concurrent jobs, preventing resource contention across the Cheaha cluster. Running too many jobs at once can cause competition for CPU, memory, or I/O, which may negatively impact performance.
<!-- markdownlint-enable MD046 -->
```bash linenums="1"
#!/bin/bash
#SBATCH --job-name=slurm_array ### Name of the job
12 changes: 12 additions & 0 deletions docs/cheaha/slurm/submitting_jobs.md
@@ -141,6 +141,18 @@ For more details on using `sbatch` please see the [official documentation](https
    If you are using bash or shell arrays, it is crucial to note they use 0-based indexing. Plan your `--array` flag indices accordingly; see the illustration after this note.
<!-- markdownlint-enable MD046 -->
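
For example, a minimal illustration of 0-based indexing (the file names are hypothetical):

```bash
files=(a.csv b.csv c.csv)   # bash array indices are 0, 1, 2
echo "${#files[@]}"         # prints 3, so the matching flag is --array=0-2
```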

#### Throttling in Slurm Array Jobs

Throttling in Slurm array jobs refers to limiting the number of array tasks that run simultaneously. This prevents overloading computing resources and ensures fair distribution of resources among users. From a performance perspective, throttling reduces resource contention across the Cheaha cluster: when too many jobs run at the same time, they compete for CPU, memory, and I/O, which can degrade performance for all users. Please [contact us](../../index.md#how-to-contact-us) if your research needs exceed our capacity.

To limit the number of concurrent tasks in a Slurm array job, use the `%` separator. Here’s how to apply it to the example above:

```bash
sbatch --array=0-9%4 job.sh
```

In this example, at most 4 of the array's 10 tasks will run concurrently at any given time.
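
Slurm also lets you adjust the throttle of an array job after submission via `scontrol update` with the `ArrayTaskThrottle` field; the job ID below is hypothetical:

```bash
# Raise the concurrency limit of array job 12345 from 4 to 8
scontrol update JobId=12345 ArrayTaskThrottle=8
```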

### Batch Array Jobs With Dynamic or Computed Indices

For a practical example with dynamic indices, please visit our [Practical `sbatch` Examples](practical_sbatch.md).