# General access rules

The cluster has a large number of nodes and file servers. The "ecco" nodes, available to authorized users (members of the "ecco01" group) under "Restricted", are free to use. Other nodes can be added and used, should more (and possibly faster) resources be needed.

:::{tip}
It is also possible to have BioHPC host your own dedicated node, as some faculty members in the field do. Contact them or Lars to discuss.
:::

This section describes how to obtain BioHPC resources: computing and storage.

## Eligibility

All econ-affiliated graduate students and faculty have access. "Contributing" faculty (faculty who have contributed money or compute resources) can reserve nodes for longer periods of time.

## Requesting an account

[Request an account](https://biohpc.cornell.edu/NewUserRequest.aspx) and ask to be associated with the "ecco01" group. That gives you free computing resources on the "Ecco cluster".

Faculty who have contributed to the cluster have their own groups. Students and collaborators can request (or be provided with) access to those privileged resources by requesting that they be added to the specific group. For instance, Lars' collaborators would request to be added to the `ecco_lv39` group, and would then have access to the compute and storage allocations that Lars has paid for.

# Quick start

## Command line

You need command line access to submit jobs. You do not need a [reservation](reserving) to access a command line.

## Submitting jobs

You can submit from the command line (SSH) at the login nodes `cbsulogin?.biohpc.cornell.edu` (see the [access description](https://biohpc.cornell.edu/lab/userguide.aspx?a=access#A3)). All commands (`sbatch`, `squeue`, `sinfo`, etc.) have to be run with the option `--cluster eccoslurm`; otherwise they will apply to a different SLURM cluster (the one at BSCB).

:::{admonition} TIP
:class: tip

Run the following lines, log out, then log back in, and henceforth you can skip the `--cluster eccoslurm` option:

```bash
echo 'export SLURM_CLUSTERS="eccoslurm"' >> $HOME/.bash_profile
echo netid@cornell.edu >> $HOME/.forward
```

(replace `netid` with your own NetID in the second command).

:::

There is only one partition (queue) containing all nodes. The default parameters (changeable through SLURM options at submission; see the sketch below) are:

- 1 core and 4 GB RAM per job
- unlimited run time

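For anything beyond a quick test you will typically submit a batch script with `sbatch`, overriding these defaults as needed. Below is a minimal sketch; the job name, resource values, and program name are illustrative, not prescribed by BioHPC:

```bash
#!/bin/bash
#SBATCH --job-name=myjob        # illustrative name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4       # request 4 cores instead of the default 1
#SBATCH --mem=16G               # request 16 GB RAM instead of the default 4 GB
#SBATCH --time=24:00:00         # optional: bound the run time (default is unlimited)

# the actual workload goes here
./my_analysis.sh
```

Submit it with `sbatch myjob.sh` (adding `--cluster eccoslurm` unless you applied the TIP above).
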
## Interactive shell

An interactive shell can be requested with the command

```bash
srun --cluster eccoslurm --pty bash -l
```

or, if you applied the TIP above:

```bash
srun --pty bash -l
```

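The same SLURM options used at batch submission also apply to interactive sessions. For example, to request more than the default single core and 4 GB of RAM (values illustrative):

```bash
srun --cpus-per-task=4 --mem=16G --pty bash -l
```
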
## To see running jobs

```bash
squeue
```

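On a busy cluster this list can be long. To restrict the output to your own jobs:

```bash
squeue -u $USER
```
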
## To cancel a running job

Use

```bash
scancel <jobid>
```

where the job ID can be gleaned from the `squeue` command.

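`scancel` can also select jobs by user, which is handy for clearing out several jobs at once:

```bash
scancel -u $USER   # cancels all of your jobs on the cluster
```
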
(slurm)=
# Job scheduler on BioHPC

A SLURM cluster `eccoslurm` is ready on several nodes (some dedicated to the SLURM scheduler, others "borrowed"; the latter might not always be available). There are between 48 and 144 "slots" (CPUs) available for compute jobs.

## Available resources

See [https://biohpc.cornell.edu/lab/ecco.htm](https://biohpc.cornell.edu/lab/ecco.htm) for the general overview, and the table below for some more specifics. Some of the nodes are a bit old (they will run about as fast as CCSS-RS, but much slower than a recent desktop), but they have a ton of memory and lots of CPUs. For instance, cbsuecco02 has 1024 GB of memory.

```{code-cell} ipython3
:tags: ["remove-input","full-width"]
import os  # needed for os.path.join below

from IPython.display import HTML
import pandas as pd
# from jupyter_datatables import init_datatables_mode, render_datatable

# find_project_root() is defined in setup code not shown in this excerpt
project_root = find_project_root()
nodes = pd.read_csv(os.path.join(project_root, "_data", "ecconodes.csv"))
# sum cores, RAM, and local storage per allocation
summary_table = nodes[["allocation", "cores", "CPUs", "RAM", "local storage in TB"]]
summary = summary_table.groupby('allocation')[['cores', 'RAM', "local storage in TB"]].sum().reset_index()
# Convert the summary DataFrame to an HTML table
sumtable = summary.to_html(index=False, classes='table table-striped table-bordered table-sm')
HTML(sumtable)
```

**Details:**

```{code-cell} ipython3
:tags: ["remove-input","full-width"]
from itables import show  # assumed import: `show` renders an interactive DataTable

# limit to flex and slurm nodes
nodes = nodes[nodes['allocation'].str.contains('flex|slurm', na=False)]
# compute total cores as cores * CPUs
nodes['cores'] = nodes['cores per CPU'] * nodes['CPUs']
# override the order of columns - this may need to be adjusted if the column names change
columns = ['Nodename', 'allocation', 'cpu benchmark (single thread)', 'cores', 'RAM', 'local storage in TB', 'model', 'cores per CPU', 'CPUs', 'vintage']
# Reorder the columns
nodes = nodes[columns]

# the tail of this call was truncated in the source; classes="display" is an assumption
show(nodes, lengthMenu=[15, 25, 50], layout={"topStart": "search"}, classes="display")
```

- *local* disk space refers to the `/workdir` temporary workspace. All nodes have access to the shared home directory.

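Because `/workdir` is node-local (and therefore fast), a common pattern is to stage data there at the start of a job and copy results back to the shared home directory at the end. A sketch, with purely illustrative paths and program names:

```bash
# stage input data onto the node-local scratch space
mkdir -p /workdir/$USER
cp $HOME/project/input.csv /workdir/$USER/

# run the workload against the fast local copy
cd /workdir/$USER
./my_analysis input.csv output.csv

# copy results back to shared storage and clean up
cp output.csv $HOME/project/
rm -rf /workdir/$USER
```
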
## Detailed info

Detailed instructions on how to use the cluster are provided at [https://biohpc.cornell.edu/lab/cbsubscb_SLURM.htm](https://biohpc.cornell.edu/lab/cbsubscb_SLURM.htm) and in the [official SLURM documentation](https://slurm.schedmd.com/documentation.html) ([a useful cheat sheet of commands (PDF)](https://slurm.schedmd.com/pdfs/summary.pdf)).

## Who can use

Everybody in the ECCO group can submit jobs.