Updating documentation
larsvilhuber committed Apr 10, 2024
1 parent 8802d63 commit d8e5005
Showing 11 changed files with 348 additions and 110 deletions.
6 changes: 5 additions & 1 deletion _data/ecconodes.csv
@@ -1,4 +1,4 @@
Nodename,allocation,model,cpu benchmark (single thread),cpu benchmark (system),cores,CPUs,RAM,vintage,local storage in TB
Nodename,allocation,model,cpu benchmark (single thread),cpu benchmark (system),cores per CPU,CPUs,RAM,vintage,local storage in TB
cbsuecco01,reservation,Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz,1939,35085,16,2,256,2017,0.20
cbsuecco02,reservation,Intel(R) Xeon(R) CPU E7- 8837 @ 2.67GHz,1284,21891,8,4,1024,2011,4.30
cbsuecco03,flex,AMD Opteron(tm) Processor 6380,1091,13476,16,2,256,2011,1.40
@@ -7,6 +7,10 @@ cbsuecco05,flex,AMD Opteron(tm) Processor 6380,1091,13476,16,2,256,2011,1.40
cbsuecco06,reservation,,,,32,2,256,,2.60
cbsuecco07,reservation,Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz,1952,28971,14,2,128,2016,0.20
cbsuecco08,reservation,Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz,1952,28971,14,2,128,2016,0.20
cbsuecco09,flex,Intel(R) Xeon(R) Gold 5420+ @ 2.0-4.1GHz,3418,57593,28,2,256,2024,3.5
cbsuecco10,flex,Intel(R) Xeon(R) Gold 5420+ @ 2.0-4.1GHz,3418,57593,28,2,256,2024,3.5
cbsuecco11,flex,Intel(R) Xeon(R) Gold 5420+ @ 2.0-4.1GHz,3418,57593,28,2,256,2024,3.5
cbsuecco12,flex,Intel(R) Xeon(R) Gold 5420+ @ 2.0-4.1GHz,3418,57593,28,2,256,2024,3.5
cbsueccosl01,slurm,AMD Opteron(tm) Processor 6380,1091,6738,16,1,128,2011,6.90
cbsueccosl02,,,,,,,,,
cbsueccosl03,slurm,AMD Opteron(tm) Processor 6378,1127,11778,16,2,256,2012,3.30
9 changes: 3 additions & 6 deletions _toc.yml
@@ -9,20 +9,17 @@ parts:
- file: docs/overview
- file: docs/windows
- file: docs/linux
- caption: BioHPC
- caption: How to access BioHPC Linux
chapters:
- file: docs/access
sections:
- file: docs/biohpc/eligibility
- file: docs/biohpc/requesting
- caption: Reserving Nodes
chapters:
- file: docs/biohpc/reserving
- file: docs/biohpc/slurm
sections:
- file: docs/biohpc/slurm-quick-start
- file: docs/biohpc/stata
- file: docs/biohpc/julia
- file: docs/biohpc/sbatch
- file: docs/biohpc/software
- caption: Command reference
chapters:
- file: docs/commands/biohpcres
18 changes: 17 additions & 1 deletion docs/access.md
@@ -1,6 +1,22 @@
# General access rules

The cluster has a large number of nodes and file servers. The "ecco" nodes, listed under "Restricted" and available to authorized users (the "ecco01" group), are free to use. Other nodes can be added and used, should more (and possibly faster) resources be needed.

> It is also possible to have BioHPC host your own dedicated node, as some faculty members in the field do. Contact them or Lars to discuss.
:::{tip}

It is also possible to have BioHPC host your own dedicated node, as some faculty members in the field do. Contact them or Lars to discuss.

:::

This section describes how to obtain BioHPC resources: computing and storage.

## Eligibility

All econ-affiliated graduate students and faculty have access. "Contributing" faculty (faculty who have contributed money or compute resources) can reserve nodes for longer periods of time.

## Requesting an account

[Request an account](https://biohpc.cornell.edu/NewUserRequest.aspx) and ask to be associated with the "ecco01" group. That gives you free computing resources on the "Ecco cluster".

Faculty who have contributed to the cluster have their own groups. Students and collaborators can gain access to those privileged resources by requesting to be added to the specific group. For instance, Lars' collaborators would request to be added to the `ecco_lv39` group, and would then have access to the compute and storage allocations that Lars has paid for.

3 changes: 0 additions & 3 deletions docs/biohpc/eligibility.md

This file was deleted.

5 changes: 0 additions & 5 deletions docs/biohpc/requesting.md

This file was deleted.

23 changes: 22 additions & 1 deletion docs/biohpc/reserving.md
@@ -1,3 +1,4 @@
(reserving)=
# Requesting exclusive access to an entire node

Once logged in to the BioHPC website, go to [https://biohpc.cornell.edu/lab/labres.aspx](https://biohpc.cornell.edu/lab/labres.aspx) and choose `Restricted`. You can reserve any node, up to the time limit imposed by your group membership:
@@ -8,6 +9,26 @@ Once logged in to the BioHPC website, go to [https://biohpc.cornell.edu/lab/labr

Typical nodes have between 16 and 32 CPUs, and between 128 GB and 1024 GB of RAM (memory). File storage varies substantially.

> While you have reserved a node, nobody else can access it (unless you explicitly add them to a reservation). If you know you are only using a few CPUs, consider submitting individual jobs.
:::{tip}

You can view your current reservations on the [My Reservations](https://biohpc.cornell.edu/lab/labresman.aspx) page.

:::


:::{note}

While you have reserved a node, nobody else can access it (unless you explicitly add them to a reservation). If you know you are only using a few CPUs, consider submitting individual jobs.

:::

## Adding users to a reservation

On the [My Reservations](https://biohpc.cornell.edu/lab/labresman.aspx) page, scroll down to where you can either

- "Add user with labid" to a reservation from the pull-down menu, or
- "Link Group" to an existing reservation, where "Group" is a defined group of users, such as your "lab" (e.g., Lars's group is `ecco_lv39`).

## Manipulating reservations from the command line

If using the command-line login node, also see the [`biohpc_res`](biohpcres) command.
61 changes: 61 additions & 0 deletions docs/biohpc/slurm-quick-start.md
@@ -0,0 +1,61 @@


# Quick start

## Command line

You need command-line access to submit jobs. You do not need a [reservation](reserving) to access a command line.

## Submitting jobs

You can submit from the command line (SSH) on the login nodes `cbsulogin?.biohpc.cornell.edu` (see the [access description](https://biohpc.cornell.edu/lab/userguide.aspx?a=access#A3)). All commands (`sbatch`, `squeue`, `sinfo`, etc.) have to be run with the option `--cluster eccoslurm`; otherwise they will apply to a different SLURM cluster (the one at BSCB).

:::{admonition} TIP
:class: tip

Run the following lines, log out, then log back in; henceforth you can skip the `--cluster eccoslurm` option (the second line forwards mail from the cluster to your Cornell address):

```bash
echo 'export SLURM_CLUSTERS="eccoslurm"' >> $HOME/.bash_profile
echo netid@cornell.edu >> $HOME/.forward
```

(replace `netid` with your own NetID in the second command).

:::

There is only one partition (queue) containing all nodes. The default parameters (changeable through SLURM options at submission; see the example below) are:

- 1 core and 4 GB RAM per job
- infinite run time.
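
As a sketch of how to override these defaults, a minimal batch script might look like the following (the job name, resource values, and program are illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=myjob          # illustrative name; pick your own
#SBATCH --cpus-per-task=4         # request 4 cores instead of the default 1
#SBATCH --mem=16G                 # request 16 GB of RAM instead of the default 4 GB
#SBATCH --time=24:00:00           # optionally cap the run time at 24 hours

# replace with your own program and arguments
./my_program --input data.csv
```

Submit it with `sbatch myjob.sh` (or `sbatch --cluster eccoslurm myjob.sh` if you have not set `SLURM_CLUSTERS` as in the tip above); `squeue` will then show the job in the queue.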

## Interactive shell

An interactive shell can be requested with the command

```bash
srun --cluster eccoslurm --pty bash -l
```

or, if you applied the tip above:

```bash
srun --pty bash -l
```


## To see running jobs

```bash
squeue
```

## To cancel a running job

Use

```bash
scancel <jobID>
```

where `<jobID>` can be gleaned from the `squeue` output.
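
For instance (the job ID shown is illustrative):

```bash
# list only your own jobs; the job ID is in the first column (JOBID)
squeue -u $USER
# cancel the job with ID 12345 (replace with your actual job ID)
scancel 12345
```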
101 changes: 11 additions & 90 deletions docs/biohpc/slurm.md
@@ -11,26 +11,13 @@ kernelspec:
---

(slurm)=
# Job scheduler (experimental)
# Job scheduler on BioHPC

A SLURM cluster `eccoslurm` is ready on nodes `cbsueccosl[01,03-04]` and typically also `cbsuecco03` and `cbsuecco04` (the latter are "borrowed", and might not always be available). There are between 48 and 144 "slots" (cpus) available for compute jobs.
A SLURM cluster `eccoslurm` is ready on several nodes (some dedicated to the SLURM scheduler, others "borrowed"; the latter might not always be available). There are between 48 and 144 "slots" (CPUs) available for compute jobs.
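
To check which nodes and CPUs are currently available to the scheduler, you can query it from a login node; two standard SLURM commands (shown here as a sketch) are:

```bash
# summary of partitions and node states in the eccoslurm cluster
sinfo --cluster eccoslurm
# per-node detail, including CPU and memory counts
sinfo --cluster eccoslurm -N -l
```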

## Who can use

Everybody in the ECCO group can submit jobs.

## Why


If you only need one CPU, the easiest and fastest approach is to use the SLURM job scheduler. It can accommodate up to 100 simultaneous (normal-sized) jobs. You are guaranteed at least 1 CPU (or as many as requested), but no more. (This is new, and not yet fully tested.)
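
For example, a single-CPU job can be submitted without writing a separate batch file (the command and script name are illustrative; substitute whatever software your job needs):

```bash
# wrap a one-line command into a SLURM job with the default 1 CPU / 4 GB RAM
sbatch --cluster eccoslurm --wrap="python3 myscript.py"
```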

## Available resources

See [https://biohpc.cornell.edu/lab/ecco.htm](https://biohpc.cornell.edu/lab/ecco.htm) for the general overview, and the table below for some more specifics.
Some of the nodes are a bit old (they will run about as fast as CCSS-RS, but much slower than a recent desktop), but they have a lot of memory and many CPUs. For instance, cbsuecco02 has 1024 GB of memory.

```{code-cell} ipython3
:tags: ["remove-input"]
:tags: ["remove-input","full-width"]
from IPython.display import HTML
import pandas as pd
# from jupyter_datatables import init_datatables_mode, render_datatable
@@ -72,25 +59,14 @@ project_root = find_project_root()
nodes = pd.read_csv(os.path.join(project_root,"_data", "ecconodes.csv"))
# total cores per node = cores per CPU * number of CPUs (the CSV stores cores per CPU)
nodes['cores'] = nodes['cores per CPU'] * nodes['CPUs']
summary_table = nodes[["allocation", "cores", "CPUs","RAM","local storage in TB"]]
summary = summary_table.groupby('allocation')[['cores', 'RAM',"local storage in TB"]].sum().reset_index()
# Convert the summary DataFrame to an HTML table
sumtable = summary.to_html(index=False, classes='table table-striped table-bordered table-sm')
HTML(sumtable)
```

**Details:**


```{code-cell} ipython3
:tags: ["remove-input","full-width"]
# limit to flex and slurm nodes
nodes = nodes[nodes['allocation'].str.contains('flex|slurm',na=False)]
# compute total cores as cores * CPUs
nodes['cores'] = nodes['cores per CPU'] * nodes['CPUs']
# reorder columns
columns = nodes.columns.tolist() # Get the list of column names
columns.remove('model') # Remove column from the list
columns.remove('cpu benchmark (system)') # Remove column from the list
columns.append('model') # Append column to the end of the list
# override the order of columns - this may need to be adjusted if the column names change
columns = ['Nodename', 'allocation', 'cpu benchmark (single thread)', 'cores','RAM', 'local storage in TB', 'model','cores per CPU', 'CPUs', 'vintage' ]
# Reorder the columns
nodes = nodes[columns]
@@ -102,62 +78,7 @@ show(nodes, lengthMenu=[15, 25, 50], layout={"topStart": "search"}, classes="dis
```

- *local* disk space refers to the `/workdir` temporary workspace. All nodes have access to the shared home directory.
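
A common pattern (a sketch only; the exact `/workdir` directory layout and paths are assumptions) is to stage data on the node-local scratch space and copy results back to the home directory when done:

```bash
# stage input data on the fast node-local storage (directory layout is an assumption)
mkdir -p /workdir/$USER
cp $HOME/project/input.csv /workdir/$USER/

# ... run the computation against /workdir/$USER/input.csv ...

# copy results back to the shared home directory before the node is released
cp /workdir/$USER/results.csv $HOME/project/
```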

## Detailed info

Detailed instructions on how to use a cluster are provided at [https://biohpc.cornell.edu/lab/cbsubscb_SLURM.htm](https://biohpc.cornell.edu/lab/cbsubscb_SLURM.htm) and the [official SLURM documentation](https://slurm.schedmd.com/documentation.html) ([useful cheatsheet on commands (PDF)](https://slurm.schedmd.com/pdfs/summary.pdf)).

## Quick start

You can submit from the command line (SSH) on the login nodes `cbsulogin?.biohpc.cornell.edu` (see the [access description](https://biohpc.cornell.edu/lab/userguide.aspx?a=access#A3)). All commands (`sbatch`, `squeue`, `sinfo`, etc.) have to be run with the option `--cluster eccoslurm`; otherwise they will apply to a different SLURM cluster (the one at BSCB).

:::{admonition} TIP
:class: tip

Run the following line, logout, then back in, and henceforth you can skip the `--cluster eccoslurm` option:

```bash
echo 'export SLURM_CLUSTERS="eccoslurm"' >> $HOME/.bash_profile
echo netid@cornell.edu >> $HOME/.forward
```

(replace your `netid` in the second command).

:::

There is only one partition (queue) containing all nodes. The default parameters (changeable through SLURM options at submission; see below) are:

- 1 core and 4 GB RAM per job
- infinite run time.

## Interactive shell

An interactive shell can be requested with the command

```bash
srun --cluster eccoslurm --pty bash -l
```

or if you ran the above TIP:

```bash
srun --pty bash -l
```


## To see running jobs

```
squeue
```

## To cancel a running job

Use

```
scancel (ID)
```
## Who can use

where the ID can be gleaned from the `squeue` command.
Everybody in the ECCO group can submit jobs.