Updating documentation
larsvilhuber committed Apr 10, 2024
1 parent 8802d63 commit d8e5005
Showing 11 changed files with 348 additions and 110 deletions.
6 changes: 5 additions & 1 deletion _data/ecconodes.csv
@@ -1,4 +1,4 @@
Nodename,allocation,model,cpu benchmark (single thread),cpu benchmark (system),cores,CPUs,RAM,vintage,local storage in TB
Nodename,allocation,model,cpu benchmark (single thread),cpu benchmark (system),cores per CPU,CPUs,RAM,vintage,local storage in TB
cbsuecco01,reservation,Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz,1939,35085,16,2,256,2017,0.20
cbsuecco02,reservation,Intel(R) Xeon(R) CPU E7- 8837 @ 2.67GHz,1284,21891,8,4,1024,2011,4.30
cbsuecco03,flex,AMD Opteron(tm) Processor 6380,1091,13476,16,2,256,2011,1.40
@@ -7,6 +7,10 @@ cbsuecco05,flex,AMD Opteron(tm) Processor 6380,1091,13476,16,2,256,2011,1.40
cbsuecco06,reservation,,,,32,2,256,,2.60
cbsuecco07,reservation,Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz,1952,28971,14,2,128,2016,0.20
cbsuecco08,reservation,Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz,1952,28971,14,2,128,2016,0.20
cbsuecco09,flex,Intel(R) Xeon(R) Gold 5420+ @ 2.0-4.1GHz,3418,57593,28,2,256,2024,3.5
cbsuecco10,flex,Intel(R) Xeon(R) Gold 5420+ @ 2.0-4.1GHz,3418,57593,28,2,256,2024,3.5
cbsuecco11,flex,Intel(R) Xeon(R) Gold 5420+ @ 2.0-4.1GHz,3418,57593,28,2,256,2024,3.5
cbsuecco12,flex,Intel(R) Xeon(R) Gold 5420+ @ 2.0-4.1GHz,3418,57593,28,2,256,2024,3.5
cbsueccosl01,slurm,AMD Opteron(tm) Processor 6380,1091,6738,16,1,128,2011,6.90
cbsueccosl02,,,,,,,,,
cbsueccosl03,slurm,AMD Opteron(tm) Processor 6378,1127,11778,16,2,256,2012,3.30
9 changes: 3 additions & 6 deletions _toc.yml
@@ -9,20 +9,17 @@ parts:
- file: docs/overview
- file: docs/windows
- file: docs/linux
- caption: BioHPC
- caption: How to access BioHPC Linux
chapters:
- file: docs/access
sections:
- file: docs/biohpc/eligibility
- file: docs/biohpc/requesting
- caption: Reserving Nodes
chapters:
- file: docs/biohpc/reserving
- file: docs/biohpc/slurm
sections:
- file: docs/biohpc/slurm-quick-start
- file: docs/biohpc/stata
- file: docs/biohpc/julia
- file: docs/biohpc/sbatch
- file: docs/biohpc/software
- caption: Command reference
chapters:
- file: docs/commands/biohpcres
18 changes: 17 additions & 1 deletion docs/access.md
@@ -1,6 +1,22 @@
# General access rules

The cluster has a large number of nodes and file servers. The "ecco" nodes, listed under "Restricted" and available to authorized users (the "ecco01" group), are free to use. Other nodes can be added and used, should more (and possibly faster) resources be needed.

> It is also possible to have BioHPC host your own dedicated node, as some faculty members in the field do. Contact them or Lars to discuss.
:::{tip}

It is also possible to have BioHPC host your own dedicated node, as some faculty members in the field do. Contact them or Lars to discuss.

:::

This section describes how to obtain BioHPC resources: computing and storage.

## Eligibility

All econ-affiliated graduate students and faculty have access. "Contributing" faculty (faculty who have contributed money or compute resources) can reserve nodes for longer periods of time.

## Requesting an account

[Request an account](https://biohpc.cornell.edu/NewUserRequest.aspx) and ask to be associated with the "ecco01" group. That gives you free computing resources on the "Ecco cluster".

Faculty who have contributed to the cluster have their own groups. Students and collaborators can gain access to those privileged resources by requesting to be added to the specific group. For instance, Lars' collaborators would request to be added to the `ecco_lv39` group, and would then have access to the compute and storage allocations that Lars has paid for.

3 changes: 0 additions & 3 deletions docs/biohpc/eligibility.md

This file was deleted.

5 changes: 0 additions & 5 deletions docs/biohpc/requesting.md

This file was deleted.

23 changes: 22 additions & 1 deletion docs/biohpc/reserving.md
@@ -1,3 +1,4 @@
(reserving)=
# Requesting exclusive access to an entire node

Once logged in to the BioHPC website, go to [https://biohpc.cornell.edu/lab/labres.aspx](https://biohpc.cornell.edu/lab/labres.aspx) and choose `Restricted`. You can reserve any node, up to the time limit imposed by your group membership:
@@ -8,6 +9,26 @@ Once logged in to the BioHPC website, go to [https://biohpc.cornell.edu/lab/labr

Typical nodes have between 16 and 32 CPUs, and between 128 GB and 1024 GB of RAM (memory). File storage varies substantially.

> While you have reserved a node, nobody else can access it (unless you explicitly add them to a reservation). If you know you are only using a few CPUs, consider submitting individual jobs.
:::{tip}

You can view your current reservations on the [My Reservations](https://biohpc.cornell.edu/lab/labresman.aspx) page.

:::


:::{note}

While you have reserved a node, nobody else can access it (unless you explicitly add them to a reservation). If you know you are only using a few CPUs, consider submitting individual jobs.

:::

## Adding users to a reservation

On the [My Reservations](https://biohpc.cornell.edu/lab/labresman.aspx) page, scroll down to where you can either

- "Add user with labid" to a reservation from the pull-down menu, or
- "Link Group" to an existing reservation, where "Group" is a defined group of users, such as your "lab" (e.g., Lars's group is `ecco_lv39`).

## Manipulating reservations from the command line

If using the command-line login node, also see the [`biohpc_res`](biohpcres) command.
61 changes: 61 additions & 0 deletions docs/biohpc/slurm-quick-start.md
@@ -0,0 +1,61 @@


# Quick start

## Command line

You need command-line access to submit jobs. You do not need a [reservation](reserving) to access a command line.

## Submitting jobs

You can submit from the command line (SSH) on the login nodes `cbsulogin?.biohpc.cornell.edu` (see the [access description](https://biohpc.cornell.edu/lab/userguide.aspx?a=access#A3)). All commands (`sbatch`, `squeue`, `sinfo`, etc.) have to be run with the option `--cluster eccoslurm`; otherwise they will apply to a different SLURM cluster (the one at BSCB).

:::{admonition} TIP
:class: tip

Run the following lines, log out, then log back in; henceforth you can skip the `--cluster eccoslurm` option (the second line forwards mail from the cluster to your Cornell address):

```bash
echo 'export SLURM_CLUSTERS="eccoslurm"' >> $HOME/.bash_profile
echo netid@cornell.edu >> $HOME/.forward
```

(replace `netid` with your own NetID in the second command).

:::

There is only one partition (queue) containing all nodes. The default parameters (changeable through SLURM options at submission; see the example below) are:

- 1 core and 4 GB RAM per job
- infinite run time.
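
As a sketch of how to override these defaults, a minimal batch script might look like the following (the job name, resource values, and program are illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=myjob          # illustrative name; pick your own
#SBATCH --cpus-per-task=4         # request 4 cores instead of the default 1
#SBATCH --mem=16G                 # request 16 GB of RAM instead of the default 4 GB
#SBATCH --time=24:00:00           # optionally cap the run time at 24 hours

# replace with your own program and arguments
./my_program --input data.csv
```

Submit it with `sbatch myjob.sh` (or `sbatch --cluster eccoslurm myjob.sh` if you have not set `SLURM_CLUSTERS` as in the tip above); `squeue` will then show the job in the queue.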

## Interactive shell

An interactive shell can be requested with the command

```bash
srun --cluster eccoslurm --pty bash -l
```

or, if you applied the tip above:

```bash
srun --pty bash -l
```


## To see running jobs

```bash
squeue
```

## To cancel a running job

Use

```bash
scancel <jobID>
```

where `<jobID>` can be gleaned from the `squeue` output.
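
For instance (the job ID shown is illustrative):

```bash
# list only your own jobs; the job ID is in the first column (JOBID)
squeue -u $USER
# cancel the job with ID 12345 (replace with your actual job ID)
scancel 12345
```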
101 changes: 11 additions & 90 deletions docs/biohpc/slurm.md
@@ -11,26 +11,13 @@ kernelspec:
---

(slurm)=
# Job scheduler (experimental)
# Job scheduler on BioHPC

A SLURM cluster `eccoslurm` is ready on nodes `cbsueccosl[01,03-04]` and typically also `cbsuecco03` and `cbsuecco04` (the latter are "borrowed", and might not always be available). There are between 48 and 144 "slots" (cpus) available for compute jobs.
A SLURM cluster `eccoslurm` is ready on several nodes (some dedicated to the SLURM scheduler, others "borrowed"; the latter might not always be available). There are between 48 and 144 "slots" (CPUs) available for compute jobs.
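
To check which nodes and CPUs are currently available to the scheduler, you can query it from a login node; two standard SLURM commands (shown here as a sketch) are:

```bash
# summary of partitions and node states in the eccoslurm cluster
sinfo --cluster eccoslurm
# per-node detail, including CPU and memory counts
sinfo --cluster eccoslurm -N -l
```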

## Who can use

Everybody in the ECCO group can submit jobs.

## Why


If you only need one CPU, the easiest and fastest approach is to use the SLURM job scheduler. It can accommodate up to 100 simultaneous (normal-sized) jobs. You are guaranteed at least 1 CPU (or as many as requested), but no more. (This is new, and not yet fully tested.)
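
For example, a single-CPU job can be submitted without writing a separate batch file (the command and script name are illustrative; substitute whatever software your job needs):

```bash
# wrap a one-line command into a SLURM job with the default 1 CPU / 4 GB RAM
sbatch --cluster eccoslurm --wrap="python3 myscript.py"
```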

## Available resources

See [https://biohpc.cornell.edu/lab/ecco.htm](https://biohpc.cornell.edu/lab/ecco.htm) for the general overview, and the table below for some more specifics.
Some of the nodes are a bit old (they will run about as fast as CCSS-RS, but much slower than a recent desktop), but they have a lot of memory and many CPUs. For instance, cbsuecco02 has 1024 GB of memory.

```{code-cell} ipython3
:tags: ["remove-input"]
:tags: ["remove-input","full-width"]
from IPython.display import HTML
import pandas as pd
# from jupyter_datatables import init_datatables_mode, render_datatable
@@ -72,25 +59,14 @@ project_root = find_project_root()
nodes = pd.read_csv(os.path.join(project_root,"_data", "ecconodes.csv"))
# total cores per node = cores per CPU * number of CPUs (the CSV stores cores per CPU)
nodes['cores'] = nodes['cores per CPU'] * nodes['CPUs']
summary_table = nodes[["allocation", "cores", "CPUs","RAM","local storage in TB"]]
summary = summary_table.groupby('allocation')[['cores', 'RAM',"local storage in TB"]].sum().reset_index()
# Convert the summary DataFrame to an HTML table
sumtable = summary.to_html(index=False, classes='table table-striped table-bordered table-sm')
HTML(sumtable)
```

**Details:**


```{code-cell} ipython3
:tags: ["remove-input","full-width"]
# limit to flex and slurm nodes
nodes = nodes[nodes['allocation'].str.contains('flex|slurm',na=False)]
# compute total cores as cores * CPUs
nodes['cores'] = nodes['cores per CPU'] * nodes['CPUs']
# reorder columns
columns = nodes.columns.tolist() # Get the list of column names
columns.remove('model') # Remove column from the list
columns.remove('cpu benchmark (system)') # Remove column from the list
columns.append('model') # Append column to the end of the list
# override the order of columns - this may need to be adjusted if the column names change
columns = ['Nodename', 'allocation', 'cpu benchmark (single thread)', 'cores','RAM', 'local storage in TB', 'model','cores per CPU', 'CPUs', 'vintage' ]
# Reorder the columns
nodes = nodes[columns]
@@ -102,62 +78,7 @@ show(nodes, lengthMenu=[15, 25, 50], layout={"topStart": "search"}, classes="dis
```

- *local* disk space refers to the `/workdir` temporary workspace. All nodes have access to the shared home directory.
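
A common pattern (a sketch only; the exact `/workdir` directory layout and paths are assumptions) is to stage data on the node-local scratch space and copy results back to the home directory when done:

```bash
# stage input data on the fast node-local storage (directory layout is an assumption)
mkdir -p /workdir/$USER
cp $HOME/project/input.csv /workdir/$USER/

# ... run the computation against /workdir/$USER/input.csv ...

# copy results back to the shared home directory before the node is released
cp /workdir/$USER/results.csv $HOME/project/
```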

## Detailed info

Detailed instructions on how to use a cluster are provided at [https://biohpc.cornell.edu/lab/cbsubscb_SLURM.htm](https://biohpc.cornell.edu/lab/cbsubscb_SLURM.htm) and the [official SLURM documentation](https://slurm.schedmd.com/documentation.html) ([useful cheatsheet on commands (PDF)](https://slurm.schedmd.com/pdfs/summary.pdf)).

## Quick start

You can submit from the command line (SSH) on the login nodes `cbsulogin?.biohpc.cornell.edu` (see the [access description](https://biohpc.cornell.edu/lab/userguide.aspx?a=access#A3)). All commands (`sbatch`, `squeue`, `sinfo`, etc.) have to be run with the option `--cluster eccoslurm`; otherwise they will apply to a different SLURM cluster (the one at BSCB).

:::{admonition} TIP
:class: tip

Run the following line, logout, then back in, and henceforth you can skip the `--cluster eccoslurm` option:

```bash
echo 'export SLURM_CLUSTERS="eccoslurm"' >> $HOME/.bash_profile
echo netid@cornell.edu >> $HOME/.forward
```

(replace your `netid` in the second command).

:::

There is only one partition (queue) containing all nodes. The default parameters (changeable through SLURM options at submission; see below) are:

- 1 core and 4 GB RAM per job
- infinite run time.

## Interactive shell

An interactive shell can be requested with the command

```bash
srun --cluster eccoslurm --pty bash -l
```

or if you ran the above TIP:

```bash
srun --pty bash -l
```


## To see running jobs

```
squeue
```

## To cancel a running job

Use

```
scancel (ID)
```
## Who can use

where the ID can be gleaned from the `squeue` command.
Everybody in the ECCO group can submit jobs.