This repository defines a `snakemake` workflow for post-processing data from
four two-plus-year C3072-resolution simulations completed on Princeton's
Stellar cluster.
`snakemake` requires many dependencies, so trying to build an environment with
plain `conda` does not always work. Per the snakemake documentation, it is
therefore recommended to install `mamba`, which has a more advanced dependency
solver. Once `mamba` is installed, one can create the post-processing
environment using:
```
$ mamba env create --file envs/environment.yaml
```
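For orientation, a minimal environment file might look like the sketch below; the actual `envs/environment.yaml` in this repository pins the full set of dependencies the workflow needs:

```yaml
# Illustrative sketch only -- see envs/environment.yaml for the real thing.
name: 2023-09-18-X-SHiELD-snakemake
channels:
  - conda-forge
dependencies:
  - python
  - snakemake
  - xarray
  - zarr
```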
To process the data, activate the environment from a `screen` session and call
the included top-level `submit.sh` bash script. This script handles some
initial data preparation and organization, and partitions the work into a
sequence of batch jobs, grouping tasks into single jobs where appropriate to
avoid overwhelming the SLURM scheduler with many small jobs.
```
$ screen
$ conda activate 2023-09-18-X-SHiELD-snakemake
$ bash submit.sh
```
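As a rough sketch of what such a driver script can look like (this is not the actual `submit.sh`; the rule and group names are made up, and only standard snakemake flags are shown):

```bash
#!/bin/bash
set -e

# Hypothetical sketch of a SLURM-backed snakemake driver, not the
# repository's actual submit.sh.

# Submit rules as SLURM batch jobs, capping the number of jobs in
# flight, and pack many small tasks into single jobs via snakemake's
# job-grouping flags so the scheduler is not flooded.
snakemake \
    --cluster "sbatch --time=02:00:00" \
    --jobs 20 \
    --groups coarsen_restarts=coarsen \
    --group-components coarsen=10
```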
This workflow is geared to produce data compatible with AI2's corrective machine learning workflow. At a high level it does the following:
- Combines the subtiles of the raw diagnostic and restart netCDF files output
  from the simulation into cohesive tiles using GFDL's `mppnccombine` tool,
  since even the coarse data was output with a 2x2 I/O layout to ease I/O
  overhead in the simulations (see the `mppnccombine` example after this list).
- Concatenates the diagnostics datasets along the time and tile dimensions, and
  coarsens the partially coarsened C384 output to C48 resolution before dumping
  out to zarr (see the coarsening sketch after this list).
- Coarsens each set of C384 restart files into its own subdirectory labeled
  with a timestamp of the form `"%Y%m%d.%H%M%S"`, using AI2's default
  pressure-level coarse-graining strategy and specialized coarsening of land
  surface fields. This arrangement and naming of the restart files is exactly
  what is required for performing an fv3net nudged run.
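For the first step, a representative `mppnccombine` invocation is shown below. The file names are hypothetical (a 2x2 I/O layout splits each tile into four subtile files), and the exact flags the workflow passes may differ:

```
$ mppnccombine atmos_dt_atmos.tile1.nc \
    atmos_dt_atmos.tile1.nc.0000 atmos_dt_atmos.tile1.nc.0001 \
    atmos_dt_atmos.tile1.nc.0002 atmos_dt_atmos.tile1.nc.0003
```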
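To make the coarsening step concrete, here is a minimal `xarray` sketch. The file pattern and dimension names (`grid_xt`, `grid_yt`) are assumptions, and plain block averaging stands in for the workflow's pressure-level and land-surface coarse-graining strategies:

```python
import xarray as xr

# Open one diagnostics stream, concatenating the six cubed-sphere tiles
# along a new "tile" dimension (file pattern is hypothetical).
ds = xr.open_mfdataset(
    "atmos_dt_atmos.tile*.nc", combine="nested", concat_dim="tile"
)

# C384 -> C48 is a factor-of-8 reduction along each horizontal
# dimension, i.e. a factor of 64 in data volume.
coarse = ds.coarsen(grid_xt=8, grid_yt=8).mean()

coarse.to_zarr("atmos_dt_atmos.zarr", mode="w")
```

The restart subdirectories are named by formatting each restart time with `"%Y%m%d.%H%M%S"` (e.g. via `datetime.strftime`), so a restart valid at 2020-01-19 00Z would land in a directory named `20200119.000000`.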
In total this workflow processes over 140 TB of restart files and 7.6 TB of diagnostics (the 3D diagnostics have been ignored for now, though they too could be processed by this workflow). Coarsening to C48 resolution reduces the data by a factor of 64, to a more manageable ~2 TB.