Snakemake_setup

Setup information for snakemake pipelines

Written by Sophie Hoffman & Zena Lapp.

Useful links

Snakemake documentation: https://snakemake.readthedocs.io/

Benefits of snakemake

  • Your analysis is reproducible.
  • You don't have to re-perform computationally intensive tasks early in the pipeline to change downstream analyses or figures.
  • You can easily combine shell, R, Python, etc. scripts into one pipeline.
  • You can easily share your pipeline with others.
  • You can submit a single Slurm job and snakemake handles submitting the rest of your jobs for you.

Conda

Conda is useful because it allows you to run your snakemake pipeline in an environment with the pipeline's dependencies.

To download Miniconda for Linux if you don't already have it:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Creating a conda environment is easiest with a YAML file. Here is an example YAML file called snakemake_env.yaml:

name: snakemake_environment #name of the environment
#include your pipeline dependencies here
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies: 
  - snakemake=5.5.4
  - ariba=2.14.4

To create and activate the conda environment:

conda env create -f snakemake_env.yaml # you only have to do this once
conda activate snakemake_environment # you have to do this every time 

A few basics

Useful snakemake arguments:

  • snakemake -n performs a dry run (to test the pipeline before running it)
  • snakemake runs the pipeline defined in your snakefile (a minimal snakefile is sketched below)
  • snakemake --dag | dot -Tsvg > dag.svg creates a DAG of the pipeline (this can be very difficult to read for large, complex pipelines)
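
These commands are run in the directory containing your snakefile (or pass one explicitly with -s, as in the sbat file below). As a rough illustration only, a minimal snakefile might look like the following sketch; the rule names, sample names, and file paths are made up and are not part of this repository.

# snakefile: minimal sketch (hypothetical rule and file names)
SAMPLES = ["sample1", "sample2"]

# The first rule lists the final files the whole pipeline should produce.
rule all:
    input:
        expand("results/{sample}_counts.txt", sample=SAMPLES)

# A rule that wraps a shell command; snakemake fills in {input} and {output} for each sample.
rule count_reads:
    input:
        "data/{sample}.fastq"
    output:
        "results/{sample}_counts.txt"
    shell:
        "wc -l {input} > {output}"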

Running the snakemake pipeline on the cluster

The best way to run snakemake is to add your snakemake command to an .sbat file and submit it as a job. For example, this file called snakemake.sbat:

#!/bin/sh
# Job name
#SBATCH --job-name=snakemake
# User info
#SBATCH --mail-user=UNIQNAME@umich.edu
#SBATCH --mail-type=BEGIN,END,NONE,FAIL,REQUEUE
#SBATCH --export=ALL
#SBATCH --partition=standard
#SBATCH --account=esnitkin1
# Number of cores, amount of memory, and walltime
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=1 --mem=1g --time=10:00:00
#  Change to the directory you submitted from
cd $SLURM_SUBMIT_DIR
echo $SLURM_SUBMIT_DIR

# Load modules

# Job commands
snakemake --latency-wait 90 --profile config -s snakefile 

To run the snakemake pipeline on the cluster, you have to:

  • Modify your email address in snakemake.sbat
  • Modify your email in the config file (a sketch of what such a profile config might contain is shown below)
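
The --profile config option in snakemake.sbat tells snakemake to read its cluster-submission settings from a profile directory named config, which contains a config.yaml. The actual profile shipped with a given pipeline may differ; the sketch below is only an illustration, with placeholder resource values and the same UNIQNAME placeholder used elsewhere in this README.

# config/config.yaml: sketch of a slurm profile (values are placeholders)
jobs: 100    # maximum number of cluster jobs snakemake submits at once
cluster: "sbatch --job-name={rule} --account=esnitkin1 --partition=standard --nodes=1 --ntasks=1 --cpus-per-task=1 --mem=4g --time=2:00:00 --mail-user=UNIQNAME@umich.edu"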

Then run:

conda activate snakemake_environment # if you're not already in the snakemake_environment conda environment
sbatch snakemake.sbat

You can check your job using:

squeue -u UNIQNAME

For a more detailed walkthrough of snakemake pipelines:
