Snaq is a Snakemake pipeline for microbiome data analysis using QIIME2.
This pipeline works on Linux, Mac, and Windows (through Ubuntu on Windows). It can also run inside a Docker container.
Mohsen A, Chen Y-A, Allendes Osorio RS, Higuchi C and Mizuguchi K (2022) Snaq: A Dynamic Snakemake Pipeline for Microbiome Data Analysis With QIIME2. Front. Bioinform. 2:893933. doi: 10.3389/fbinf.2022.893933
- Install Ubuntu for Windows 10 following the instructions in this website.
- In the Ubuntu bash command line, install and test the pipeline following the same steps described for Linux and Mac.
All of these steps should be executed in the terminal (Linux and Mac) or in the Ubuntu bash command line (Windows).
- Install mamba:
conda install -n base -c conda-forge mamba
conda activate base
mamba create -c conda-forge -c bioconda -n snakemake snakemake
conda activate snakemake
- Download the latest release file from this repository and extract it into a new folder, or clone this repository.
- Install Docker
- Using the Terminal, Command Prompt, or Windows PowerShell, depending on your system, pull the Docker image for Snakemake with this command:
docker pull snakemake/snakemake
- Download the latest release Source code (zip) from the GitHub repository and extract it, or clone this repository.
- Check the integrity of the pipeline with this command:
docker run --rm -it -v "$PWD":/snaq -w /snaq snakemake/snakemake snakemake -lt
- Create a new folder inside the data folder. Name it using capital letters only, with no spaces, e.g. AB, CONTROL, COHORTONE.
- Copy your paired-end fastq files to that folder. If the R1 and R2 files are identified by _R1_ and _R2_, or by _1 and _2, Snaq will recognize and differentiate them. If any other identifiers are used, please prepare a manifest file and copy it to the results// folder.
Snaq will follow that manifest file if you provide it. Keep a copy of it somewhere outside the pipeline folder, because it could be overwritten by mistake.
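To check file names before copying them, the naming rule above can be sketched as a small shell function. This is only an illustration: `read_direction` is a hypothetical helper, not part of Snaq, and it assumes that the `_1` / `_2` identifiers sit directly before the file extension. Snaq's own detection logic may differ.

```shell
#!/bin/sh
# Hypothetical helper: guess the read direction of a fastq file from its name.
# Assumption: _1 / _2 appear right before the first dot of the extension.
read_direction() {
    name=$(basename "$1")
    case "$name" in
        *_R1_*) echo "R1" ;;
        *_R2_*) echo "R2" ;;
        *_1.*)  echo "R1" ;;
        *_2.*)  echo "R2" ;;
        *)      echo "unknown" ;;   # such files would need a manifest file
    esac
}

# Example file names (made up for illustration).
for f in sampleA_R1_001.fastq.gz sampleA_R2_001.fastq.gz \
         sampleB_1.fastq sampleB_2.fastq sampleC.fwd.fastq; do
    printf '%s\t%s\n' "$f" "$(read_direction "$f")"
done
```

Files reported as "unknown" are the ones for which a manifest file would be needed.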
- The snakemake command needs two parameters:
--cores <number of cores> --use-conda
These two parameters are essential to run the analysis.
- After these two parameters, type the analysis target. For example, importing the data into QIIME2 creates an artifact with the .qza extension. To do that for a cohort named "AB", the target should be
results/AB/AB.qza
Snakemake will then import the data set saved in the data/AB folder into the results/AB/AB.qza artifact file. This is done in two steps: first a manifest file, results/AB/AB_manifest.tsv, is created, and then the files listed in that manifest are imported into results/AB/AB.qza.
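Putting the two parameters and the target together, the full import command can be assembled as in the sketch below. `import_command` is a hypothetical helper name used only for illustration; it prints the command rather than running it, and it rejects cohort names that are not made of capital letters only.

```shell
#!/bin/sh
# Sketch: print (not run) the snakemake command that imports a cohort.
# import_command is a hypothetical helper, not part of Snaq.
import_command() {
    cohort=$1
    cores=$2
    # Cohort folder names should contain capital letters only, no spaces.
    case "$cohort" in
        ""|*[!ABCDEFGHIJKLMNOPQRSTUVWXYZ]*)
            echo "invalid cohort name: $cohort" >&2
            return 1 ;;
    esac
    echo "snakemake --cores ${cores} --use-conda results/${cohort}/${cohort}.qza"
}

# Prints: snakemake --cores 4 --use-conda results/AB/AB.qza
import_command AB 4
```

The printed line can be copied into the terminal once the snakemake environment is active.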
- Docker creates a container from an image. The image can be created or downloaded. The command
docker pull snakemake/snakemake
downloads the image required to run Snakemake.
- When
docker run -it snakemake/snakemake
is executed, a container is started, which basically means running a small virtual Linux PC inside your host system; whatever command you send after that runs inside that virtual PC.
- For example, if you send the command
docker run -it snakemake/snakemake snakemake
then the snakemake software runs inside that container.
- As soon as you stop the Docker container, it resets to its original state; any modifications you made are not permanent.
- If you want changes to persist, you can link a folder on the host machine to the container, so that the container reads from and writes to that folder. For that we use the
-v
parameter of the docker command. To map the pipeline folder to a folder inside the container, you can run:
docker run -it -v c:\snaq:/snaq -w /snaq snakemake/snakemake
- This command starts a container from the snakemake image, links the c:\snaq folder on the Windows host PC to the /snaq folder inside the container, and makes /snaq the working directory. Whatever command you send will therefore run inside the /snaq folder, and any change made in that folder will be permanent.
To run a basic task in Docker, the command should look like this:
docker run --rm -it -v [snaq folder in host system]:/snaq -w /snaq snakemake/snakemake snakemake --use-conda --cores 10 results/AB/AB+bb-t16+fp-f17-r21+dd+cls-silva+rrf10000.zip
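As a rough sketch, the command above can be generated for any host folder, core count, and target. `docker_command` is a hypothetical helper name; it only prints the command, using the same flags as the example, and leaves running it to you.

```shell
#!/bin/sh
# Sketch: print (not run) the docker command for a given Snaq target.
# docker_command is a hypothetical helper, not part of Snaq.
docker_command() {
    host_dir=$1   # pipeline folder on the host system, e.g. "$PWD"
    cores=$2      # value passed to --cores
    target=$3     # analysis target, e.g. results/AB/AB.qza
    printf 'docker run --rm -it -v "%s":/snaq -w /snaq snakemake/snakemake snakemake --use-conda --cores %s %s\n' \
        "$host_dir" "$cores" "$target"
}

# Use the current directory as the pipeline folder.
docker_command "$PWD" 10 results/AB/AB.qza
```

Printing first makes it easy to confirm the folder mapping before any container is started.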
A data set is available for testing and can be downloaded from here; download it and extract it into the data folder.
Possible targets are:
AB_manifest.tsv
AB.qza
AB+bb-t16+fp-f17-r21+dd+cls-gg_asv.biom
AB+bb-t16+fp-f17-r21+dd+cls-gg+phyloseq.RDS
AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000+beta_braycurtis.tsv
AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000+beta_jaccard.tsv
AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000+manta_tax.tsv
AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000+manta.tsv
AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000+otu_tax.biom
AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000+otu_tax_biom.tsv
AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000+otu_tax.qza
AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000.zip
AB+bb-t16+fp-f17-r21+dd+cls-gg_taxonomy.csv
AB+bb-t16+fp-f17-r21+dd+cls-gg_taxonomy.qza
AB+bb-t16+fp-f17-r21+dd+fasttree.nwk
AB+bb-t16+fp-f17-r21+dd+fasttree_rooted.qza
AB+bb-t16+fp-f17-r21+dd+rrf-d10000+alphadiversity.tsv
AB+bb-t16+fp-f17-r21+dd+rrf-d10000+beta_unweightedunifrac.csv
AB+bb-t16+fp-f17-r21+dd+rrf-d10000+beta_unweightedunifrac.qza
AB+bb-t16+fp-f17-r21+dd+rrf-d10000+beta_weightedunifrac.csv
AB+bb-t16+fp-f17-r21+dd+rrf-d10000+beta_weightedunifrac.qza
AB+bb-t16+fp-f17-r21+dd_seq.csv
AB+bb-t16+fp-f17-r21+dd_seq.qza
AB+bb-t16+fp-f17-r21+dd_stats.qza
AB+bb-t16+fp-f17-r21+dd_table.qza
AB+bb-t16+fp-f17-r21+dd_table+rrf-d10000.csv
AB+bb-t16+fp-f17-r21+dd_table+rrf-d10000.qza
AB+bb-t16+fp-f17-r21.qza
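Judging from the list above, each target name starts with the cohort name, followed by parts joined with `+`. The sketch below simply splits a target into those parts so they are easier to read. `target_parts` is a hypothetical helper; what each part means is defined by the pipeline and the paper, not by this script.

```shell
#!/bin/sh
# Sketch: split a target file name into its '+'-separated parts.
# target_parts is a hypothetical helper, not part of Snaq.
target_parts() {
    name=$(basename "$1")
    name=${name%.*}                      # drop the file extension
    printf '%s\n' "$name" | tr '+' '\n'  # one part per line
}

target_parts "results/AB/AB+bb-t16+fp-f17-r21+dd+cls-gg+rrf-d10000.zip"
```

For the example target this prints the cohort name AB first and then one line per remaining part.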
For more details please check the paper.