Apollo-variant-typing

Interpretation of variants identified in fungal genomes.

Pipeline information

Author(s): Boas van der Putten, Roxanne Wolthuis
Organization: Rijksinstituut voor Volksgezondheid en Milieu (RIVM)
Department: Infektieziekteonderzoek, Diagnostiek en Laboratorium Surveillance (IDS), Bacteriologie (BPD)
Start date: 07 - 04 - 2023
Commissioned by: Thijs Bosch, Julia (Jianhua) Zhang

About this project

Apollo-variant-typing uses output generated with apollo-mapping to type Candida auris and Aspergillus fumigatus genomes.

Candida auris typing currently includes:

Extraction of AMR mutations, based on reference data present in files/cauris/resistance_list.csv.
Prediction of C. auris clade using auriclass.

Aspergillus fumigatus typing will be implemented in the near future.

The apollo-variant-typing pipeline is created with the juno-template and juno-library.

The pipeline uses the following tools:

bcftools: predict the effects of genomic variants using the csq subcommand.
GATK: comprehensive toolkit for genomic variant analysis. Currently, only the VariantsToTable tool is used to convert VCF to tab-separated values.
Picard: comprehensive toolkit for NGS data handling. Currently, only the SamToFastq tool is used to extract FastQ data from a BAM file.
AuriClass: to predict Candida auris clade from FastQ data.

Prerequisities

Linux environment
mamba
Python 3.11

Installation

Clone the repository and enter the cloned directory

git clone https://github.com/RIVM-bioinformatics/apollo-variant-typing.git
cd apollo-variant-typing

Create & activate mamba environment.

mamba env update -f envs/apollo_variant_typing.yaml

Create & activate apollo environment.

conda activate apollo_variant_typing

Example of run:

python apollo_variant_typing.py -i [input] -o [output] -s [species]

Parameters & Usage

usage: apollo_variant_typing.py [-h] -i DIR [-o DIR] [-w DIR] [-ex FILE] [-p PATH] [-l] [-tl INT] [-u] [-n] [-q QUEUE] [--no-containers] [--snakemake-args [SNAKEMAKE_ARGS ...]] [-m FILE]
                                [-s GENUS SPECIES] [-d DIR] [--presets-path PATH]

Apollo-variant-typing for interpretation of variants identified in fungal genomes.

options:
  -h, --help            show this help message and exit
  -i DIR, --input DIR   Relative or absolute path to the input directory. It must contain all the raw reads (fastq) files for all samples to be processed (not in subfolders).
  -o DIR, --output DIR  Relative or absolute path to the output directory. If none is given, an 'output' directory will be created in the current directory.
  -w DIR, --workdir DIR
                        Relative or absolute path to the working directory. If none is given, the current directory is used.
  -ex FILE, --exclusionfile FILE
                        Path to the file that contains samplenames to be excluded.
  -p PATH, --prefix PATH
                        Conda or singularity prefix. Basically a path to the place where you want to store the conda environments or the singularity images.
  -l, --local           If this flag is present, the pipeline will be run locally (not attempting to send the jobs to an HPC cluster**). The default is to assume that you are working on a cluster.
                        **Note that currently only LSF clusters are supported.
  -tl INT, --time-limit INT
                        Time limit per job in minutes (passed as -W argument to bsub). Jobs will be killed if not finished in this time.
  -u, --unlock          Unlock output directory (passed to snakemake).
  -n, --dryrun          Dry run printing steps to be taken in the pipeline without actually running it (passed to snakemake).
  -q QUEUE, --queue QUEUE
                        Name of the queue that the job will be submitted to if working on a cluster.
  --no-containers       Use conda environments instead of containers.
  --snakemake-args [SNAKEMAKE_ARGS ...]
                        Extra arguments to be passed to snakemake API (https://snakemake.readthedocs.io/en/stable/api_reference/snakemake.html).
  -m FILE, --metadata FILE
                        Relative or absolute path to the metadata csv file. If provided, it must contain at least one column named 'sample' with the name of the sample (same than file name but
                        removing the suffix _R1.fastq.gz), a column called 'genus' and a column called 'species'. The genus and species provided will be used to choose the serotyper and the MLST
                        schema(s).If a metadata file is provided, it will overwrite the --species argument for the samples present in the metadata file.
  -s GENUS SPECIES, --species GENUS SPECIES
                        Species name (any species in the metadata file will overwrite this argument). It should be given as two words (e.g. --species Candida auris)
  -d DIR, --db_dir DIR  Relative or absolute path to the directory that contains the databases for all the tools used in this pipeline or where they should be downloaded. Default is:
                        /mnt/db/apollo/variant-typing
  --presets-path PATH   Relative or absolute path to custom presets.yaml to use. If none is provided, the default (config/presets.yaml) is used.

Explanation of the output

cauris_typing (if C. auris was analysed): Files containing C. auris-specific typing results, such as AMR mutation reports and clade predictions.
audit_trail: Logs of conda, git and the pipeline, a sample sheet, the used parameters and a snakemake report.
prepared_files: Generated indices of files, necessary for typing analyses.
log: Log with output and error file from the cluster for each Snakemake rule/step that is performed.

Issues

The default confifuration of this pipeline only works on the RIVM cluster. Paths to reference data can be specified on the command line. Cluster integration is currently only implemented for IBM LSF (bsub). To run without submission of jobs to a cluster, specify --local on the command line.

License

This pipeline is licensed with a AGPL3 license. Detailed information can be found inside the 'LICENSE' file in this repository.

Contact

Contact person: IDS-Bioinformatics
Email: [email protected]

Contribution guidelines

Apollo pipelines use a feature branch workflow. To work on features, create a branch from the main branch to make changes to. This branch can be merged to the main branch via a pull request. Hotfixes for bugs can be committed to the main branch.

Please adhere to the conventional commits specification for commit messages. These commit messages can be picked up by release please to create meaningful release messages.

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
.github/workflows		.github/workflows
base_juno_pipeline @ 3d05d8b		base_juno_pipeline @ 3d05d8b
config		config
envs		envs
files		files
tests		tests
workflow		workflow
.gitignore		.gitignore
.gitmodules		.gitmodules
CHANGELOG.md		CHANGELOG.md
README.md		README.md
Snakefile		Snakefile
apollo_variant_typing.py		apollo_variant_typing.py
mypy.ini		mypy.ini
run_pipeline.sh		run_pipeline.sh
version.py		version.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Apollo-variant-typing

Pipeline information

About this project

Prerequisities

Installation

Parameters & Usage

Explanation of the output

Issues

License

Contact

Contribution guidelines

About

Releases 7

Packages

Contributors 3

Languages

RIVM-bioinformatics/apollo-variant-typing

Folders and files

Latest commit

History

Repository files navigation

Apollo-variant-typing

Pipeline information

About this project

Prerequisities

Installation

Parameters & Usage

Explanation of the output

Issues

License

Contact

Contribution guidelines

About

Resources

Stars

Watchers

Forks

Releases 7

Packages 0

Contributors 3

Languages

Packages