
MetaBolt: a lightning-fast, automated metagenomic pipeline powered by Nextflow


MDL/metabolt


License: MIT

Introduction

MDL/metabolt is a Nextflow bioinformatics pipeline for metagenomic read quality control, assembly, and binning. It runs the following steps:

  1. Read QC (FastQC)

  2. Preprocessing (fastp)

  3. Assembly (MEGAHIT)

  4. Alignment (BWA)

    • Indexing: builds a BWA index of the assembled contigs, which serve as the reference for read alignment.

    • Mapping: aligns the sequencing reads back to the indexed contigs.

  5. BAM processing (SAMtools): utilities for processing and managing SAM/BAM files.

    • Sorting: orders alignments by coordinate to facilitate efficient data retrieval.

    • Indexing: creates index files for the sorted BAM files, enabling rapid access to specific regions.

  6. Contig depth calculation (jgi_summarize_bam_contig_depths)

  7. Binning (MetaBAT2)

  8. QC report for raw reads (MultiQC)
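To make the data flow between these steps concrete, here is a hypothetical Python sketch that only builds (and never runs) the shell commands for one paired-end sample, in the order listed above. The file names, output directories, and reliance on default tool options are assumptions for illustration; the pipeline's own Nextflow modules may pass different arguments.

```python
"""Hypothetical sketch: the per-sample command chain that MDL/metabolt automates.

Nothing is executed. File names and output layout are illustrative assumptions;
the pipeline's Nextflow modules may use different arguments and paths.
"""
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    tool: str                     # step label from the README list
    argv: List[str]               # command and its arguments
    stdout: Optional[str] = None  # file that stdout is redirected into, if any

    def render(self) -> str:
        """Render the step as a single shell command line."""
        cmd = shlex.join(self.argv)
        return f"{cmd} > {shlex.quote(self.stdout)}" if self.stdout else cmd


def build_steps(sample: str, r1: str, r2: str) -> List[Step]:
    """Return the ordered steps for one paired-end sample (hypothetical paths)."""
    trim1, trim2 = f"{sample}_trimmed_R1.fastq.gz", f"{sample}_trimmed_R2.fastq.gz"
    asm_dir = f"{sample}_megahit"
    contigs = f"{asm_dir}/final.contigs.fa"  # MEGAHIT's default contig file name
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    depth = f"{sample}_depth.txt"
    return [
        Step("FastQC", ["fastqc", "-o", "fastqc", r1, r2]),  # QC on raw reads
        Step("fastp", ["fastp", "-i", r1, "-I", r2, "-o", trim1, "-O", trim2]),
        Step("MEGAHIT", ["megahit", "-1", trim1, "-2", trim2, "-o", asm_dir]),
        Step("BWA index", ["bwa", "index", contigs]),  # contigs are the reference
        Step("BWA mem", ["bwa", "mem", contigs, trim1, trim2], stdout=sam),
        Step("SAMtools sort", ["samtools", "sort", "-o", bam, sam]),
        Step("SAMtools index", ["samtools", "index", bam]),
        Step("Depth", ["jgi_summarize_bam_contig_depths", "--outputDepth", depth, bam]),
        Step("MetaBAT2", ["metabat2", "-i", contigs, "-a", depth, "-o", f"{sample}_bins/bin"]),
        Step("MultiQC", ["multiqc", "fastqc", "-o", "multiqc"]),  # aggregates FastQC reports
    ]


if __name__ == "__main__":
    for step in build_steps("CONTROL", "CONTROL_R1.fastq.gz", "CONTROL_R2.fastq.gz"):
        print(f"# {step.tool}\n{step.render()}")
```

Piping `bwa mem` straight into `samtools sort` would avoid the intermediate SAM file; the sketch keeps the two separate only to mirror the step list one-to-one.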

Usage

Note

If you are new to Nextflow and nf-core, please refer to the nf-core installation documentation on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

Minimum Steps to Execute the Pipeline

  1. Samplesheet Preparation:

    • Prepare a samplesheet with your input data. Each row represents a sample, with columns specifying the sample name and the paths to the FASTQ files.

    • Example samplesheet.csv (for paired-end reads):

      sample,fastq_1,fastq_2
      CONTROL,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
      CONDITION,SRR123_S1_R1_011.fastq.gz,SRR123_S1_R2_011.fastq.gz
      

    Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end).

  2. Run the pipeline:

    nextflow run muneebdev7/metabolt \
      -profile <docker/singularity/conda/institute> \
      --input samplesheet.csv \
      --outdir <OUTDIR>
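
Before launching a run, it can help to sanity-check the samplesheet locally. The following is a hypothetical Python pre-flight check, not part of the pipeline: it assumes the exact header shown in the example (sample,fastq_1,fastq_2), treats an empty fastq_2 as a single-end row, and expects gzipped FASTQ suffixes. The pipeline's own validation rules may be stricter or different.

```python
"""Hypothetical pre-flight check for a MDL/metabolt samplesheet.

Assumptions (not taken from the pipeline itself): the header is exactly
sample,fastq_1,fastq_2; fastq_2 may be empty for single-end reads; FASTQ
paths end in .fastq.gz or .fq.gz.
"""
import csv
import io
from typing import List

REQUIRED_COLUMNS = ["sample", "fastq_1", "fastq_2"]
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def check_samplesheet(text: str) -> List[str]:
    """Return a list of problems; an empty list means the sheet looks valid."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [f"header must be {','.join(REQUIRED_COLUMNS)}, got {reader.fieldnames}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["sample"]:
            problems.append(f"line {line_no}: empty sample name")
        for col in ("fastq_1", "fastq_2"):
            path = row[col] or ""  # short rows yield None for missing fields
            if col == "fastq_1" and not path:
                problems.append(f"line {line_no}: fastq_1 is required")
            elif path and not path.endswith(FASTQ_SUFFIXES):
                problems.append(f"line {line_no}: {col} should end in .fastq.gz or .fq.gz")
    return problems


if __name__ == "__main__":
    sheet = (
        "sample,fastq_1,fastq_2\n"
        "CONDITION,SRR123_S1_R1_011.fastq.gz,SRR123_S1_R2_011.fastq.gz\n"
        "BROKEN,,reads_R2.fastq.gz\n"
    )
    for problem in check_samplesheet(sheet) or ["samplesheet OK"]:
        print(problem)  # prints: line 3: fastq_1 is required
```

Returning a list of problems instead of raising on the first one lets you fix every bad row in a single pass.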
    

Warning

Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided via the Nextflow -c option, can be used to provide any configuration except parameters; see the nf-core configuration documentation for details.
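As a sketch of the -params-file route, the hypothetical Python helper below writes the same --input and --outdir values from the usage example into a JSON params file (Nextflow accepts JSON or YAML here) and builds the matching launch command. The file name params.json and the default docker profile are illustrative assumptions.

```python
"""Hypothetical sketch: pass MDL/metabolt parameters through a params file.

The JSON keys mirror the --input and --outdir options from the usage example;
the file name and the default "docker" profile are illustrative assumptions.
"""
import json
import shlex
from pathlib import Path
from typing import List


def write_params(path: Path, samplesheet: str, outdir: str,
                 profile: str = "docker") -> List[str]:
    """Write pipeline parameters as JSON and return the matching `nextflow run` argv."""
    params = {"input": samplesheet, "outdir": outdir}  # --input / --outdir
    path.write_text(json.dumps(params, indent=2) + "\n")
    return ["nextflow", "run", "muneebdev7/metabolt",
            "-profile", profile, "-params-file", str(path)]


if __name__ == "__main__":
    argv = write_params(Path("params.json"), "samplesheet.csv", "results")
    print(Path("params.json").read_text(), end="")
    print(shlex.join(argv))
```

Keeping parameters in their own file, and execution settings in any -c config, follows the split the Warning above asks for.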

Pipeline output

To see the results of an example test run with a full-size dataset, refer to the results directory. For more details about the output files and reports, please refer to the output documentation.

Credits

MDL/metabolt was written by Muhammad Muneeb Nasir in the Metagenomics Discovery Lab (MDL) at SINES, NUST.

We thank the following people for their extensive assistance in the development of this pipeline:

Dr. Masood Ur Rehman Kayani

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch by email.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.