CBW 2024 Advanced Module 1: Introduction to metagenomics and read‐based profiling

benfish404 edited this page Apr 25, 2024 · 33 revisions

This page contains the tutorial commands for this module.

Bioinformatic Tool Citations

  • FastQC
  • Kneaddata
  • Bowtie2
  • Kraken2
  • Bracken
  • Kraken-biom
  • MetaPhlAn 3.1

First, make your desired output directory (if it doesn't already exist). Then, run FastQC as follows:

fastqc -t 4 raw_data/*fastq.gz -o fastqc_out
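FastQC expects its output directory to exist already, and the same is likely true of several later steps that write into a named directory. A minimal sketch, assuming the directory names used in the commands on this page; where a tool creates its own output directory, the extra `mkdir` is harmless.

```shell
# Create the output directories that the commands on this page write into.
# -p creates missing parents and does not fail if a directory already exists,
# so this is safe to re-run.
mkdir -p fastqc_out kraken2_outraw kraken2_kreport bracken_out
```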

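Run KneadData to screen the reads against the human (GRCh38) and PhiX reference database. The --bypass-trim option skips the trimming step.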

parallel -j 1 --eta --link 'kneaddata -i1 {1} -i2 {2} -o kneaddata_out --db cbwdata/CourseData/MIC_data/tools/bowtie2_db/GRCh38_PhiX --bypass-trim' ::: raw_data/*R1_subsampled.fastq.gz ::: raw_data/*R2_subsampled.fastq.gz
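With `--link`, GNU parallel pairs the two file lists by position, so a missing R2 file would silently pair the wrong files together. The sketch below checks the pairing before KneadData runs. It assumes the `*R1_subsampled.fastq.gz` / `*R2_subsampled.fastq.gz` naming used above. The function name `check_pairs` is hypothetical, and it only checks from R1 to R2, not for orphaned R2 files.

```shell
# Check that every R1 file in a directory has a matching R2 file.
# The body runs in a subshell (note the parentheses) so its variables stay local.
check_pairs() (
    dir=$1
    missing=0
    for r1 in "$dir"/*R1_subsampled.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        # Derive the expected R2 file name from the R1 file name.
        r2=$(printf '%s\n' "$r1" | sed 's/R1_subsampled\.fastq\.gz$/R2_subsampled.fastq.gz/')
        if [ ! -e "$r2" ]; then
            echo "no R2 mate for $r1" >&2
            missing=1
        fi
    done
    exit "$missing"
)

if check_pairs raw_data; then
    echo "no unpaired R1 files found in raw_data"
fi
```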

Concatenate the forward and reverse reads of each sample into a single file.

perl ../tools/concat_paired_end.pl -p 4 --no_R_match -o cat_reads kneaddata_out/*_paired_contam*.fastq

If the above does not work, you may need to install Perl:

conda install conda-forge::perl

If the script still fails, or if Perl was already installed, the error may say that Parallel::ForkManager is required. Install it from inside your conda environment:

cpan Parallel::ForkManager
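To confirm whether the module is available before rerunning the script, you can ask Perl to load it. A minimal sketch; the variable name `fm_status` is only for illustration.

```shell
# Try to load Parallel::ForkManager; perl exits non-zero if the module
# (or perl itself) is not available.
if perl -MParallel::ForkManager -e 1 2>/dev/null; then
    fm_status="installed"
else
    fm_status="missing"
fi
echo "Parallel::ForkManager is $fm_status"
```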

Run Kraken2 to classify the concatenated reads.

parallel -j 2 --eta 'kraken2 --db cbwdata/CourseData/MIC_data/tools/kraken2_standard_08gb --output kraken2_outraw/{/.}.kraken --report kraken2_kreport/{/.}.kreport {}' ::: cat_reads/*.fastq
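In the command above, `{}` is the full input path and `{/.}` is GNU parallel's replacement string for the input's basename with its last extension removed. The sketch below reproduces that naming with plain shell tools so you can see which output files to expect. The input path `cat_reads/sampleA.fastq` is a hypothetical example, not a file from the course data.

```shell
# Emulate GNU parallel's {/.} replacement string with plain shell tools.
# "cat_reads/sampleA.fastq" is a hypothetical input path used for illustration.
input="cat_reads/sampleA.fastq"
base=$(basename "$input")   # strip the directory:      sampleA.fastq
stem="${base%.*}"           # strip the last extension: sampleA
kraken_out="kraken2_outraw/${stem}.kraken"
kreport_out="kraken2_kreport/${stem}.kreport"
echo "$kraken_out"    # kraken2_outraw/sampleA.kraken
echo "$kreport_out"   # kraken2_kreport/sampleA.kreport
```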

Run Bracken to re-estimate species-level abundances from the Kraken2 reports.

parallel -j 2 --eta 'bracken -d cbwdata/CourseData/MIC_data/tools/kraken2_standard_08gb -i {} -o bracken_out/{/.}.species.bracken -r 100 -l S -t 1' ::: kraken2_kreport/*.kreport

Run kraken-biom to combine the Bracken reports into a single BIOM file:

python ../tools/kraken-biom.py --fmt json -o mgs.biom -m mgs_metadata.tsv kraken2_kreport/*bracken*.kreport