-
Notifications
You must be signed in to change notification settings - Fork 204
CBW 2024 Beginner Module 3: Microbiome statistics and visualizations
This tutorial is for the 2024 CBW bioinformatics workshop running from May 27th-30th. It is based on the Amplicon SOP v2 available on the Microbiome Helper repository and previous workshops designed by Diana Haider and Robert Beiko.
Authors: Monica Alvaro Fuss
This tutorial provides a walkthrough of an end-to-end pipeline using the command line interface for the analysis of high-throughput marker gene data. Commonly used marker genes for microbiome analysis include the 16S ribosomal RNA (rRNA) for prokaryotes, 18S rRNA for eukaryotes, and the internal transcribed spacer (ITS) for fungi. In this tutorial, we will explore a 16S rRNA dataset from wild blueberry (Vaccinium angustifolium) soil communities from both bulk and rhizosphere soils associated with either natural or managed habitats. You can read more about the study in these papers:
- Variation in Bacterial and Eukaryotic Communities Associated with Natural and Managed Wild Blueberry Habitats
- Metagenomic Functional Shifts to Plant Induced Environmental Changes
In Module 2 we covered the basics of marker gene analysis from raw reads to filtered feature table, and in Module 3 we will explore examples of downstream analyses that can be used to draw biological conclusions from these data. The pipeline described is embedded in the latest version of QIIME2 (Quantitative Insights into Microbial Ecology version 2023.2), which is a popular microbiome bioinformatics platform for microbial ecology built on user-made software packages called plugins that work on QIIME2 artifact or QZA files. Documentation for these plugins can be found in the QIIME 2 user documentation, along with tutorials and other useful information. QIIME2 also provides interpretable visualizations that can be accessed by opening any generated QZV files within QIIME2 View.
cd ~/workspace
mkdir bmb_module3
cd bmb_module3
cp -r ~/CourseData/MIC_data/BMB_data/deblur_output/ .
cp -r ~/CourseData/MIC_data/BMB_data/taxa .
cp ~/CourseData/MIC_data/BMB_data/Blueberry_metadata_reduced.tsv .
conda activate qiime2-amplicon-2024.2
conda activate qiime2-amplicon-backup
When you are finished this tutorial you can deactivate the conda environment using:
conda deactivate
-
If you run into problems copying the commands directly, try copying each line individually or typing them manually into the terminal.
-
Visualizing your data is always a good idea at any time during an analysis. In the interest of time, you will only be required to do so when you need to retrieve specific information from the visualization, but we have added some optional steps for you to explore if you wish.
-
Due to the time and memory constraints of this workshop, we will provide some of the necessary output files.
1. Build tree with SEPP QIIME 2 plugin
SEPP is one method for placing short sequences into a reference phylogenetic tree. This is a useful way of determining a phylogenetic tree for your ASVs. For 16S data you can do this with q2-fragment-insertion using the below command. Again, due to memory constraints, you can copy the output into your folder with the following command:
cp ~/CourseData/MIC_data/BMB_data/asvs-tree.qza .
cp ~/CourseData/MIC_data/BMB_data/insertion-placements.qza .
qiime fragment-insertion sepp --i-representative-sequences deblur_output/rep_seqs_final.qza --i-reference-database /home/shared/microbiome_amplicon/sepp-refs-gg-13-8.qza --o-tree asvs-tree.qza --o-placements insertion-placements.qza --p-threads 4
Note that if you do not already have this file locally you will need to download sepp-refs-gg-13-8.qza
as specified in the fragment-insertion instructions. You can specify custom reference files to place other amplicons, but the easiest approach for 18S and ITS data is to instead create a de novo tree as outlined in the Microbiome Helper repository.
A key quality control step is to plot rarefaction curves for all of your samples to determine if you performed sufficient sequencing. The below command will generate these plots (make sure you have the correct maximum sequencing depth as per your filtered feature table).
qiime diversity alpha-rarefaction --i-table deblur_output/deblur_table_final.qza --p-max-depth X --p-steps 20 --i-phylogeny asvs-tree.qza --o-visualization rarefaction_curves.qzv
Question 1: What is a good rarefaction depth for diversity analysis?
Common alpha and beta-diversity metrics can be calculated with a single command in QIIME 2. Ordination plots (such as PCoA plots for weighted UniFrac distances) will be generated automatically as well. This command will also rarefy all samples to the sample sequencing depth before calculating these metrics (X is a placeholder for the lowest reasonable sample depth; samples with depth below this cut-off will be excluded).
qiime diversity core-metrics-phylogenetic --i-table deblur_output/deblur_table_final.qza --i-phylogeny asvs-tree.qza --p-sampling-depth X --m-metadata-file Blueberry_metadata_reduced.tsv --p-n-jobs-or-threads 4 --output-dir diversity
For alpha diversity visualizations, you will need to produce boxplots comparing the different categories in your metadata file. For example, to create boxplots comparing the Shannon alpha-diversity metric you can use this command:
qiime diversity alpha-group-significance --i-alpha-diversity diversity/shannon_vector.qza --m-metadata-file Blueberry_metadata_reduced.tsv --o-visualization diversity/shannon_compare_groups.qzv
Hint: you will need to change this command for the other alpha diversity metrics. You can see the other metrics available by running ls diversity/*_vector.qza
Note that you can also export (see below) this or any other diversity metric file (ending in .qza) and analyze them with a different program.
Question 2: are there any significant differences in alpha diversity between any of our metadata categories?
Question 3: which metadata category appears to provide more separation in the beta diversity PCoA plots?
Another useful output is the interactive stacked bar-charts of the taxonomic abundances across samples, which can be output with this command:
qiime taxa barplot --i-table deblur_output/deblur_table_final.qza --i-taxonomy taxa/classification.qza --m-metadata-file Blueberry_metadata_reduced.tsv --o-visualization taxa/taxa_barplot.qzv
Question 4: can you identify any patterns between the metadata groups?
ANCOM is one method to test for differences in the relative abundance of features between sample groupings. It is a compositional approach that makes no assumptions about feature distributions. However, it requires that all features have non-zero abundances so a pseudocount first needs to be added (1 is a typical pseudocount choice):
qiime composition add-pseudocount --i-table deblur_output/deblur_table_final.qza --p-pseudocount 1 --o-composition-table deblur_output/deblur_table_final_pseudocount.qza
Then ANCOM can be run with this command; note that CATEGORY is a placeholder for the text label of your category of interest from the metadata file.
qiime composition ancom --i-table deblur_output/deblur_table_final_pseudocount.qza --m-metadata-file Blueberry_metadata_reduced.tsv --m-metadata-column CATEGORY --output-dir ancom_output
Question: If you run more than one description column, you won't be able to overwrite the output directory. How can you put a second file in this directory? (Hint: you can find additional information at QIIME 2 user documentation or by typing qiime composition ancom --help
)
Question 5: Does ANCOM identify any differentially abundant taxa between any of the metadata groups? If so, which one(s)?
Lastly, to get the BIOM file (with associated taxonomy) and FASTA file (one per ASV) for your dataset to plug into other programs you can use the commands below.
To export the FASTA:
qiime tools export --input-path deblur_output/rep_seqs_final.qza --output-path deblur_output_exported
To export the BIOM table (with taxonomy added as metadata):
qiime tools export --input-path deblur_output/deblur_table_final.qza --output-path deblur_output_exported
qiime tools export --input-path taxa/classification.qza --output-path taxa
sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa/taxonomy.tsv
biom add-metadata -i deblur_output_exported/feature-table.biom -o deblur_output_exported/feature-table_w_tax.biom --observation-metadata-fp taxa/taxonomy.tsv --sc-separated taxonomy
biom convert -i deblur_output_exported/feature-table_w_tax.biom -o deblur_output_exported/feature-table_w_tax.txt --to-tsv --header-key taxonomy
- Please feel free to post a question on the Microbiome Helper google group if you have any issues.
- General comments or inquires about Microbiome Helper can be sent to [email protected].