Biological background to the project:
- RNA-seq data were generated from an RNAi experiment in the IL3000 laboratory strain of Trypanosoma congolense, in which a specific gene is knocked down. The gene encodes an enzyme that is critical for energy metabolism.
- Hypothesis: loss of function of this enzyme may reveal other pathways that the parasite activates to circumvent loss of the "default" pathway.
- Experimental set-up:
  - samples were taken at T=0h, T=24h and T=48h
  - some samples are uninduced, while others have been treated with tetracycline to induce expression from the RNAi construct (and hence knockdown of the target gene)
  - there are 3 different sample types:
    - Wild type (WT): T. congolense cultures without the RNAi construct; there are three replicates of most conditions.
    - Clone1: a T. congolense cell line carrying the RNAi construct; there are three replicates of each condition.
    - Clone2: a second T. congolense cell line carrying the RNAi construct; there are three replicates of each condition.
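Later steps need to know which group each sample belongs to. A minimal Python sketch of that bookkeeping is shown below, assuming a hypothetical naming scheme in which each sample is called `<type>-<hours>-<I|U>-<replicate>` (for example `Clone1-24-I-2` for the second induced Clone1 replicate at 24 h). The real sample names will differ, so the parser is illustrative only. Grouping by (type, time, induction) rather than assuming a fixed replicate count also copes with WT conditions that do not have exactly three replicates.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    sample_type: str  # "WT", "Clone1" or "Clone2"
    time_h: int       # 0, 24 or 48
    induced: bool     # True if treated with tetracycline
    replicate: int


def parse_sample_name(name: str) -> Sample:
    """Parse a HYPOTHETICAL name such as 'Clone1-24-I-2'
    (type-hours-(I)nduced/(U)ninduced-replicate)."""
    sample_type, time_h, induction, replicate = name.split("-")
    if induction not in ("I", "U"):
        raise ValueError(f"unexpected induction code in {name!r}")
    return Sample(sample_type, int(time_h), induction == "I", int(replicate))


def group_samples(names):
    """Map each (type, time, induced) group to the sample names it contains."""
    groups = defaultdict(list)
    for name in names:
        s = parse_sample_name(name)
        groups[(s.sample_type, s.time_h, s.induced)].append(name)
    return dict(groups)
```

For example, `group_samples(["WT-0-U-1", "WT-0-U-2", "Clone1-24-I-1"])` places the two WT replicates in the `("WT", 0, False)` group and the Clone1 sample in `("Clone1", 24, True)`.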
The pipeline consists of the following components/modules:
- perform a quality check on the paired-end raw sequence data using FastQC
- assess the number and quality of the raw reads based on the FastQC output
- align the read pairs to the Trypanosoma congolense genome using bowtie2, then convert the output to sorted, indexed BAM format with samtools
- generate counts data for each gene (assuming that no gene contains introns, so a read pair can be assigned to a gene whenever it aligns within that gene's coordinates)
- generate plain-text, tab-delimited output files giving the mean (average) count per gene (i.e. the expression level) for each group; as the gene names are fairly uninformative to a biologist, also include the gene descriptions provided in the BED file
- use the mean expression levels to calculate fold changes for the group-wise comparisons (no statistical testing)
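The QC and alignment steps amount to a handful of command-line calls. A minimal Python sketch that assembles and runs them is shown below; the file names, index prefix and thread count are hypothetical, and FastQC, bowtie2 and samtools are assumed to be installed and on the PATH. Only standard options of those tools are used (`fastqc -o`, `bowtie2-build`, `bowtie2 -p/-x/-1/-2/-S`, `samtools sort -@/-o`, `samtools index`).

```python
import subprocess


def index_command(genome_fasta, index_prefix):
    """Build the bowtie2 index for the T. congolense genome (run once)."""
    return ["bowtie2-build", str(genome_fasta), str(index_prefix)]


def qc_command(read1, read2, outdir):
    """FastQC on both mates; writes an HTML report and a zip per input file.
    FastQC expects outdir to exist already."""
    return ["fastqc", "-o", str(outdir), str(read1), str(read2)]


def alignment_commands(index_prefix, read1, read2, sample, threads=4):
    """Align one read pair set, then sort and index the result."""
    sam, bam = f"{sample}.sam", f"{sample}.bam"
    return [
        ["bowtie2", "-p", str(threads), "-x", str(index_prefix),
         "-1", str(read1), "-2", str(read2), "-S", sam],
        # samtools index requires a coordinate-sorted BAM file
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],  # writes <sample>.bam.bai
    ]


def run(commands):
    """Run each command in order, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Writing an intermediate SAM file keeps each step easy to inspect; piping bowtie2 output straight into `samtools sort` would avoid the large temporary file at the cost of a slightly more complex runner.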
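The counting, averaging and fold-change steps could be sketched in Python as below. This is a minimal illustration under stated assumptions, not the project's implementation: the BED layout (gene ID in column 4, description in the last column), the function names and the pseudocount of 1 used to avoid division by zero are all hypothetical choices. Aligned fragments are represented as (chrom, start, end) tuples in 0-based, half-open coordinates, as they might be extracted from the BAM files; because genes are assumed to lack introns, a fragment is counted for a gene whenever the two intervals overlap. The overlap scan is deliberately naive; on a whole genome, an interval index or an established counter such as bedtools or featureCounts would be far faster.

```python
import csv
import io
from statistics import mean


def read_bed(lines):
    """Parse gene records from BED lines.
    ASSUMPTION: chrom, start, end, gene_id in columns 1-4; description is the LAST column."""
    genes = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        genes.append((fields[0], int(fields[1]), int(fields[2]), fields[3], fields[-1]))
    return genes


def count_fragments(genes, fragments):
    """Count fragments overlapping each gene (no introns, so interval overlap suffices).
    A fragment overlapping two genes is counted for both. Naive O(genes x fragments)."""
    counts = {gene_id: 0 for _, _, _, gene_id, _ in genes}
    for chrom, fstart, fend in fragments:
        for gchrom, gstart, gend, gene_id, _ in genes:
            if chrom == gchrom and fstart < gend and fend > gstart:
                counts[gene_id] += 1
    return counts


def group_means(counts_by_sample, groups):
    """counts_by_sample: sample -> {gene_id: count}; groups: group -> [sample, ...]."""
    gene_ids = list(next(iter(counts_by_sample.values())))
    return {group: {g: mean(counts_by_sample[s][g] for s in samples) for g in gene_ids}
            for group, samples in groups.items()}


def fold_change(means, group_a, group_b, pseudocount=1.0):
    """Per-gene (mean_b + pseudocount) / (mean_a + pseudocount); no statistical testing."""
    return {g: (means[group_b][g] + pseudocount) / (means[group_a][g] + pseudocount)
            for g in means[group_a]}


def format_table(genes, means, group_order):
    """Tab-delimited text: gene ID, description, then one mean column per group."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", "description", *group_order])
    for _, _, _, gene_id, description in genes:
        writer.writerow([gene_id, description,
                         *(f"{means[grp][gene_id]:.2f}" for grp in group_order)])
    return buf.getvalue()
```

For example, with hypothetical group labels, `fold_change(means, "Clone1_48h_uninduced", "Clone1_48h_induced")` gives induced/uninduced ratios, where values below 1 indicate lower mean counts after induction.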