Pipeline used by the CNAG AATEAM to reconstruct transcripts from RNAseq and/or ONT transcriptome sequencing data for genome annotation.
- Create the config file with bin/create_rna_config.py:
create_rna_config.py -h
usage: create_configuration_file [-h] [--configFile configFile]
[--logs-dir logs_dir]
[--stringtie-path stringtiePath]
[--star-cpu starCores]
[--minimap-cpu minimapCores]
[--portcullis-cpu portcullisCores]
[--TACO-all-opts TACO_opts] [--genome genome]
[--illumina-dir ILLUMINA_DIR]
[--cdna-dir CDNA_DIR] [--drna-dir DRNA_DIR]
[--bams bams [bams ...]]
[--gene_models gene_models [gene_models ...]]
[--gtf-models models] [--TACO-dir TACO_dir]
[--junctions junctions]
[--genome-dir genome_dir]
[--star-dir star_dir]
[--genomeChrBinNbits nbit] [--no-pe]
[--stringtie-illum-opts stringtie_illumina_opts]
[--TACO-illum-opts TACO_illumina_opts]
[--cdna-mappings cdna_minimap_dir]
[--stringtie-cdna-opts stringtie_cDNA_opts]
[--TACO-cdna-opts TACO_cDNA_opts]
[--drna-mappings drna_minimap_dir]
[--stringtie-drna-opts stringtie_dRNA_opts]
[--TACO-drna-opts TACO_dRNA_opts]
[--illumina-reads illumina_fastqs]
[--cdna-reads cDNA_fastqs]
[--drna-reads dRNA_fastqs]
Create a configuration json file for the repeat annotation pipeline.
optional arguments:
-h, --help show this help message and exit
General Parameters:
--configFile configFile
Configuration file with the pipeline parameters to be
created. Default RNAseq.config
--logs-dir logs_dir Directory to keep all the log files. Default logs
--stringtie-path stringtiePath
Path to the stringtie executable. Default
/scratch/project/devel/aateam/src/Stringtie2/stringtie-2.1.4/stringtie
--star-cpu starCores Number of threads to run star. Default 4
--minimap-cpu minimapCores
Number of threads to run Minimap2. Default 4
--portcullis-cpu portcullisCores
Number of threads to run portcullis. Default 4
--TACO-all-opts TACO_opts
Options to run TACO when merging all the datasets.
Default --isoform-frac 0 --filter-min-expr 0
Inputs:
--genome genome Path to the fasta genome. Default None
--illumina-dir ILLUMINA_DIR
Directory where the illumina fastqs are stored.
Default None
--cdna-dir CDNA_DIR Directory where the cDNA fastqs are stored. Default
None
--drna-dir DRNA_DIR Directory where the dRNA fastqs are stored. Default
None
--bams bams [bams ...]
bam files to get the junctions from them all, do not
give this option if they are going to be generated by
the pipeline. Default None
--gene_models gene_models [gene_models ...]
gtf models that are going to be combined in the last
step of this pipeline. Do not give this option if they
are going to be generated by the pipeline. Default
None
Outputs:
--gtf-models models Path to the final stringtie gtf. Default
TACO_assembled.gtf
--TACO-dir TACO_dir Directory to run TACO. Default TACO_output
--junctions junctions
Path to the final junctions file. Default
portcullis_out/3-filt/portcullis_filtered.pass.junctions.intron.gff3
Illumina:
--genome-dir genome_dir
Directory for the genome index. Default genome
--star-dir star_dir Directory for the mapping step index. Default star
--genomeChrBinNbits nbit
genomeChrBinNbits parameter of STAR. Default 18
--no-pe If specified, the input is not paired-end.
--stringtie-illum-opts stringtie_illumina_opts
Options to run stringtie in illumina mappings. Default
--TACO-illum-opts TACO_illumina_opts
Options to run TACO in illumina mappings. Default
cDNA:
--cdna-mappings cdna_minimap_dir
Directory for the cDNA Minimap2 mappings. Default cDNA
--stringtie-cdna-opts stringtie_cDNA_opts
Options to run stringtie in cDNA mappings. Default
--conservative -R
--TACO-cdna-opts TACO_cDNA_opts
Options to run TACO in cDNA mappings. Default
--isoform-frac 0.01
dRNA:
--drna-mappings drna_minimap_dir
Directory for the dRNA Minimap2 mappings. Default dRNA
--stringtie-drna-opts stringtie_dRNA_opts
Options to run stringtie in dRNA mappings. Default
--TACO-drna-opts TACO_dRNA_opts
Options to run TACO in dRNA mappings. Default
--isoform-frac 0.01 --filter-min-expr 0.2
Wildcards:
--illumina-reads illumina_fastqs
List with basename of the illumina fastqs. Default
None
--cdna-reads cDNA_fastqs
List with basename of the cDNA fastqs. Default None
--drna-reads dRNA_fastqs
List with basename of the dRNA fastqs. Default None
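As an illustration of how the options above fit together, the following minimal sketch assembles one possible create_rna_config.py call and prints it as a shell command. The input paths (assembly.fa, reads/illumina, reads/cdna) and the thread counts are placeholders invented for this example; only flags listed in the help output are used. Whether further options, such as --illumina-reads, are needed for a given data set is not covered here.

```python
import shlex

# Hypothetical inputs: replace these with the paths for your own project.
genome = "assembly.fa"
illumina_dir = "reads/illumina"
cdna_dir = "reads/cdna"

# Only options that appear in the help output above are used here.
cmd = [
    "bin/create_rna_config.py",
    "--configFile", "RNAseq.config",  # same as the documented default
    "--genome", genome,
    "--illumina-dir", illumina_dir,
    "--cdna-dir", cdna_dir,
    "--star-cpu", "8",                # documented default is 4
    "--minimap-cpu", "8",             # documented default is 4
]

# Print a copy-pasteable shell command instead of running it.
print(shlex.join(cmd))
```

Printing the command rather than executing it makes it easy to review the arguments before the configuration file is written.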
- Customize the cluster specification file (RNA_pipeline.spec, passed to Snakemake with --cluster-config) for your run
- Launch the Snakemake pipeline:
snakemake --notemp -j 999 --snakefile RNA_4annotation.Snakemake/bin/RNA_4annot.smk --is --cluster "python3 /project/devel/aateam/src/Snakemake-CNAG/sbatch-cnag.py {dependencies}" --configfile RNAseq.config --cluster-config RNA_pipeline.spec -np
The -np flag requests a dry run (-n) and prints the shell commands that would be executed (-p); remove it when you are ready to launch the pipeline.
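The difference between the dry run and the real launch can be sketched as a small helper that builds the command shown above with or without -np. The function name snakemake_command is hypothetical and not part of the pipeline; the arguments are copied from the command in this README.

```python
import shlex


def snakemake_command(dry_run=True):
    """Return the launch command from this README as an argument list.

    Hypothetical helper for illustration; it is not shipped with the pipeline.
    """
    cmd = [
        "snakemake", "--notemp", "-j", "999",
        "--snakefile", "RNA_4annotation.Snakemake/bin/RNA_4annot.smk",
        "--is",
        "--cluster",
        "python3 /project/devel/aateam/src/Snakemake-CNAG/sbatch-cnag.py {dependencies}",
        "--configfile", "RNAseq.config",
        "--cluster-config", "RNA_pipeline.spec",
    ]
    if dry_run:
        cmd.append("-np")  # dry run: show the planned jobs, submit nothing
    return cmd


# First check the plan, then print the command for the real launch.
print(shlex.join(snakemake_command(dry_run=True)))
print(shlex.join(snakemake_command(dry_run=False)))
```

Running the dry-run form first and checking the listed jobs before dropping -np helps avoid submitting a misconfigured run to the cluster.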