RNA_4annotation.Snakemake

Pipeline used by the CNAG AATEAM to reconstruct transcripts from RNAseq and/or ONT transcriptome sequencing data for genome annotation.

Create the config file with bin/create_rna_config.py:

create_rna_config.py -h
usage: create_configuration_file [-h] [--configFile configFile]
                                 [--logs-dir logs_dir]
                                 [--stringtie-path stringtiePath]
                                 [--star-cpu starCores]
                                 [--minimap-cpu minimapCores]
                                 [--portcullis-cpu portcullisCores]
                                 [--TACO-all-opts TACO_opts] [--genome genome]
                                 [--illumina-dir ILLUMINA_DIR]
                                 [--cdna-dir CDNA_DIR] [--drna-dir DRNA_DIR]
                                 [--bams bams [bams ...]]
                                 [--gene_models gene_models [gene_models ...]]
                                 [--gtf-models models] [--TACO-dir TACO_dir]
                                 [--junctions junctions]
                                 [--genome-dir genome_dir]
                                 [--star-dir star_dir]
                                 [--genomeChrBinNbits nbit] [--no-pe]
                                 [--stringtie-illum-opts stringtie_illumina_opts]
                                 [--TACO-illum-opts TACO_illumina_opts]
                                 [--cdna-mappings cdna_minimap_dir]
                                 [--stringtie-cdna-opts stringtie_cDNA_opts]
                                 [--TACO-cdna-opts TACO_cDNA_opts]
                                 [--drna-mappings drna_minimap_dir]
                                 [--stringtie-drna-opts stringtie_dRNA_opts]
                                 [--TACO-drna-opts TACO_dRNA_opts]
                                 [--illumina-reads illumina_fastqs]
                                 [--cdna-reads cDNA_fastqs]
                                 [--drna-reads dRNA_fastqs]

Create a configuration json file for the repeat annotation pipeline.

optional arguments:
  -h, --help            show this help message and exit

General Parameters:
  --configFile configFile
                        Configuration file with the pipeline parameters to be
                        created. Default RNAseq.config
  --logs-dir logs_dir   Directory to keep all the log files. Default logs
  --stringtie-path stringtiePath
                        Path to the stringtie executable. Default /scratch/pro
                        ject/devel/aateam/src/Stringtie2/stringtie-2.1.4/strin
                        gtie
  --star-cpu starCores  Number of threads to run star. Default 4
  --minimap-cpu minimapCores
                        Number of threads to run Minimap2. Default 4
  --portcullis-cpu portcullisCores
                        Number of threads to run portcullis. Default 4
  --TACO-all-opts TACO_opts
                        Options to run TACO when merging all the datasets.
                        Default --isoform-frac 0 --filter-min-expr 0

Inputs:
  --genome genome       Path to the fasta genome. Default None
  --illumina-dir ILLUMINA_DIR
                        Directory where the illumina fastqs are stored.
                        Default None
  --cdna-dir CDNA_DIR   Directory where the cDNA fastqs are stored. Default
                        None
  --drna-dir DRNA_DIR   Directory where the dRNA fastqs are stored. Default
                        None
  --bams bams [bams ...]
                        bam files to get the junctions from them all, do not
                        give this option if they are going to be generated by
                        the pipeline. Default None
  --gene_models gene_models [gene_models ...]
                        gtf models that are going to be combined in the last
                        step of this pipeline. Do not give this option if they
                        are going to be generated by the pipeline. Default
                        None

Outputs:
  --gtf-models models   Path to the final stringtie gtf. Default
                        TACO_assembled.gtf
  --TACO-dir TACO_dir   Directory to run TACO. Default TACO_output
  --junctions junctions
                        Path to the final junctions file. Default portcullis_o
                        ut/3-filt/portcullis_filtered.pass.junctions.intron.gf
                        f3

Illumina:
  --genome-dir genome_dir
                        Directory for the genome index. Default genome
  --star-dir star_dir   Directory for the mapping step index. Default star
  --genomeChrBinNbits nbit
                        genomeChrBinNbits parameter of STAR. Default 18
  --no-pe               If specified, the input is not paired-end.
  --stringtie-illum-opts stringtie_illumina_opts
                        Options to run stringtie in illumina mappings. Default
  --TACO-illum-opts TACO_illumina_opts
                        Options to run TACO in illumina mappings. Default

cDNA:
  --cdna-mappings cdna_minimap_dir
                        Directory for the cDNA Minimap2 mappings. Default cDNA
  --stringtie-cdna-opts stringtie_cDNA_opts
                        Options to run stringtie in cDNA mappings. Default
                        --conservative -R
  --TACO-cdna-opts TACO_cDNA_opts
                        Options to run TACO in cDNA mappings. Default
                        --isoform-frac 0.01

dRNA:
  --drna-mappings drna_minimap_dir
                        Directory for the dRNA Minimap2 mappings. Default dRNA
  --stringtie-drna-opts stringtie_dRNA_opts
                        Options to run stringtie in dRNA mappings. Default
  --TACO-drna-opts TACO_dRNA_opts
                        Options to run TACO in dRNA mappings. Default
                        --isoform-frac 0.01 --filter-min-expr 0.2

Wildcards:
  --illumina-reads illumina_fastqs
                        List with basename of the illumina fastqs. Default
                        None
  --cdna-reads cDNA_fastqs
                        List with basename of the cDNA fastqs. Default None
  --drna-reads dRNA_fastqs
                        List with basename of the dRNA fastqs. Default None
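
As a rough illustration of how the options above combine, the following sketch assembles one possible create_rna_config.py invocation and prints it. Every flag is taken from the help text; the genome path, read directories, log directory and thread counts are hypothetical placeholders, not values recommended by the pipeline authors.

```python
import shlex

# Hypothetical inputs: replace these placeholders with your own paths.
genome = "assembly/genome.fa"
illumina_dir = "reads/illumina/"
cdna_dir = "reads/cdna/"

# Each flag below appears in the help text; only the values are assumed.
args = [
    "create_rna_config.py",
    "--configFile", "RNAseq.config",   # default name, stated explicitly
    "--logs-dir", "logs",
    "--genome", genome,
    "--illumina-dir", illumina_dir,
    "--cdna-dir", cdna_dir,
    "--star-cpu", "8",
    "--minimap-cpu", "8",
]

# shlex.join quotes any argument that needs it, so the printed line
# can be pasted into a shell as-is.
command = shlex.join(args)
print(command)
```

Options left out of the list keep the defaults shown in the help text. Run the printed command from a location where create_rna_config.py is reachable, for example with the repository's bin directory on your PATH.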

Customize the cluster specification file (RNA_pipeline.spec) for your run, then launch the Snakemake pipeline:

snakemake --notemp -j 999 --snakefile RNA_4annotation.Snakemake/bin/RNA_4annot.smk --is --cluster "python3 /project/devel/aateam/src/Snakemake-CNAG/sbatch-cnag.py {dependencies}" --configfile RNAseq.config --cluster-config RNA_pipeline.spec -np

The trailing -np makes Snakemake perform a dry run (-n) and print the shell commands it would execute (-p). Remove it when you are ready to launch the pipeline.
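
The difference between the dry run and the real launch can be sketched with a small helper that builds the command above and appends -np only when a dry run is requested. The function name snakemake_command is hypothetical and the helper only prints the commands; it is a convenience sketch, not part of the pipeline.

```python
import shlex


def snakemake_command(dry_run=True):
    """Return the launch command from this README as an argument list.

    -np is appended only for a dry run; all other arguments are copied
    from the command line shown above.
    """
    cmd = [
        "snakemake", "--notemp", "-j", "999",
        "--snakefile", "RNA_4annotation.Snakemake/bin/RNA_4annot.smk",
        "--is",
        "--cluster",
        "python3 /project/devel/aateam/src/Snakemake-CNAG/sbatch-cnag.py {dependencies}",
        "--configfile", "RNAseq.config",
        "--cluster-config", "RNA_pipeline.spec",
    ]
    if dry_run:
        cmd.append("-np")
    return cmd


# Print both variants so they can be compared or pasted into a shell.
print(shlex.join(snakemake_command(dry_run=True)))
print(shlex.join(snakemake_command(dry_run=False)))
```

Checking the dry-run output first confirms that the expected jobs are scheduled before anything is submitted to the cluster.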
