
Output files

luissian edited this page Nov 30, 2018 · 8 revisions

The allele calling feature in taranis aims to collect as much information as possible from the sample files and the schema.

For that reason, running taranis creates a large number of files, grouped into 3 main folders.

Allele calling folder structure

The alignments folder groups the matching alignment files. Matching alignment information is generated each time blastn is executed for a core gene against the sample file and the result is neither an exact match nor locus not found (LNF).

Each file follows this naming convention to make it easy to identify: match_alignment_<core_gene_name>_<sample_name>_paired_assembly.txt. Each file contains the heading

Core Gene Sample Name Alignment Sequence

The heading is followed by 3 rows: the alignment sequence of the sample, the schema sequence, and a row in between that marks each position as a match ("|") or as a mismatch (space " "). An example of a matching alignment is:

Core Gene Sample Name Alignment Sequence
lmo0359 RA-L2073 sample --C--GTAG--
lmo0359 RA-L2073 match --!--!--!--!--!---
lmo0359 RA-L2073 schema GCAGTAGG

Note that the matching alignment file is a tab-separated file, but it has been given the "txt" extension so that the match/mismatch characters stay in the right position.
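As an illustration of the layout described above, the following minimal sketch reads a matching alignment file with Python's standard csv module. It assumes the file is tab-separated with four columns (Core Gene, Sample Name, Alignment, Sequence), as the example suggests. The function name read_match_alignment is hypothetical and is not part of taranis.

```python
import csv
import io


def read_match_alignment(handle):
    """Return {row_label: sequence} for the sample, match and schema rows.

    Assumes a tab-separated file with a heading row and four columns.
    """
    # QUOTE_NONE keeps every character of the match row untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the heading row
    rows = {}
    for fields in reader:
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        _core_gene, _sample_name, label, sequence = fields[:4]
        rows[label] = sequence
    return rows


# The example from this page, written with explicit tab characters.
example = (
    "Core Gene\tSample Name\tAlignment\tSequence\n"
    "lmo0359\tRA-L2073\tsample\t--C--GTAG--\n"
    "lmo0359\tRA-L2073\tmatch\t--!--!--!--!--!---\n"
    "lmo0359\tRA-L2073\tschema\tGCAGTAGG\n"
)
rows = read_match_alignment(io.StringIO(example))
print(rows["sample"])
```

For a real file, pass an open file handle instead of the io.StringIO object.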

The proteins folder groups the files with the coding sequences translated to protein. The protein information is also generated when blastn is executed for a core gene against the sample file and the result is neither an exact match nor locus not found (LNF).

Each file follows this naming convention to make it easy to identify: protein_<core_gene_name>_<sample_name>_paired_assembly.txt. Each file contains the heading

Core Gene Sample Name Protein in Protein Sequence

The heading is followed by 3 rows: the protein sequence of the sample, the protein sequence of the schema, and a row in between that marks each position as a match ("|") or as a mismatch (space " "). An example of a protein file is:

Core Gene Sample Name Protein in Protein Sequence
lmo0359 RA-L2073 sample LTAVAIGTLAG
lmo0359 RA-L2073 match --------------!-----
lmo0359 RA-L2073 schema MLYTMKDLLA

The file has the "txt" extension so that it can be opened in a text editor, which keeps the match row aligned with the sequences.

The graphic folder contains statistical plots about the allele calling.
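The naming conventions for the alignment and protein files can be sketched as a small helper. This is only an illustration of the patterns quoted on this page. The function name output_file_name and the kind labels "alignment" and "protein" are hypothetical and do not come from taranis.

```python
# Prefixes taken from the two naming conventions quoted on this page.
PREFIXES = {
    "alignment": "match_alignment",
    "protein": "protein",
}


def output_file_name(kind, core_gene_name, sample_name):
    """Build a file name following the documented convention.

    kind is a hypothetical label: "alignment" or "protein".
    """
    prefix = PREFIXES[kind]
    return f"{prefix}_{core_gene_name}_{sample_name}_paired_assembly.txt"


print(output_file_name("protein", "lmo0359", "RA-L2073"))
print(output_file_name("alignment", "lmo0359", "RA-L2073"))
```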

Under the output folder the following files will be created:

  • deletions.tsv
  • inferred_alleles.tsv
  • insertions.tsv
  • matching_contings.tsv
  • paralog.tsv
  • plot.tsv
  • result.tsv
  • snp.tsv
  • summary_result.tsv
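After a run, one quick check is to confirm that every file in the list above is present. The sketch below assumes the files sit directly under the output folder. The function name missing_output_files is hypothetical, and the file names are copied verbatim from the list.

```python
from pathlib import Path

# File names copied verbatim from the list on this page.
EXPECTED_FILES = [
    "deletions.tsv",
    "inferred_alleles.tsv",
    "insertions.tsv",
    "matching_contings.tsv",
    "paralog.tsv",
    "plot.tsv",
    "result.tsv",
    "snp.tsv",
    "summary_result.tsv",
]


def missing_output_files(output_dir):
    """Return the expected file names that are not found in output_dir."""
    base = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

An empty list returned for a given output folder means all nine files were found.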