-
Notifications
You must be signed in to change notification settings - Fork 41
amrfinder_customize
amrfinder_customize
allows the addition of gene sequences and protein point mutations to the AMRFinderPlus database in a fairly consistent way. There are several limitations at this time, so if they impact you please let us know and we can work on fixing them.
You can only add protein genes, either to existing "families" or to new families. They will not be treated as "alleles" in our database which have a special handling where a family symbol is returned if the sequence match is not identical.
Right now we only have protein point mutations that can be added, either to existing genes for an existing "--organism
" or for new genes and/or organisms.
make test_customdb
This requires three files. One a FASTA file, and a tab-delimited metadata file that provides information about each of the genes, and a mutation metadata file. See below for formats.
Example input files are contained in the github directory custom.fa
, and custom.meta
, custom.pm
. (add links once in production)
amrfinder_customize --database_in data/latest --database_out data/customdb --prot custom.fa --metadata custom.meta
# Adding point mutations
This requires a FASTA file with the wildtype sequence and a tab-delimited metadata file for the point mutations. See below for formats.
The defline needs to contain two tab-delimited fields.
- The sequence accession (or other identifier, will appear in the AMRFinderPlus report and the point mutation metadata file if this is a wildtype sequence for a point mutation)
- The gene symbol (this will be printed in the AMRFinderPlus report for genes that you are adding to the database. This should appear in the gene metadata file for genes)
Example:
>aac6p_me_acc aac(6')-novel
MTNSNDSVTLRLMTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAMLNGE
PIGYAQSYVALGSGDGWWEEETDPGVRGIDQLLANASQLGKGLGTKLVRALVELLFNDPEVTKIQTDPSP
SNLRAIRCYEKAGFEKQGTVTTPDGPAVYMVQTRQAFERTRSVA
>mexR_acc mexR
MNYPVNPDLMPALMAVFQHVRTRIQSELDCQRLDLTPPDVHVLKLIDEQRGLNLQDLGRQMCRDKALITRKIRELEGRNLVRRERNPSDQRSFQLFLTDEGLAIHQHAEAIMSRVHDELFAPLTPVEQATLVHLLDQCLAAQPLEDI*
This is a tab-delimited file describing each of the genes you want to add to the AMRFinderPlus database with the following fields. Each gene symbol must have an entry in this file to be identified. Note that if you are just adding new gene sequences to an existing AMRFinderPlus gene/family you do not have to put a line in this file, though you should make sure the "gene symbol" in the FASTA file matches a fam_id
from the fam.tab
file.
-
gene_symbol
- The symbol that will be printed in the Gene symbol field of the AMRFinderPlus results -
reportable
- This determines the "Scope" of the added gene.1
for "Plus" genes and2
for "Core" -
type
- text that will go in thetype
field of the AMRFinderPlus report for this gene -
subtype
- text that will go in thesubtype
field of the AMRFinderPlus report for this gene -
class
- text that will go in theclass
field of the AMRFinderPlus report for this gene -
subclass
- text that will go in thesubclass
field of the AMRFinderPlus report for this gene -
protein_name
- text that will go in theSequence name
field of the AMRFinderPlus report for this gene
A header line starting with a '#' is also required. If you are only adding point mutations then a file with just the header line should be used. See the examples in the src directory.
Example:
#gene_symbol reportable type subtype class subclass protein_name
aac(6')-novel 2 AMR AMR AMINOGLYCOSIDE AMINOGLYCOSIDE Aminoglycoside modifying enzyme AAC(6')-novel
This is a tab-delimited file describing each of the point mutations you want to add to the AMRFinderPlus database. Each wildtype sequence must be in the FASTA file, and the accession / FASTA identifier must match this file. If you are only adding genes then this file may only contain the header line.
-
protein_identifier
- This is the first field on the defline for the wildtype gene in the FASTA file -
gene_symbol
- This is the symbol that will be printed in theGene symbol
field of the AMRFinderPlus report when this mutation is identified -
organism
- This is the --organism/-O option that the AMRFinderPlus user needs to use to screen for this mutation -
mutation
- This is point mutation you are searching for in the format <wildtype_allele><mutant_allele> E.g., V126E -
type
- text that will go in thetype
field of the AMRFinderPlus report when this point mutation is found -
subtype
- text that will go in thesubtype
field of the AMRFinderPlus report when this point mutation is found -
class
- text that will go in theclass
field of the AMRFinderPlus report for when this point mutation is found -
subclass
- text that will go in thesubclass
field of the AMRFinderPlus report when this point mutation is found -
mutated_protein_name
- text that will go in theSequence name
field of the AMRFinderPlus report when this point mutation is found
Note that the header is required, see the examples in the src directory.
Example:
#protein_identifier gene_symbol organism mutation class subclass mutated_protein_name
mexR_acc mexR Pseudomonas_aeruginosa V126E FLUOROQUINOLONE FLUOROQUINOLONE Pseudomonas aeruginosa multidrug resistance operon repressor MexR
- New in AMRFinderPlus
- Documentation for AMRFinder v1 (Depricated)
- Overview
- Install with bioconda (recommended)
- Docker Image
- Install with binary
- Compile from source
- Test your installation
- Usage (syntax/options)
- --organism option
- Examples
- Input file formats
- Output format
- Common errors
- Known issues
- Tips and tricks
- Database updates
- Software upgrades
- Genotypes vs. Phenotypes
- Scope: plus vs. core
- AMRFinderPlus "Method" column
- Element type and Subtype
- Class and Subclass