Running AMRFinderPlus
See Test your installation for some basic examples of expected input and expected output.
amrfinder (-p <protein_fasta> | -n <nucleotide_fasta>) [options]
amrfinder -u
The only required arguments are either -p <protein_fasta> for proteins or -n <nucleotide_fasta> for assembled nucleotide sequence. We also provide an automatic update mechanism to update the database by using -u. This will update to the latest AMR database. See Software upgrades for information about updating the software. Use --help to see the complete set of options and flags.
- --protein <protein_fasta> or -p <protein_fasta>
  Protein FASTA file to search.
- --nucleotide <nucleotide_fasta> or -n <nucleotide_fasta>
  Assembled nucleotide FASTA file to search. When running in combined mode (i.e., using the -n, -p, and -g options together) this should be the genomic sequence, not the cut-out coding sequence.
- --gff <gff_file> or -g <gff_file>
  GFF file to give sequence coordinates for proteins listed in -p <protein_fasta>. Required for combined searches of protein and nucleotide. The value of the 'Name=' variable in the 9th field in the GFF must match the identifier in the protein FASTA file (everything between the '>' and the first whitespace character on the defline). The first column of the GFF must be the nucleotide sequence identifier in the nucleotide_fasta if provided (everything between the '>' and the first whitespace character on the defline). To interpret the output of external PGAP annotations use the --pgap option.
- --organism <organism> or -O <organism>
  Taxon used for screening known resistance-causing point mutations, for organism-specific typing (Stx type for Escherichia), and for blacklisting of common, non-informative genes. amrfinder -l will list the possible values for this option. Note that rRNA mutations will not be screened if only a protein file is provided. To screen known Shigella mutations use Escherichia as the organism. See Organism option below for more details.
- --list_organisms or -l
  Print the list of all possible taxonomy groups used with the -O/--organism option.
- --update or -u
  Download the latest version of the AMRFinderPlus database to the default location (location of the AMRFinderPlus binaries/data). Creates a directory under data in the format YYYY-MM-DD.<version> (e.g., 2019-03-06.1). Will not overwrite an existing directory. Use --force_update to overwrite the existing directory with a new download.
- --force_update or -U
  Download the latest version of the AMRFinderPlus database to the default location. Creates a directory under data in the format YYYY-MM-DD.<version> (e.g., 2019-03-06.1), and overwrites an existing directory.
- --plus
  Provide results from "Plus" genes such as virulence factors, stress-response genes, etc. See AMRFinderPlus database for details.
- --database_version or -V
  Print out complete version information of both the database and software.
- --print_node
  Add an additional "Hierarchy node" column to the output with the node identifier used for naming this hit in the AMRFinderPlus reference gene hierarchy. See our Reference Gene Hierarchy help for more information.
- --annotation_format <format> or -a <format>
  Read a non-standard format to determine GFF entry association with protein and nucleotide FASTA entries. This option is experimental at this time; please report issues to [email protected]. In addition to the default, which works with GenBank and RefSeq files and is described in the section Input file formats, the following options are available. See Input file formats for more details.
  - genbank - GenBank (default)
  - bakta - Bakta: rapid & standardized annotation of bacterial genomes, MAGs & plasmids
  - microscope - Microbial Genome Annotation & Analysis Platform
  - patric - Pathosystems Resource Integration Center / BV-BRC
  - pgap - NCBI Prokaryotic Genome Annotation Pipeline
  - prokka - Prokka rapid prokaryotic genome annotation
  - pseudomonasdb - The Pseudomonas Genome Database
  - rast - Rapid Annotation using Subsystem Technology
- --blast_bin <directory>
  Directory to search for 3rd party binaries (blast and hmmer) if not in the path.
- --database <database_dir> or -d <database_dir>
  Use an alternate database directory. This can be useful if you want to run an analysis with a database version that is not the latest. This should point to the directory containing the full AMRFinderPlus database files. It is possible to create your own custom databases, but it is not a trivial exercise. See AMRFinderPlus database for details on the format.
- --threads <#>
  The number of threads to use for processing. AMRFinderPlus defaults to 4 on hosts with >= 4 cores. Setting this number higher than the number of cores on the running host may cause blastp to fail. Using more than 4 threads may speed up searches with nucleotide sequences, but will have little effect if only protein is used as the input.
- --version or -v
  Print out just the software version. For more complete information we recommend you use the -V option described above, but this is here to maintain backward compatibility.
- --mutation_all <point_mut_report>
  Report genotypes at all locations screened for point mutations. This file allows you to distinguish between called point mutations that were the sensitive variant and the point mutations that could not be called because the sequence was not found. This file will contain all detected variants from the reference sequence, so it could be used as an initial screen for novel variants. Note that "Gene symbols" for mutations not in the database (identifiable by [UNKNOWN] in the Sequence name field) have offsets that are relative to the start of the sequence indicated in the field "Accession of closest sequence", while "Gene symbols" from known point-mutation sites match the Pathogen Detection Reference Gene Catalog standardized nomenclature for point mutations.
- --name <identifier>
  Prepend a column containing an identifier for this run of AMRFinderPlus. For example, this can be used to add a sample name column to the AMRFinderPlus results.
- --nucleotide_output <nucleotide.fasta>
  Print a FASTA file of just the regions of the --nucleotide <input.fa> that were identified by AMRFinderPlus. This will include the entire region that aligns to the references for point mutations.
- --nucleotide_flank5_output <nucleotide.fasta>
  Print a FASTA file of the regions of the --nucleotide <input.fa> that were identified by AMRFinderPlus plus --nucleotide_flank5_size additional nucleotides in the 5' direction. This will include the entire region that aligns to the references for point mutations plus the additional flank.
- --nucleotide_flank5_size <num_bases>
  The number of additional bases in the 5' direction to add to the element sequence in the --nucleotide_flank5_output file.
- -o <output_file>
  Print AMRFinderPlus output to a file instead of standard out.
- --protein_output <protein.fasta>
  Print a FASTA file of proteins identified by AMRFinderPlus in the --protein <input.fa> file. Only selects from the proteins provided on input; AMRFinderPlus will not do a translation of hits identified from nucleotide sequence.
- -q
  Suppress the printing of status messages to standard error while AMRFinderPlus is running. Error messages will still be printed.
- --report_all_equal
  Report all equally scoring BLAST and HMM matches. This will report multiple lines for a single element if there are multiple reference proteins that have the same score. On those lines the fields Accession of closest sequence and Name of closest sequence will be different, showing each of the database proteins that are equally close to the query sequence.
- --ident_min <0-1> or -i <0-1>
  Minimum identity for a BLAST-based hit (Methods BLAST or PARTIAL). -1 means use the curated threshold if it exists and 0.9 otherwise. Setting this value to something other than -1 will override curated similarity cutoffs. We only recommend using this option if you have a specific reason; don't make our curators sad by throwing away some of their hard work.
- --coverage_min <0-1> or -c <0-1>
  Minimum proportion of reference gene covered for a BLAST-based hit (Methods BLAST or PARTIAL). Default value is 0.5.
- --translation_table <1-33> or -t <1-33>
  Number from 1 to 33 to represent the translation table used for BLASTX. Default is 11. See Translation table description for a description of the available tables.
- --pgap
  Alters the GFF and FASTA file handling to correctly interpret the output of the pgap pipeline. Note that you should use the annot.fna file in the pgap output directory as the argument to the -n/--nucleotide option.
- --gpipe_org
  Use Pathogen Detection taxgroup names as arguments to the --organism option.
- --parm <parameter string>
  Pass additional parameters to amr_report. This is mostly used for development and debugging.
- --debug
  Perform some additional integrity checks. May be useful for debugging, but not intended for public use.
The -O/--organism option is used to get organism-specific results. For those organisms which have been curated, using --organism will get optimized organism-specific results. AMRFinderPlus uses the --organism option for four purposes:
- To screen for point mutations
- To filter out genes that are nearly universal in a group and uninformative
- To identify divergent Streptococcus pneumoniae and Neisseria gonorrhoeae pbp proteins that are usually penicillin resistant (-O Streptococcus_pneumoniae or -O Neisseria_gonorrhoeae)
- To run StxTyper to type Stx operons (-O Escherichia)
We currently curate a limited set of organisms for point mutations and/or blacklisting of some plus genes that are not likely to be informative in those species. Use amrfinder -l to list the organism options that can be used in the current database. Use the Reference Gene Catalog to identify the point mutations and blacklisted genes that are affected by this option. A summary of which taxa have received specific attention from curators, and where we believe the AMRFinderPlus database has good coverage for point mutations, virulence genes, and stress response genes, is on the Curated organisms page.
Organism option | Point mutations | Blacklisted --plus genes | Notes |
---|---|---|---|
Acinetobacter_baumannii | X | Use for the A. baumannii-calcoaceticus species complex | |
Burkholderia_cepacia | X | Use for the Burkholderia cepacia species complex | |
Burkholderia_mallei | X | ||
Burkholderia_pseudomallei | X | Use for the Burkholderia pseudomallei species complex | |
Campylobacter | X | Use for C. coli and C. jejuni | |
Citrobacter_freundii | X | ||
Corynebacterium_diphtheriae | X | ||
Clostridioides_difficile | X | ||
Enterobacter_cloacae | X | ||
Enterobacter_asburiae | X | ||
Enterococcus_faecalis | X | ||
Enterococcus_faecium | X | Use for E. hirae | |
Escherichia | X | X | Use for Shigella and Escherichia |
Klebsiella_oxytoca | X | X | |
Klebsiella_pneumoniae | X | X | Use for K. pneumoniae species complex and K. aerogenes |
Neisseria_gonorrhoeae | X | ||
Neisseria_meningitidis | X | ||
Pseudomonas_aeruginosa | X | ||
Salmonella | X | X | |
Serratia_marcescens | X | ||
Staphylococcus_aureus | X | ||
Staphylococcus_pseudintermedius | X | X | |
Streptococcus_agalactiae | X | ||
Streptococcus_pneumoniae | X | Use for S. pneumoniae and S. mitis | |
Streptococcus_pyogenes | X | ||
Vibrio_cholerae | X | X | |
Vibrio_vulnificus | X | ||
Vibrio_parahaemolyticus | X |
Note that variant detection for Streptococcus_pneumoniae PBPs uses a new mechanism identifying divergent alleles. See Interpreting Results for more information.
AMRFinderPlus creates a fair number of temporary files in /tmp by default. If the environment variable TMPDIR is set, AMRFinderPlus will instead put the temporary files in the directory pointed to by $TMPDIR.
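When AMRFinderPlus is launched from a script, the TMPDIR behavior described above can be used by passing a modified environment to the child process. The following is a minimal sketch; the helper name env_with_tmpdir and the directory /scratch/amr_tmp are hypothetical and only serve as an illustration.

```python
import os


def env_with_tmpdir(tmp_dir, base_env=None):
    """Return a copy of the environment with TMPDIR pointing at tmp_dir.

    base_env defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TMPDIR"] = tmp_dir
    return env


# Example use (assumes amrfinder is installed and on the PATH, so it is
# left commented out here):
# import subprocess
# subprocess.run(["amrfinder", "-p", "test_prot.fa"],
#                env=env_with_tmpdir("/scratch/amr_tmp"), check=True)
```

Copying the environment rather than editing os.environ keeps the change local to the one AMRFinderPlus run.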
These examples use the test files test_prot.gff, test_prot.fa, and test_dna.fa if you want to try them for yourself.
# print a list of command-line options
amrfinder --help
# Download the latest AMRFinderPlus database
amrfinder -u
# Protein AMRFinder with no genomic coordinates
amrfinder -p test_prot.fa
# Translated nucleotide AMRFinder (will not use HMMs)
amrfinder -n test_dna.fa
# Protein AMRFinder using GFF to get genomic coordinates and 'plus' genes
amrfinder -p test_prot.fa -g test_prot.gff --plus
# Protein AMRFinder with Escherichia protein point mutations
amrfinder -p test_prot.fa -O Escherichia
# Full AMRFinderPlus search combining results
amrfinder -p test_prot.fa -g test_prot.gff -n test_dna.fa -O Escherichia --plus
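For pipelines that call AMRFinderPlus from Python, the command lines above can be assembled as an argument list. This is only a sketch: the function build_amrfinder_command and its parameter names are hypothetical, and it uses only the options shown in the examples (-p, -g, -n, -O, --plus) plus -o.

```python
import shlex


def build_amrfinder_command(protein=None, nucleotide=None, gff=None,
                            organism=None, plus=False, output=None):
    """Build an amrfinder argument list from the options used above."""
    if protein is None and nucleotide is None:
        # The usage line requires -p or -n.
        raise ValueError("a protein or nucleotide FASTA file is required")
    cmd = ["amrfinder"]
    if protein is not None:
        cmd += ["-p", protein]
    if gff is not None:
        cmd += ["-g", gff]
    if nucleotide is not None:
        cmd += ["-n", nucleotide]
    if organism is not None:
        cmd += ["-O", organism]
    if plus:
        cmd.append("--plus")
    if output is not None:
        cmd += ["-o", output]
    return cmd


# The "full AMRFinderPlus search" example, rebuilt as a list.
full_search = build_amrfinder_command(
    protein="test_prot.fa", gff="test_prot.gff",
    nucleotide="test_dna.fa", organism="Escherichia", plus=True)
print(shlex.join(full_search))
```

The list form can be handed to subprocess.run directly, which avoids shell quoting problems with unusual file names.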
AMRFinderPlus includes a couple of programs to assist with non-standard scenarios for database updates.
amrfinder_update downloads and indexes the latest database version to a custom location.
amrfinder_index will run the commands to generate the BLAST and HMMER databases from the distributed AMRFinderPlus database files. This indexing is automatic when running amrfinder -u or amrfinder_update.
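The --update description notes that database directories are named YYYY-MM-DD.<version> (e.g., 2019-03-06.1). When several versions sit side by side, for example to pass one to --database, a script may need to order them. The sketch below assumes the version part is a plain integer, which the source does not state explicitly; the function names are hypothetical.

```python
import re
from datetime import date

# Directory names of the form YYYY-MM-DD.<version>, e.g. 2019-03-06.1
DB_DIR_PATTERN = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\.(\d+)$")


def parse_db_dir(name):
    """Return (release_date, version) for a database directory name,
    or None if the name does not follow the expected format."""
    match = DB_DIR_PATTERN.match(name)
    if match is None:
        return None
    year, month, day, version = (int(g) for g in match.groups())
    try:
        return date(year, month, day), version
    except ValueError:  # e.g. month 13
        return None


def latest_db_dir(names):
    """Pick the newest directory name, comparing date first and then
    the numeric version (so .10 sorts after .2)."""
    keyed = [(parse_db_dir(n), n) for n in names]
    valid = [(key, n) for key, n in keyed if key is not None]
    return max(valid)[1] if valid else None
```

Comparing the version numerically rather than as text matters once a single date has ten or more releases.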
There are three possible input files to AMRFinderPlus: the --protein FASTA file, the --nucleotide FASTA file, and the --gff file to tie them together.
Any of these files will be automatically decompressed with gunzip if their filename ends in .gz; automatic handling of gzipped input files requires the command gunzip to be in your path.
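Because gzipped inputs depend on gunzip being available, a wrapper script may want to check for it before starting a long run. A minimal sketch, with hypothetical function names:

```python
import shutil


def gzipped_inputs(paths):
    """Return the input paths whose names end in .gz."""
    return [p for p in paths if p.endswith(".gz")]


def missing_gunzip(paths):
    """True if at least one input is gzipped but gunzip is not on PATH."""
    return bool(gzipped_inputs(paths)) and shutil.which("gunzip") is None
```

shutil.which searches the same PATH the amrfinder process would inherit, so the check mirrors the requirement stated above.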
Note that AMRFinderPlus does not support Unicode (UTF-8); however, as of version 3.11.14 it should ignore Unicode characters that do not contain bytes between 0x00 and 0x1F.
When run in the most sensitive and accurate "combined" mode AMRFinderPlus needs a way to relate the results from the protein FASTA file and the nucleotide FASTA file together, and it uses the --gff file to do that. Unfortunately there isn't a standard for relating the entries of the FASTA files with the GFF file. By default AMRFinderPlus reads the format of files downloaded from GenBank/RefSeq. The --annotation_format option added to version 3.10.38 adds automated parsing of the output files of different annotation engines.
This option allows AMRFinderPlus to parse and associate entries in GFF files with protein and nucleotide FASTA files in the data coming out of other annotation systems and databases. This feature is a bit experimental as we do not have extensive experience with many of these data sources, so please report any issues to [email protected] or as GitHub issues. The default behavior is described under the -g <gff_file> section below.
Parameters for the --annotation_format / -a option are as follows:
- genbank - Files downloaded from NCBI GenBank or RefSeq (default); rules are somewhat complicated by the need to handle various NCBI-produced formats
- standard - Default behavior follows the rules below to parse files downloaded from NCBI databases
- bakta - Bakta: rapid & standardized annotation of bacterial genomes, MAGs & plasmids https://github.com/oschwengers/bakta
- microscope - Microbial Genome Annotation & Analysis Platform https://mage.genoscope.cns.fr/microscope
- patric - Pathosystems Resource Integration Center https://www.patricbrc.org / BV-BRC https://www.bv-brc.org
- pgap - NCBI Prokaryotic Genome Annotation Pipeline https://www.ncbi.nlm.nih.gov/genome/annotation_prok
- prodigal - Prodigal Gene Prediction Software https://github.com/hyattpd/Prodigal
- prokka - Prokka rapid prokaryotic genome annotation https://github.com/tseemann/prokka
- pseudomonasdb - The Pseudomonas Genome Database https://pseudomonas.com/
- rast - Rapid Annotation using Subsystem Technology http://RAST.nmpdr.org
GFF files are used to get sequence coordinates for AMRFinderPlus hits from protein sequence and associate them with hits on nucleotide sequence. Use the --annotation_format option described above for some common data sources.
The defaults will correctly parse input files downloaded from NCBI resources such as GenBank and RefSeq (--annotation_format genbank). This parsing is very similar to the "Standard behavior" described below except that locus_tag=<protein accession> and <project>:<accession> formats are handled in deflines and GFF files.
Note that in GFF files some characters in identifiers need to be escaped using URL-style escapes:
- # comment start (%23)
- tab (%09)
- newline (%0A)
- carriage return (%0D)
- % percent (%25)
- control characters (%00 through %1F, %7F)
- ; semicolon (%3B)
- = equals (%3D)
- & ampersand (%26)
- , comma (%2C)
In addition, quotes and tick-marks are trimmed, and for the default --annotation_format genbank the ":" character should be avoided or escaped because it is used in some NCBI formats.
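The escaping rules listed above can be applied mechanically when generating GFF identifiers. The following is a sketch only: the function name is hypothetical, '#' is encoded as %23 (its standard percent-encoding), and the quote trimming and ":" handling mentioned above are deliberately left out.

```python
# Punctuation that the list above says must be URL-escaped.
ESCAPED_PUNCTUATION = set("#%;=&,")


def escape_gff_identifier(identifier):
    """Percent-encode the characters that need escaping in a GFF identifier.

    Control characters (0x00-0x1F and 0x7F, which covers tab, newline and
    carriage return) and the listed punctuation become %XX; everything
    else is passed through unchanged.
    """
    pieces = []
    for ch in identifier:
        code = ord(ch)
        if code < 0x20 or code == 0x7F or ch in ESCAPED_PUNCTUATION:
            pieces.append("%{:02X}".format(code))
        else:
            pieces.append(ch)
    return "".join(pieces)
```

Because '%' itself is in the escaped set, the function should be applied once to raw identifiers and not to text that is already escaped.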
The standard behavior is enabled by the --annotation_format standard option and is very similar to --annotation_format genbank.
The value of the 'Name=' variable in the 9th field in the GFF must
match the identifier in the protein FASTA file (everything between the '>'
and the first whitespace character on the defline). The first column of the GFF
must be the nucleotide sequence identifier in the nucleotide_fasta if provided
(everything between the '>' and the first whitespace character on the defline).
See test_prot.gff, test_prot.fa, and test_dna.fa for a simple example. See Test your installation for how to run the examples.
Simple example below (These were taken from test_prot.gff. See those files to see how the identifiers line up):
contig09 . gene 1 675 . - . Name=aph3pp-Ib_partial_5p_neg
contig09 . gene 715 1377 . - . Name=sul2_partial_3p_neg
contig11 . gene 113 547 . + . Name=blaTEM-internal_stop
Matching deflines from assembly:
>contig09 >case4_case6_sul2_aph3pp-Ib Providencia rettgeri strain Pret_2032, whole genome shotgun sequence 2160922-2162737 150-1527 (reverse comp'd)
>contig11 blaTEM divergent, with internal stop codon
Matching protein deflines:
>aph3pp-Ib_partial_5p_neg NZ_QKNQ01000001.1 Providencia rettgeri strain Pret_2032, whole genome shotgun sequence 2160922-2162737 150-1527 704-137
>sul2_partial_3p_neg NZ_QKNQ01000001.1 Providencia rettgeri strain Pret_2032, whole genome shotgun sequence 2160922-2162737 150-1377 2-667
>blaTEM-internal_stop
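The identifier matching illustrated above can be checked before running a combined search. The sketch below follows the standard rule only (the 'Name=' value must equal a protein FASTA identifier, and GFF column 1 must equal a nucleotide FASTA identifier); it is not how AMRFinderPlus itself parses files, the function names are hypothetical, and the GFF is assumed to be tab-delimited.

```python
def fasta_identifiers(fasta_text):
    """Identifiers: text after '>' up to the first whitespace."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_names(gff_text):
    """Yield (contig, Name value) for each GFF row with a Name= attribute."""
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue
        for attribute in columns[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() == "Name":
                yield columns[0], value.strip()


def unmatched_gff_entries(gff_text, protein_fasta, nucleotide_fasta=None):
    """List GFF Name values or contigs that have no FASTA counterpart."""
    protein_ids = fasta_identifiers(protein_fasta)
    contig_ids = (None if nucleotide_fasta is None
                  else fasta_identifiers(nucleotide_fasta))
    problems = []
    for contig, name in gff_names(gff_text):
        if name not in protein_ids:
            problems.append(("protein", name))
        if contig_ids is not None and contig not in contig_ids:
            problems.append(("contig", contig))
    return problems


# Shortened versions of the example lines above; sequences are placeholders.
GFF = ("contig09\t.\tgene\t1\t675\t.\t-\t.\tName=aph3pp-Ib_partial_5p_neg\n"
       "contig11\t.\tgene\t113\t547\t.\t+\t.\tName=blaTEM-internal_stop\n")
PROTEINS = ">aph3pp-Ib_partial_5p_neg NZ_QKNQ01000001.1\nMK\n>blaTEM-internal_stop\nMS\n"
CONTIGS = ">contig09 example\nACGT\n>contig11 blaTEM divergent\nACGT\n"
```

An empty result from unmatched_gff_entries(GFF, PROTEINS, CONTIGS) means every GFF row lines up with both FASTA files under the standard rule.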
Some annotation pipelines will produce annotation files that AMRFinderPlus will have trouble reading. The --annotation_format option described above will automatically handle most of them, but it is usually a simple matter to convert them to an appropriate format. If you are having trouble, email us at [email protected] (or open a GitHub issue) with examples of the FASTA and GFF files you are trying to use and we should be able to help.
FASTA files are in a fairly standard format: lines beginning with '>' are considered deflines, and sequence identifiers are the first non-whitespace characters on the defline. Sequence identifiers are what is reported in AMRFinderPlus output. Example FASTA files: test_prot.fa and test_dna.fa.
The sequence identifiers must match the GFF file to use combined searches or
add genomic coordinates to protein searches (see above).
Because of some strange handling by BLAST, the following additional requirements must be met for the FASTA sequence identifiers in the -n <nucleotide_fasta> file:
'makeblastdb' truncates and/or alters sequence identifiers with the following characteristics. Nucleotide FASTA identifiers (characters after '>' and before the first whitespace) with any of the following will now cause amrfinder to exit with an error message.
- FASTA identifier starts with '?'
- FASTA identifier contains the two character sequence ',,' or '\t' (the character '\' followed by the character 't')
- FASTA identifier ends with ';' '~' ',' or '.'
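The three identifier rules above are simple enough to test ahead of time, for instance when renaming contigs after assembly. A minimal sketch with a hypothetical function name; the messages are illustrative and are not the text amrfinder prints.

```python
def nucleotide_id_problems(identifier):
    """Return the reasons a nucleotide FASTA identifier would be rejected,
    following the three rules listed above. An empty list means no
    problem was found."""
    problems = []
    if identifier.startswith("?"):
        problems.append("starts with '?'")
    if ",," in identifier:
        problems.append("contains ',,'")
    if "\\t" in identifier:  # a literal backslash followed by 't'
        problems.append("contains backslash followed by 't'")
    if identifier.endswith((";", "~", ",", ".")):
        problems.append("ends with ';', '~', ',' or '.'")
    return problems
```

Note that the second rule concerns the two-character sequence backslash plus 't', not a real tab, which could not appear inside an identifier anyway because identifiers end at the first whitespace.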
If you're having trouble with the input file formats see the --annotation_format option described above or email us at [email protected] (or open a GitHub issue) with examples of the FASTA and GFF files you are trying to use and we should be able to help.
AMRFinder output is in tab-delimited format (.tsv). The output format depends on the options -p, -n, and -g. Protein searches with GFF files (-p <file.fa> -g <file.gff>) and translated DNA searches (-n <file.fa>) will include the Contig id, Start, and Stop columns.
amrfinder -p test_prot.fa -g test_prot.gff -n test_dna.fa -O Campylobacter
This should result in the sample output shown below and in test_both.expected.
Protein identifier Contig id Start Stop Strand Gene symbol Sequence name Scope Element type Element subtype Class Subclass Method Target length Reference sequence length % Coverage of reference sequence % Identity to reference sequence Alignment length Accession of closest sequence Name of closest sequence HMM id HMM description
NA Contig5 260 2021 - 23S_A2075G Campylobacter macrolide resistant 23S core AMR POINT MACROLIDE MACROLIDE POINTN 2021 2912 60.51 99.83 1762 NC_022347.1:1040292-1037381 23S NA NA
blaTEM-156 contig1 101 961 + blaTEM-156 class A beta-lactamase TEM-156 core AMR AMR BETA-LACTAM BETA-LACTAM ALLELEP 286 286 100.00 100.00 286 WP_061158039.1 class A beta-lactamase TEM-156 NF000531.2 TEM family class A beta-lactamase
NA contig10 486 1307 + blaOXA class D beta-lactamase core AMR AMR BETA-LACTAM BETA-LACTAM INTERNAL_STOP 274 274 100.00 99.64 274 WP_000722315.1 oxacillin-hydrolyzing class D beta-lactamase OXA-9 NF000270.1 class D beta-lactamase
blaTEM-internal_stop contig11 113 547 + blaTEM TEM family class A beta-lactamase core AMR AMR BETA-LACTAM BETA-LACTAM PARTIALP 144 286 50.35 97.22 144 WP_000027057.1 class A broad-spectrum beta-lactamase TEM-1 NF000531.2 TEM family class A beta-lactamase
qacR-curated_blast contig12 71 637 + qacR multidrug-binding transcriptional regulator QacR plus STRESS BIOCIDE QUATERNARY AMMONIUM QUATERNARY AMMONIUM BLASTP 188 188 100.00 99.47 188 ADK23698.1 multidrug-binding transcriptional regulator QacR NA NA
blaPDC-114_blast contig2 1 1191 + blaPDC PDC family class C beta-lactamase core AMR AMR BETA-LACTAM CEPHALOSPORIN BLASTP 397 397 100.00 99.75 397 WP_061189306.1 class C beta-lactamase PDC-114 NF000422.6 PDC family class C beta-lactamase
blaOXA-436_partial contig3 101 802 + blaOXA OXA-48 family class D beta-lactamase core AMR AMR BETA-LACTAM BETA-LACTAM PARTIALP 233 265 87.92 100.00 233 WP_058842180.1 OXA-48 family carbapenem-hydrolyzing class D beta-lactamase OXA-436 NF000387.2 OXA-48 family class D beta-lactamase
vanG contig4 101 1147 + vanG D-alanine--D-serine ligase VanG core AMR AMR GLYCOPEPTIDE VANCOMYCIN EXACTP 349 349 100.00 100.00 349 WP_063856695.1 D-alanine--D-serine ligase VanG NF000091.3 D-alanine--D-serine ligase VanG
NA contig6 2680 3102 + 50S_L22_A103V Campylobacter macrolide resistant 50S L22 core AMR POINT MACROLIDE MACROLIDE POINTX 141 141 100.00 97.16 141 WP_002851214.1 50S L22 NA NA
gyrA contig6 31 2616 + gyrA_T86I Campylobacter quinolone resistant GyrA core AMR POINT QUINOLONE QUINOLONE POINTP 862 863 99.88 99.54 862 WP_002857904.1 gyrA NA NA
50S_L22 contig7 101 526 + 50S_L22_A103V Campylobacter macrolide resistant 50S L22 core AMR POINT MACROLIDE MACROLIDE POINTP 141 141 100.00 97.16 141 WP_002851214.1 50S L22 NA NA
NA contig8 101 700 + blaTEM TEM family class A beta-lactamase core AMR AMR BETA-LACTAM BETA-LACTAM PARTIAL_CONTIG_ENDX 200 286 69.93 100.00 200 WP_061158039.1 class A beta-lactamase TEM-156 NF000531.2 TEM family class A beta-lactamase
aph3pp-Ib_partial_5p_neg contig9 1 675 - aph(3'')-Ib aminoglycoside O-phosphotransferase APH(3'')-Ib core AMR AMR AMINOGLYCOSIDE STREPTOMYCIN PARTIAL_CONTIG_ENDP 225 275 81.82 99.56 225 WP_109545041.1 aminoglycoside O-phosphotransferase APH(3'')-Ib NF032895.1 aminoglycoside O-phosphotransferase APH(3'')-Ib
sul2_partial_3p_neg contig9 715 1377 - sul2 sulfonamide-resistant dihydropteroate synthase Sul2 core AMR AMR SULFONAMIDE SULFONAMIDE PARTIAL_CONTIG_ENDP 221 271 81.55 100.00 221 WP_001043265.1 sulfonamide-resistant dihydropteroate synthase Sul2 NF000295.1 sulfonamide-resistant dihydropteroate synthase Sul2
nimIJ_hmm contigX 1 501 + nimIJ NimIJ family nitroimidazole resistance protein core AMR AMR NITROIMIDAZOLE NITROIMIDAZOLE HMM 166 NA NA NA NA NA NA NF000262.1 NimIJ family nitroimidazole resistance protein
- Protein Identifier - This is from the FASTA defline for the protein or DNA sequence.
- Contig id - (optional) Contig name.
- Start - (optional) 1-based coordinate of first nucleotide coding for protein in DNA sequence on contig.
- Stop - (optional) 1-based coordinate of last nucleotide coding for protein in DNA sequence on contig. Note that for protein hits (where the Method is HMM or ends in P) the coordinates are taken from the GFF, which means that for circular contigs when the protein spans the contig break the stop coordinate may be larger than the contig size (see the GFF3 standard for details)
- Gene symbol - Gene or gene-family symbol for protein or nucleotide hit. For point mutations it is a combination of the gene symbol and the SNP definition separated by "_"
- Sequence name - Full-text name for the protein, RNA, or point mutation.
- Scope - The AMRFinderPlus database is split into 'core' AMR proteins that are expected to have an effect on resistance and 'plus' proteins of interest added with less stringent inclusion criteria. These may or may not be expected to have an effect on phenotype.
- Element type - AMRFinder+ genes are placed into functional categories based on predominant function AMR, STRESS, or VIRULENCE.
- Element subtype - Further elaboration of functional category into (ANTIGEN, BIOCIDE, HEAT, METAL, PORIN) if a more specific category is available, otherwise the element type is repeated.
- Class - For AMR genes this is the class of drugs that this gene is known to contribute to resistance of.
- Subclass - If more specificity about drugs within the drug class is known it is elaborated here.
- Method - Type of hit found by AMRFinder. A suffix of 'P' or 'X' is appended to "Methods" that could be found by protein or nucleotide. See The Method column for more information.
  - ALLELE - 100% sequence match over 100% of length to a protein named at the allele level in the AMRFinderPlus database.
  - EXACT - 100% sequence match over 100% of length to a protein in the database that is not a named allele.
  - BLAST - BLAST alignment is > 90% of length and > 90% identity to a protein in the AMRFinderPlus database.
  - PARTIAL - BLAST alignment is > 50% of length, but < 90% of length and > 90% identity to the reference, and does not end at a contig boundary.
  - PARTIAL_CONTIG_END - BLAST alignment is > 50% of length, but < 90% of length and > 90% identity to the reference, and the break occurs at a contig boundary indicating that this gene is more likely to have been split by an assembly issue.
  - HMM - HMM was hit above the cutoff, but there was not a BLAST hit that met standards for BLAST or PARTIAL. This does not have a suffix because only protein sequences are searched by HMM.
  - INTERNAL_STOP - Translated blast reveals a stop codon that occurred before the end of the protein. This can only be assessed if the -n <nucleotide_fasta> option is used.
  - POINT - Point mutation identified by blast.
- Target length - The length of the query protein or gene. The length will be in amino-acids if the reference sequence is a protein, but nucleotide if the reference sequence is nucleotide.
- Reference sequence length - The length of the Reference protein or nucleotide in the database if a blast alignment was detected, otherwise NA.
- % Coverage of reference sequence - % of reference covered by blast hit if a blast alignment was detected, otherwise NA.
- % Identity to reference sequence - % amino-acid identity to reference protein or nucleotide identity for nucleotide reference if a blast alignment was detected, otherwise NA.
- Alignment length - Length of BLAST alignment in amino-acids or nucleotides if nucleotide reference if a blast alignment was detected, otherwise NA.
- Accession of closest sequence - RefSeq accession for the reference hit by BLAST if a blast alignment was detected, otherwise NA. Note that only one reference will be chosen if the blast hit is equidistant from multiple references. For point mutations the reference is the sensitive "wild-type" allele, and the element symbol describes the specific mutation. Check the Reference Gene Catalog for more information on specific mutations or reference genes.
- Name of closest sequence - Full name assigned to the closest reference hit if a blast alignment was detected, otherwise NA.
- HMM id - Accession for the HMM, NA if none.
- HMM description - The family name associated with the HMM, NA if none.
- Hierarchy node (optional) - The node in the Reference Gene Hierarchy that this hit was assigned to for naming. Fusion genes and stx operons with multiple genes will have multiple values separated by '::'. This field only appears when the --print_node option is used.
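Since the output is tab-delimited with a header row, it can be read with a standard CSV reader, and the Method value can be split into its base method and suffix. This is a sketch under stated assumptions: the function names are hypothetical, the sample uses only three of the columns, and the 'N' suffix is accepted because POINTN appears in the sample output above even though the field description names only 'P' and 'X'.

```python
import csv
import io

# Base method names taken from the Method field description.
BASE_METHODS = {"ALLELE", "EXACT", "BLAST", "PARTIAL", "PARTIAL_CONTIG_END",
                "HMM", "INTERNAL_STOP", "POINT"}


def split_method(method):
    """Split a Method value into (base method, suffix).

    Exact base names are checked first so that INTERNAL_STOP, which
    happens to end in 'P', is not mistaken for a suffixed method.
    """
    if method in BASE_METHODS:
        return method, ""
    base, suffix = method[:-1], method[-1:]
    if base in BASE_METHODS and suffix in ("P", "X", "N"):
        return base, suffix
    raise ValueError("unrecognized Method value: %r" % method)


def read_hits(tsv_text):
    """Parse AMRFinderPlus TSV text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


# Three columns from rows of the sample output, for illustration only.
SAMPLE = ("Protein identifier\tGene symbol\tMethod\n"
          "blaTEM-156\tblaTEM-156\tALLELEP\n"
          "NA\tblaOXA\tINTERNAL_STOP\n"
          "nimIJ_hmm\tnimIJ\tHMM\n"
          "NA\t23S_A2075G\tPOINTN\n")
methods = [split_method(row["Method"]) for row in read_hits(SAMPLE)]
```

Reading by header name rather than column position keeps the code working when optional columns such as Contig id or Hierarchy node are present.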
To automatically combine overlapping results from protein and nucleotide searches the coordinates of the protein in the assembly contigs must be indicated by the GFF file. This requires a GFF file where the value of the 'Name=' variable of the 9th field in the GFF must match the identifier in the protein FASTA file (everything between the '>' and the first whitespace character on the defline). See the section on GFF file format for details of how AMRFinderPlus associates FASTA file entries with GFF file entries.
AMRFinderPlus does not have the concept of circular contigs, so genes that cross the break in circular contigs may be detected only as fragments. By default AMRFinderPlus has a length cutoff of 50% of the full gene length, so one side or the other should be detected at least as a partial_contig_end or blast hit. Depending on the annotation system proteins may be annotated crossing the contig boundary in circular contigs (NCBI's PGAP does this). These full-length proteins will be analyzed by AMRFinderPlus. Note that the stop coordinate in this case will extend past the contig boundary as described by the GFF specification.
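When a reported stop coordinate extends past the contig boundary, downstream tools that expect coordinates within the contig need to map it back. The sketch below assumes the GFF3 convention for circular features, where the end of a feature crossing the origin is given as its true end plus the contig length, and that a feature wraps at most once. The function name and the numbers in the usage note are hypothetical.

```python
def wrapped_stop(stop, contig_length):
    """Map a stop coordinate that runs past the end of a circular contig
    back onto the contig (1-based coordinates).

    Stops within the contig are returned unchanged.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    if stop > contig_length:
        return stop - contig_length
    return stop
```

For example, on a 5000 bp circular contig a protein annotated from 4800 to 5200 covers positions 4800 to 5000 and then 1 to wrapped_stop(5200, 5000).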
GFF files are used to identify coordinates for protein sequences, but the association between the identifiers in the GFF and FASTA files is not part of the standard. See Input file formats for details of how AMRFinderPlus associates FASTA file entries with GFF file entries. If you have issues getting your GFF files to work please file an issue or email us at [email protected] including the files you are trying to use.
If you find bugs or have other questions/comments please email us at [email protected] or Create a GitHub issue.