Running AMRFinderPlus

Arjun Prasad edited this page Jan 7, 2025 · 134 revisions

See Test your installation for some basic examples of expected input and expected output.

Usage:

amrfinder (-p <protein_fasta> | -n <nucleotide_fasta>) [options]
amrfinder -u

The only required arguments are either -p <protein_fasta> for proteins or -n <nucleotide_fasta> for assembled nucleotide sequence. We also provide an automatic update mechanism to update the database by using -u. This will update to the latest AMR database. See Software upgrades for information about updating the software. Use '--help' to see the complete set of options and flags.

Options:

Commonly used options:

  • --protein <protein_fasta> or -p <protein_fasta> Protein FASTA file to search.

  • --nucleotide <nucleotide_fasta> or -n <nucleotide_fasta> Assembled nucleotide FASTA file to search. When running in combined mode (i.e., using the -n, -p, and -g options) this should be the genomic sequence, not the cut-out coding sequence.

  • --gff <gff_file> or -g <gff_file> GFF file to give sequence coordinates for proteins listed in -p <protein_fasta>. Required for combined searches of protein and nucleotide. The value of the 'Name=' variable in the 9th field in the GFF must match the identifier in the protein FASTA file (everything between the '>' and the first whitespace character on the defline). The first column of the GFF must be the nucleotide sequence identifier in the nucleotide_fasta if provided (everything between the '>' and the first whitespace character on the defline). To interpret the output of external PGAP annotations use the --pgap option.

  • --organism <organism> or -O <organism> Taxon used for screening known resistance-causing point mutations, organism-specific typing (Stx typing for Escherichia), and blacklisting of common, non-informative genes. amrfinder -l will list the possible values for this option. Note that rRNA mutations will not be screened if only a protein file is provided. To screen known Shigella mutations use Escherichia as the organism. See Organism option below for more details.

  • --list_organisms or -l Print the list of all possible taxonomy groups used with the -O/--organism option.

  • --update or -u Download the latest version of the AMRFinderPlus database to the default location (location of the AMRFinderPlus binaries/data). Creates a directory under data in the format YYYY-MM-DD.<version> (e.g., 2019-03-06.1). Will not overwrite an existing directory. Use --force_update to overwrite the existing directory with a new download.

  • --force_update or -U Download the latest version of the AMRFinderPlus database to the default location. Creates a directory under data in the format YYYY-MM-DD.<version> (e.g., 2019-03-06.1), and overwrites an existing directory.

  • --plus Provide results from "Plus" genes such as virulence factors, stress-response genes, etc. See AMRFinderPlus database for details.

  • --database_version or -V Print out complete version information of both the database and software. This information is printed to STDERR by default when running AMRFinderPlus on files (using -n or -p), and this option should appear alone if it is used.

  • --print_node Add an additional "Hierarchy node" column to the output with the node identifier used for naming this hit in the AMRFinderPlus reference gene hierarchy. See our Reference Gene Hierarchy help for more information.

Less frequently used options:

  • --annotation_format <format> or -a <format> Read a non-standard annotation format to determine how GFF entries are associated with protein and nucleotide FASTA entries. This option is experimental at this time; please report issues to [email protected]. The default works with GenBank and RefSeq files and is described in the section Input file formats, which also has more details on the other formats. The following formats are available:

    • genbank - GenBank (default)
    • bakta - Bakta: rapid & standardized annotation of bacterial genomes, MAGs & plasmids
    • microscope - Microbial Genome Annotation & Analysis Platform
    • patric - Pathosystems Resource Integration Center / BV-BRC
    • pgap - NCBI Prokaryotic Genome Annotation Pipeline
    • prokka - Prokka rapid prokaryotic genome annotation
    • pseudomonasdb - The Pseudomonas Genome Database
    • rast - Rapid Annotation using Subsystem Technology

  • --blast_bin <directory> Directory to search for 3rd party binaries (blast and hmmer) if not in the path.

  • --database <database_dir> or -d <database_dir> Use an alternate database directory. This can be useful if you want to run an analysis with a database version that is not the latest. This should point to the directory containing the full AMRFinderPlus database files. It is possible to create your own custom databases, but it is not a trivial exercise. See AMRFinderPlus database for details on the format.

  • --threads <#> The number of threads to use for processing. AMRFinderPlus defaults to 4 on hosts with >= 4 cores. Setting this number higher than the number of cores on the running host may cause blastp to fail. Using more than 4 threads may speed up searches with nucleotide sequences, but will have little effect if only protein is used as the input.

  • --version or -v Print out just the software version. For more complete information we recommend you use the -V command described above, but this is here to maintain backward compatibility.

  • --mutation_all <point_mut_report> Report genotypes at all locations screened for point mutations. This file allows you to distinguish between called point mutations that were the sensitive variant and the point mutations that could not be called because the sequence was not found. This file will contain all detected variants from the reference sequence, so it could be used as an initial screen for novel variants. Note "Gene symbols" for mutations not in the database (identifiable by [UNKNOWN] in the Sequence name field) have offsets that are relative to the start of the sequence indicated in the field "Accession of closest sequence" while "Gene symbols" from known point-mutation sites have gene symbols that match the Pathogen Detection Reference Gene Catalog standardized nomenclature for point mutations.

  • --name <identifier> Prepend a column containing an identifier for this run of AMRFinderPlus. For example this can be used to add a sample name column to the AMRFinderPlus results.

  • --nucleotide_output <nucleotide.fasta> Print a FASTA file of just the regions of the --nucleotide <input.fa> that were identified by AMRFinderPlus. This will include the entire region that aligns to the references for point mutations.

  • --nucleotide_flank5_output <nucleotide.fasta> Print a FASTA file of the regions of the --nucleotide <input.fa> that were identified by AMRFinderPlus plus --nucleotide_flank5_size additional nucleotides in the 5' direction. This will include the entire region that aligns to the references for point mutations plus the additional flank.

  • --nucleotide_flank5_size <num_bases> The number of additional bases in the 5' direction to add to the element sequence written to the --nucleotide_flank5_output file.

  • -o <output_file> Print AMRFinderPlus output to a file instead of standard out.

  • --protein_output <protein.fasta> Print a FASTA file of the proteins identified by AMRFinderPlus in the --protein <input.fa> file. Only proteins provided on input are selected; AMRFinderPlus will not translate hits identified from nucleotide sequence.

  • -q Suppress the printing of status messages to standard error while AMRFinderPlus is running. Error messages will still be printed.

  • --report_all_equal Report all equally scoring BLAST and HMM matches. This will report multiple lines for a single element if there are multiple reference proteins that have the same score. On those lines the fields Accession of closest sequence and Name of closest sequence will be different showing each of the database proteins that are equally close to the query sequence.

  • --ident_min <0-1> or -i <0-1> Minimum identity for a BLAST-based hit (Methods BLAST or PARTIAL). -1 means use the curated threshold if it exists and 0.9 otherwise. Setting this value to something other than -1 will override curated similarity cutoffs. We only recommend using this option if you have a specific reason; don't make our curators sad by throwing away some of their hard work.

  • --coverage_min <0-1> or -c <0-1> Minimum proportion of reference gene covered for a BLAST-based hit (Methods BLAST or PARTIAL). Default value is 0.5.

  • --translation_table <1-33> or -t <1-33> Number from 1 to 33 to represent the translation table used for BLASTX. Default is 11. See Translation table description for a description of the available tables.

  • --pgap Alters the GFF and FASTA file handling to correctly interpret the output of the pgap pipeline. Note that you should use the annot.fna file in the pgap output directory as the argument to the -n/--nucleotide option.

  • --gpipe_org Use Pathogen Detection taxgroup names as arguments to the --organism option

  • --parm <parameter string> Pass additional parameters to amr_report. This is mostly used for development and debugging.

  • --debug Perform some additional integrity checks. May be useful for debugging, but not intended for public use.
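The --name option described above lends itself to batch runs over many assemblies. The following is a minimal sketch, assuming each assembly is stored as <sample>.fa in a directory called assemblies/; the run_named function, the AMRFINDER override variable, the directory, and the output file naming are all illustrative and not part of AMRFinderPlus.

```shell
# Hypothetical helper: run AMRFinderPlus on one assembly and label every
# output row with the sample name taken from the file name (<sample>.fa).
# AMRFINDER is an assumed override for the executable (handy for dry runs).
run_named() {
    fasta=$1
    sample=$(basename "$fasta" .fa)
    "${AMRFINDER:-amrfinder}" --name "$sample" -n "$fasta" -o "$sample.amrfinder.tsv"
}

# Loop over a hypothetical assemblies/ directory; skipped when it has no .fa files
for fasta in assemblies/*.fa; do
    [ -f "$fasta" ] || continue
    run_named "$fasta"
done
```

Setting AMRFINDER=echo before calling run_named prints the command line instead of running it, which can help check the arguments first.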

--organism option

The -O/--organism option is used to get organism-specific results. For those organisms which have been curated, using --organism will get optimized organism-specific results. AMRFinderPlus uses the --organism option for four purposes:

  1. To screen for point mutations
  2. To filter out genes that are nearly universal in a group and uninformative
  3. To identify divergent Streptococcus pneumoniae and Neisseria gonorrhoeae pbp proteins that are usually penicillin resistant (-O Streptococcus_pneumoniae or -O Neisseria_gonorrhoeae)
  4. To run StxTyper to type Stx operons (-O Escherichia).

We currently curate a limited set of organisms for point mutations and/or blacklisting of some plus genes that are not likely to be informative in those species. Use amrfinder -l to list the organism options that can be used in the current database. Use the Reference Gene Catalog to identify the point mutations and blacklisted genes that are affected by this option. The Curated organisms page summarizes which taxa have received specific attention from curators and where the AMRFinderPlus database has coverage for point mutations, virulence genes, and stress response genes.

Organism option Point mutations Blacklisted --plus genes Notes
Acinetobacter_baumannii X Use for the A. baumannii-calcoaceticus species complex
Burkholderia_cepacia X Use for the Burkholderia cepacia species complex
Burkholderia_mallei X
Burkholderia_pseudomallei X Use for the Burkholderia pseudomallei species complex
Campylobacter X Use for C. coli and C. jejuni
Citrobacter_freundii X
Corynebacterium_diphtheriae X
Clostridioides_difficile X
Enterobacter_cloacae X
Enterobacter_asburiae X
Enterococcus_faecalis X
Enterococcus_faecium X Use for E. hirae
Escherichia X X Use for Shigella and Escherichia
Haemophilus_influenzae X
Klebsiella_oxytoca X X
Klebsiella_pneumoniae X X Use for K. pneumoniae species complex and K. aerogenes
Neisseria_gonorrhoeae X
Neisseria_meningitidis X
Pseudomonas_aeruginosa X
Salmonella X X
Serratia_marcescens X
Staphylococcus_aureus X
Staphylococcus_pseudintermedius X X
Streptococcus_agalactiae X
Streptococcus_pneumoniae X Use for S. pneumoniae and S. mitis
Streptococcus_pyogenes X
Vibrio_cholerae X X
Vibrio_vulnificus X
Vibrio_parahaemolyticus X

Note that variant detection for Streptococcus_pneumoniae PBPs uses a new mechanism identifying divergent alleles. See Interpreting Results for more information.

Temporary files

AMRFinderPlus creates a fair number of temporary files in /tmp by default. If the environment variable TMPDIR is set AMRFinderPlus will instead put the temporary files in the directory pointed to by $TMPDIR.
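As a minimal sketch of the TMPDIR behavior described above, the commands below point AMRFinderPlus at a private scratch directory for a single run. The scratch_dir variable and the output file name are illustrative, and the run is skipped when amrfinder or the test file is not available.

```shell
# Create a private scratch directory (its location is chosen by mktemp)
scratch_dir=$(mktemp -d)

# Set TMPDIR for this one command only, so the rest of the shell session
# keeps its usual temporary directory
if command -v amrfinder >/dev/null 2>&1 && [ -f test_prot.fa ]; then
    TMPDIR="$scratch_dir" amrfinder -p test_prot.fa -o test_prot.amrfinder.tsv
fi

# Remove the scratch directory and anything left in it
rm -rf "$scratch_dir"
```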

Examples

These examples use the test files test_prot.gff, test_prot.fa, and test_dna.fa if you want to try them for yourself.

# print a list of command-line options
amrfinder --help

# Download the latest AMRFinderPlus database
amrfinder -u

# Protein AMRFinder with no genomic coordinates
amrfinder -p test_prot.fa

# Translated nucleotide AMRFinder (will not use HMMs)
amrfinder -n test_dna.fa

# Protein AMRFinder using GFF to get genomic coordinates and 'plus' genes
amrfinder -p test_prot.fa -g test_prot.gff --plus

# Protein AMRFinder with Escherichia protein point mutations
amrfinder -p test_prot.fa -O Escherichia

# Full AMRFinderPlus search combining results
amrfinder -p test_prot.fa -g test_prot.gff -n test_dna.fa -O Escherichia --plus

Accessory programs for non-standard database issues

AMRFinderPlus includes a couple of programs to assist with non-standard scenarios for database updates.

amrfinder_update downloads and indexes the latest database version to a custom location.

amrfinder_index will run the commands to generate the BLAST and HMMER databases from the distributed AMRFinderPlus database files. This indexing is automatic when running amrfinder -u or amrfinder_update.

Input file formats

There are three possible input files to AMRFinderPlus, the --protein FASTA file, the --nucleotide FASTA file, and the --gff file to tie them together. Any of these files will be automatically decompressed with gunzip if their filename ends in .gz; automatic handling of gzipped input files requires the command gunzip to be in your path.

Note that AMRFinderPlus does not support Unicode (UTF-8); however, as of version 3.11.14 it should ignore Unicode characters that don't contain bytes between 0x00 and 0x1F.

When run in the most sensitive and accurate "combined" mode, AMRFinderPlus needs a way to relate the results from the protein FASTA file and the nucleotide FASTA file, and it uses the --gff file to do that. Unfortunately there isn't a standard for relating the entries of the FASTA files with the GFF file. By default AMRFinderPlus reads the format of files downloaded from GenBank/RefSeq. The --annotation_format option, added in version 3.10.38, adds automated parsing of the output files of different annotation engines.

The --annotation_format <format> option

This option allows AMRFinderPlus to parse and associate entries in GFF files with protein and nucleotide FASTA files in the data coming out of other annotation systems and databases. This feature is a bit experimental as we do not have extensive experience with many of these data sources, so please report any issues to [email protected] or as GitHub issues. The default behavior is described under the -g <gff_file> section below.

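Parameters for --annotation_format / -a are the format names listed under "Less frequently used options" above (genbank, bakta, microscope, patric, pgap, prokka, pseudomonasdb, and rast), plus standard, which is described below.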

-g <gff_file>

GFF files are used to get sequence coordinates for AMRFinderPlus hits from protein sequence and associate them with hits on nucleotide sequence. Use the --annotation_format option described above for some common data sources.

The defaults will correctly parse input files downloaded from NCBI resources such as GenBank and RefSeq (--annotation_format genbank). This parsing is very similar to the "Standard behavior" described below except that locus_tag=<protein accession> and <project>:<accession> formats are handled in deflines and GFF files.

Note that in GFF files some characters in identifiers need to be escaped using URL-style escapes:

  • # (comment start)
  • tab (%09)
  • newline (%0A)
  • carriage return (%0D)
  • % percent (%25)
  • control characters (%00 through %1F, %7F)
  • ; semicolon (%3B)
  • = equals (%3D)
  • & ampersand (%26)
  • , comma (%2C)

In addition quotes and tick-marks are trimmed, and for the default --annotation_format genbank the ":" character should be avoided or escaped because it is used in some NCBI formats.
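The escapes listed above can be applied with standard tools before identifiers are written into a GFF file. The sketch below covers only %, semicolon, equals, ampersand, comma, and tab, using the codes given in the list; the gff_escape function name is illustrative, and newlines, other control characters, "#", quotes, and ":" are not handled.

```shell
# Hypothetical helper: URL-style escaping for a single GFF identifier.
# "%" is escaped first so the escapes added afterwards are not re-escaped.
gff_escape() {
    tab=$(printf '\t')
    printf '%s' "$1" | sed \
        -e 's/%/%25/g' \
        -e 's/;/%3B/g' \
        -e 's/=/%3D/g' \
        -e 's/&/%26/g' \
        -e 's/,/%2C/g' \
        -e "s/$tab/%09/g"
}

# Prints geneA%3Bcopy%3D2%2Cpartial
printf '%s\n' "$(gff_escape 'geneA;copy=2,partial')"
```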

--annotation_format standard

This is enabled by the --annotation_format standard option and is very similar to --annotation_format genbank.
The value of the 'Name=' variable in the 9th field in the GFF must match the identifier in the protein FASTA file (everything between the '>' and the first whitespace character on the defline). The first column of the GFF must be the nucleotide sequence identifier in the nucleotide_fasta if provided (everything between the '>' and the first whitespace character on the defline). See test_prot.gff, test_prot.fa, and test_dna.fa for a simple example. See Test your installation for how to run the examples.

Simple example below (These were taken from test_prot.gff. See those files to see how the identifiers line up):

contig09	.	gene	1	675	.	-	.	Name=aph3pp-Ib_partial_5p_neg
contig09	.	gene	715	1377	.	-	.	Name=sul2_partial_3p_neg
contig11	.	gene	113	547	.	+	.	Name=blaTEM-internal_stop

Matching deflines from assembly:

>contig09 >case4_case6_sul2_aph3pp-Ib Providencia rettgeri strain Pret_2032, whole genome shotgun sequence  2160922-2162737  150-1527 (reverse comp'd)
>contig11 blaTEM divergent, with internal stop codon

Matching protein deflines:

>aph3pp-Ib_partial_5p_neg  NZ_QKNQ01000001.1 Providencia rettgeri strain Pret_2032, whole genome shotgun sequence  2160922-2162737  150-1527  704-137
>sul2_partial_3p_neg   NZ_QKNQ01000001.1 Providencia rettgeri strain Pret_2032, whole genome shotgun sequence  2160922-2162737  150-1377  2-667
>blaTEM-internal_stop
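Before a combined run it can be useful to confirm that the identifiers line up as described above. The following is a minimal sketch, assuming a tab-delimited GFF in the "standard" layout; the missing_gff_names function is illustrative and simply prints every Name= value from the GFF's 9th field that has no matching identifier in the protein FASTA file.

```shell
# Hypothetical helper: missing_gff_names <gff_file> <protein_fasta>
missing_gff_names() {
    gff=$1
    prot_fasta=$2
    awk -F '\t' '
        NR == FNR {                      # first file: protein FASTA
            if (/^>/) {                  # identifier = text after ">" up to whitespace
                split(substr($0, 2), parts, /[ \t]/)
                ids[parts[1]] = 1
            }
            next
        }
        /^#/ || NF < 9 { next }          # skip GFF comments and short lines
        {
            n = split($9, attrs, ";")    # 9th field holds key=value attributes
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^Name=/) {
                    name = substr(attrs[i], 6)
                    if (!(name in ids)) print name
                }
        }
    ' "$prot_fasta" "$gff"
}

# Check the distributed test files when they are in the current directory
if [ -f test_prot.gff ] && [ -f test_prot.fa ]; then
    missing_gff_names test_prot.gff test_prot.fa
fi
```

No output means every Name= value found a matching protein identifier.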

Some annotation pipelines will produce annotation files that AMRFinderPlus will have trouble reading. The --annotation_format option described above will automatically handle most of them, but it is usually a simple matter to convert them to an appropriate format. If you are having trouble email us at [email protected] (or open a GitHub issue) with examples of the FASTA and GFF files you are trying to use and we should be able to help.

-p <protein_fasta> and -n <nucleotide_fasta>

FASTA files are in a fairly standard format: lines beginning with '>' are considered deflines, and the sequence identifier is everything between the '>' and the first whitespace character on the defline. Sequence identifiers are what is reported in AMRFinderPlus output. Example FASTA files: test_prot.fa and test_dna.fa. The sequence identifiers must match the GFF file to use combined searches or add genomic coordinates to protein searches (see above).

Because of some strange handling by BLAST the following additional requirements must be met for the FASTA sequence identifiers for the -n <nucleotide_fasta> file:

'makeblastdb' truncates and/or alters sequence identifiers with the following characteristics. Now nucleotide FASTA identifiers (characters after '>' and before the first whitespace) with any of the following will cause amrfinder to exit with an error message.

  • FASTA identifier starts with '?'
  • FASTA identifier contains the two character sequence ',,' or '\t' (the character '\' followed by the character 't')
  • FASTA identifier ends with ';' '~' ',' or '.'
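The three rules above can be checked before running amrfinder. This is a minimal sketch with an illustrative function name, check_fasta_ids; it prints every identifier in a nucleotide FASTA file that breaks one of the listed rules and prints nothing when all identifiers pass.

```shell
# Hypothetical helper: check_fasta_ids <nucleotide_fasta>
check_fasta_ids() {
    awk '
        /^>/ {
            id = substr($1, 2)               # text after ">" up to first whitespace
            if (id ~ /^\?/ ||                # starts with "?"
                index(id, ",,") ||           # contains ",,"
                index(id, "\\t") ||          # contains backslash followed by "t"
                id ~ /[;~,.]$/)              # ends with ; ~ , or .
                print id
        }
    ' "$1"
}

# Check the distributed test file when it is in the current directory
if [ -f test_dna.fa ]; then
    check_fasta_ids test_dna.fa
fi
```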

If you're having trouble with the input file formats see the --annotation_format option described above or email us at [email protected] (or open a GitHub issue) with examples of the FASTA and GFF files you are trying to use and we should be able to help.

Output format

AMRFinderPlus output is in tab-delimited format (.tsv). The output format depends on the options -p, -n, and -g. Protein searches with GFF files (-p <file.fa> -g <file.gff>) and translated DNA searches (-n <file.fa>) will include the Contig id, Start, and Stop columns.

Sample AMRFinderPlus report:

amrfinder -p test_prot.fa -g test_prot.gff -n test_dna.fa -O Escherichia --plus --print_node

This should produce the sample output shown below and in test_both.expected.

Protein id                 Contig id Start Stop Strand Element symbol Element name                                                                                                 Scope Type      Subtype   Class               Subclass                                                                        Method              Target length Reference sequence length % Coverage of reference % Identity to reference Alignment length Closest reference accession   Closest reference name                                              HMM accession HMM description                                     Hierarchy node
NA                         contig01      1  984      + blaTEMp_G162T  Escherichia amoxicillin-clavulanic acid/piperacillin-tazobactam/ticarcillin-clavulanic acid resistant blaTEM core  AMR       POINT     BETA-LACTAM         AMOXICILLIN-CLAVULANIC_ACID/PIPERACILLIN-TAZOBACTAM/TICARCILLIN-CLAVULANIC_ACID POINTN                        984                      1176                   83.67                   99.80              984 NZ_CP095603.1:148777-149952   blaTEM promoter region                                              NA            NA                                                  NA
blaTEM-156                 contig01    101  961      + blaTEM-156     class A beta-lactamase TEM-156                                                                               core  AMR       AMR       BETA-LACTAM         BETA-LACTAM                                                                     ALLELEP                       286                       286                  100.00                  100.00              286 WP_061158039.1                class A beta-lactamase TEM-156                                      NF000531.2    TEM family class A beta-lactamase                   blaTEM-156
blaPDC-114_blast           contig02      1 1191      + blaPDC         PDC family class C beta-lactamase                                                                            core  AMR       AMR       BETA-LACTAM         CEPHALOSPORIN                                                                   BLASTP                        397                       397                  100.00                   99.75              397 WP_061189306.1                class C beta-lactamase PDC-114                                      NF000422.6    PDC family class C beta-lactamase                   blaPDC
blaOXA-436_partial         contig03    101  802      + blaOXA         OXA-48 family class D beta-lactamase                                                                         core  AMR       AMR       BETA-LACTAM         BETA-LACTAM                                                                     PARTIALP                      233                       265                   87.92                  100.00              233 WP_058842180.1                OXA-48 family carbapenem-hydrolyzing class D beta-lactamase OXA-436 NF012161.0    class D beta-lactamase                              blaOXA-48_fam
vanG                       contig04    101 1147      + vanG           D-alanine--D-serine ligase VanG                                                                              core  AMR       AMR       GLYCOPEPTIDE        VANCOMYCIN                                                                      EXACTP                        349                       349                  100.00                  100.00              349 WP_063856695.1                D-alanine--D-serine ligase VanG                                     NF000091.3    D-alanine--D-serine ligase VanG                     vanG
NA                         contig04   1261 2391      + blaEC          BlaEC family class C beta-lactamase                                                                          plus  AMR       AMR       BETA-LACTAM         BETA-LACTAM                                                                     BLASTX                        377                       377                  100.00                   98.14              377 WP_063610930.1                extended-spectrum class C beta-lactamase EC-15                      NA            NA                                                  blaEC
NA                         contig08      1  700      + blaTEMp_G162T  Escherichia amoxicillin-clavulanic acid/piperacillin-tazobactam/ticarcillin-clavulanic acid resistant blaTEM core  AMR       POINT     BETA-LACTAM         AMOXICILLIN-CLAVULANIC_ACID/PIPERACILLIN-TAZOBACTAM/TICARCILLIN-CLAVULANIC_ACID POINTN                        700                      1176                   59.52                   99.71              700 NZ_CP095603.1:148777-149952   blaTEM promoter region                                              NA            NA                                                  NA
NA                         contig08    101  700      + blaTEM         TEM family class A beta-lactamase                                                                            core  AMR       AMR       BETA-LACTAM         BETA-LACTAM                                                                     PARTIAL_CONTIG_ENDX           200                       286                   69.93                  100.00              200 WP_061158039.1                class A beta-lactamase TEM-156                                      NA            NA                                                  blaTEM
aph3pp-Ib_partial_5p_neg   contig09      1  675      - aph(3'')-Ib    aminoglycoside O-phosphotransferase APH(3'')-Ib                                                              core  AMR       AMR       AMINOGLYCOSIDE      STREPTOMYCIN                                                                    PARTIAL_CONTIG_ENDP           225                       267                   81.27                  100.00              217 WP_001082319.1                aminoglycoside O-phosphotransferase APH(3'')-Ib                     NF032896.1    APH(3'') family aminoglycoside O-phosphotransferase aph(3'')-Ib
sul2_partial_3p_neg        contig09    715 1377      - sul2           sulfonamide-resistant dihydropteroate synthase Sul2                                                          core  AMR       AMR       SULFONAMIDE         SULFONAMIDE                                                                     PARTIAL_CONTIG_ENDP           221                       271                   81.55                  100.00              221 WP_001043265.1                sulfonamide-resistant dihydropteroate synthase Sul2                 NA            NA                                                  sul2
NA                         contig10    486 1307      + blaOXA         OXA-9 family oxacillin-hydrolyzing class D beta-lactamase                                                    core  AMR       AMR       BETA-LACTAM         BETA-LACTAM                                                                     INTERNAL_STOP                 274                       274                  100.00                   99.64              274 WP_000722315.1                oxacillin-hydrolyzing class D beta-lactamase OXA-9                  NA            NA                                                  blaOXA-9_fam
NA                         contig11      1  984      + blaTEMp_G162T  Escherichia amoxicillin-clavulanic acid/piperacillin-tazobactam/ticarcillin-clavulanic acid resistant blaTEM core  AMR       POINT     BETA-LACTAM         AMOXICILLIN-CLAVULANIC_ACID/PIPERACILLIN-TAZOBACTAM/TICARCILLIN-CLAVULANIC_ACID POINTN                        984                      1176                   83.67                   96.04              984 NZ_CP095603.1:148777-149952   blaTEM promoter region                                              NA            NA                                                  NA
blaTEM-internal_stop       contig11    113  547      + blaTEM         TEM family class A beta-lactamase                                                                            core  AMR       AMR       BETA-LACTAM         BETA-LACTAM                                                                     INTERNAL_STOP                 144                       286                   50.35                   97.22              144 WP_000027057.1                broad-spectrum class A beta-lactamase TEM-1                         NA            NA                                                  blaTEM
qacR-curated_blast         contig12     71  637      + qacR           multidrug-binding transcriptional regulator QacR                                                             plus  STRESS    BIOCIDE   QUATERNARY AMMONIUM QUATERNARY AMMONIUM                                                             BLASTP                        188                       188                  100.00                   99.47              188 ADK23698.1                    multidrug-binding transcriptional regulator QacR                    NA            NA                                                  qacR
emrD3-suppressed-in-vibrio contig13      1 1137      + emrD3          multidrug efflux MFS transporter EmrD-3                                                                      plus  AMR       AMR       EFFLUX              EFFLUX                                                                          EXACTP                        379                       379                  100.00                  100.00              379 ABQ18953.1                    multidrug efflux MFS transporter EmrD-3                             NA            NA                                                  emrD3
NA                         contig14      1 1089      + pmrB_C84R      Escherichia colistin resistant PmrB                                                                          core  AMR       POINT     COLISTIN            COLISTIN                                                                        POINTX                        363                       363                  100.00                   99.72              363 WP_001300761.1                two-component system sensor histidine kinase PmrB                   NA            NA                                                  pmrB
pmrB_C84R                  contig14   1093 2181      + pmrB_C84R      Escherichia colistin resistant PmrB                                                                          core  AMR       POINT     COLISTIN            COLISTIN                                                                        POINTP                        363                       363                  100.00                   99.72              363 WP_001300761.1                two-component system sensor histidine kinase PmrB                   NA            NA                                                  pmrB
NA                         contig15      1 2905      + 23S_A2058T     Escherichia azithromycin/erythromycin/telithromycin resistant 23S                                            core  AMR       POINT     MACROLIDE           AZITHROMYCIN/ERYTHROMYCIN/TELITHROMYCIN                                         POINTN                       2905                      2905                  100.00                   99.97             2905 NC_004431.1:237160-240064     23S ribosomal RNA                                                   NA            NA                                                  NA
NA                         contig16      1  720      + nfsA_K141STOP  Escherichia nitrofurantoin resistant NfsA                                                                    core  AMR       POINT     NITROFURAN          NITROFURANTOIN                                                                  POINTX                        240                       240                  100.00                   99.17              240 WP_089631889.1                nitroreductase NfsA                                                 NA            NA                                                  nfsA
NA                         contig16      1  720      + nfsA_R15C      Escherichia nitrofurantoin resistant NfsA                                                                    core  AMR       POINT     NITROFURAN          NITROFURANTOIN                                                                  POINTX                        240                       240                  100.00                   99.17              240 WP_089631889.1                nitroreductase NfsA                                                 NA            NA                                                  nfsA
NA                         contig17      1  247      + ampC_T-14TGT   Escherichia cephalosporin resistant ampC                                                                     core  AMR       POINT     BETA-LACTAM         CEPHALOSPORIN                                                                   POINTN                        247                       245                  100.00                   99.19              247 NZ_CP041538.1:1149245-1149489 ampC/blaEC promoter region                                          NA            NA                                                  NA
stxA2a_prot                contig18    279 1238      + stxA2          Shiga toxin Stx2 subunit A                                                                                   plus  VIRULENCE VIRULENCE STX2                stxA2                                                                           EXACTP                        319                       319                  100.00                  100.00              319 TJA36680.1                    Shiga toxin Stx2 subunit A                                          NF041702.1    Shiga toxin Stx2 subunit A                          stxA2_acd
NA                         contig18    279 1519      + stx2a_operon   stx2a operon                                                                                                 plus  VIRULENCE STX_TYPE  STX2                STX2A                                                                           COMPLETE                     1241                                                                    100.00              410 AAS07600.1, AAM90978.1        Shiga toxin stx2a                                                   NA            NA                                                  stxA2a::stxB2a
stxB2a_prot                contig18   1250 1519      + stxB2          Shiga toxin Stx2a subunit B                                                                                  plus  VIRULENCE VIRULENCE STX2                stxB2a                                                                          EXACTP                         89                        89                  100.00                  100.00               89 AAM90978.1                    Shiga toxin Stx2a subunit B                                         NF033660.0    Shiga toxin Stx2 subunit B                          stxB2a
nimIJ_hmm                  contigX       1  501      + nimIJ          NimIJ family 5-nitroimidazole reductase                                                                      core  AMR       AMR       NITROIMIDAZOLE      NITROIMIDAZOLE                                                                  HMM                           166                       165                   98.18                   76.54              162 WP_005812825.1                NimIJ family 5-nitroimidazole reductase                             NF000262.1    NimIJ family 5-nitroimidazole reductase             nimIJ

Fields:

  • Protein id - This is from the FASTA defline for the protein or DNA sequence.
  • Contig id - (optional) Contig name.
  • Start - (optional) 1-based coordinate of first nucleotide coding for protein in DNA sequence on contig.
  • Stop - (optional) 1-based coordinate of last nucleotide coding for protein in DNA sequence on contig. Note that for protein hits (where the Method is HMM or ends in P) the coordinates are taken from the GFF, which means that for circular contigs, when the protein spans the contig break, the stop coordinate may be larger than the contig size (see the GFF3 standard for details).
  • Strand - The orientation ('+' or '-' strand) of the identified sequence, relative to the query sequence.
  • Element symbol - Gene or gene-family symbol for protein or nucleotide hit. For point mutations it is a combination of the gene symbol and the SNP definition separated by "_"; for stx operons it is the operon type/subtype followed by "_operon".
  • Element name - Full-text name for the protein, RNA, or point mutation.
  • Scope - The AMRFinderPlus database is split into 'core' AMR proteins that are expected to have an effect on resistance and 'plus' proteins of interest added with less stringent inclusion criteria. These may or may not be expected to have an effect on phenotype.
  • Type - AMRFinderPlus genes are placed into functional categories based on predominant function: AMR, STRESS, or VIRULENCE.
  • Subtype - Further elaboration of the functional category (ANTIGEN, BIOCIDE, HEAT, METAL, PORIN, STX_TYPE) if a more specific category is available; otherwise the Type is repeated.
  • Class - For AMR genes this is the class of drugs to which this gene is known to contribute resistance.
  • Subclass - If more specificity about drugs within the drug class is known it is elaborated here.
  • Method - Type of hit found by AMRFinder. A suffix of 'P' or 'X' is appended to "Methods" that could be found by protein or nucleotide. See The Method column for more information on standard AMRFinderPlus "Methods" and how they are determined. For StxTyper output there is a slightly different set of "Methods" because of differences in operon typing. See the StxTyper output documentation for details.
    • ALLELE - 100% sequence match over 100% of length to a protein named at the allele level in the AMRFinderPlus database.
    • EXACT - 100% sequence match over 100% of length to a protein in the database that is not a named allele.
    • BLAST - BLAST alignment is > 90% of length and > 90% identity to a protein in the AMRFinderPlus database.
    • PARTIAL - BLAST alignment is > 50% of length, but < 90% of length and > 90% identity to the reference, and does not end at a contig boundary.
    • PARTIAL_CONTIG_END - BLAST alignment is > 50% of length, but < 90% of length and > 90% identity to the reference, and the break occurs at a contig boundary, indicating that this gene is more likely to have been split by an assembly issue.
    • HMM - HMM was hit above the cutoff, but there was not a BLAST hit that met standards for BLAST or PARTIAL. This does not have a suffix because only protein sequences are searched by HMM.
    • INTERNAL_STOP - Translated blast reveals a stop codon that occurred before the end of the protein. This can only be assessed if the -n <nucleotide_fasta> option is used.
    • POINT - Point mutation identified by blast.
  • Target length - The length of the query protein or gene. The length will be in amino acids if the reference sequence is a protein, but in nucleotides if the reference sequence is nucleotide.
  • Reference sequence length - The length of the Reference protein or nucleotide in the database if a blast alignment was detected, otherwise NA. Stx operons have this field as blank or NA because the references used are protein sequences for the two subunits.
  • % Coverage of reference - % of reference covered by blast hit if a blast alignment was detected, otherwise NA. Stx operons have this field as blank or NA to avoid confusion about the way % identities are calculated for Stx Typing. See the StxTyper documentation for details.
  • % Identity to reference - % amino-acid identity to reference protein or nucleotide identity for nucleotide reference if a blast alignment was detected, otherwise NA. For Stx operons this is the combined amino-acid identity of the two subunits.
  • Alignment length - Length of the BLAST alignment, in amino acids for a protein reference or in nucleotides for a nucleotide reference, if a blast alignment was detected; otherwise NA.
  • Closest reference accession - RefSeq accession for reference hit by BLAST if a blast alignment was detected otherwise NA. Note that only one reference will be chosen if the blast hit is equidistant from multiple references. For point mutations the reference is the sensitive "wild-type" allele, and the element symbol describes the specific mutation. Check the Reference Gene Catalog for more information on specific mutations or reference genes. For Stx operons this is comma-separated accessions of the reference proteins for the stxA and stxB subunits (if both are present).
  • Closest reference name - Full name assigned to the closest reference hit if a blast alignment was detected, otherwise NA.
  • HMM accession - Accession for the HMM, NA if none.
  • HMM description - The family name associated with the HMM, NA if none.
  • Hierarchy node (optional) - The node in the Reference Gene Hierarchy that this hit was assigned to for naming. Fusion genes and stx operons with multiple hierarchy types will have multiple values separated by '::'. This field only appears when the --print_node option is used.
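
When post-processing the Method column, stripping the last character to recover the suffix is error-prone, because INTERNAL_STOP ends in 'P' without carrying a suffix. The following is a minimal Python sketch, not part of AMRFinderPlus, that splits a Method value into a base name and suffix using the names listed above. The function name `split_method` is hypothetical, and accepting an 'N' suffix is an assumption inferred from the POINTN values in the sample output rather than from the text.

```python
# Base method names that may carry a suffix, taken from the Method list above.
# PARTIAL_CONTIG_END is listed before PARTIAL so the longer name is tried first.
SUFFIXED_METHODS = (
    "PARTIAL_CONTIG_END", "ALLELE", "EXACT", "BLAST", "PARTIAL", "POINT",
)

# 'P' and 'X' are described in the text; 'N' is an assumption based on the
# POINTN values that appear in the sample output rows.
SUFFIXES = ("P", "X", "N")


def split_method(method):
    """Return (base, suffix); suffix is '' when the method carries none."""
    for base in SUFFIXED_METHODS:
        rest = method[len(base):]
        if method.startswith(base) and rest in SUFFIXES:
            return base, rest
    # HMM, INTERNAL_STOP, StxTyper methods such as COMPLETE, or anything
    # unrecognized is returned unchanged with an empty suffix.
    return method, ""


if __name__ == "__main__":
    for value in ("EXACTP", "PARTIAL_CONTIG_ENDX", "POINTN", "INTERNAL_STOP", "HMM"):
        print(value, split_method(value))
```

Matching against an explicit list of base names, rather than testing only the final character, keeps unsuffixed values such as INTERNAL_STOP and HMM intact.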

Common errors and what they mean

Protein id "<protein id>" is not in the .gff-file

To automatically combine overlapping results from protein and nucleotide searches the coordinates of the protein in the assembly contigs must be indicated by the GFF file. This requires a GFF file where the value of the 'Name=' variable of the 9th field in the GFF must match the identifier in the protein FASTA file (everything between the '>' and the first whitespace character on the defline). See the section on GFF file format for details of how AMRFinderPlus associates FASTA file entries with GFF file entries.
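
Before running a combined search, it can help to check which protein identifiers have no matching 'Name=' value in the GFF. The following is a rough Python sketch of that check under the matching rule described above; it is not how AMRFinderPlus itself parses these files, and the function names (`fasta_ids`, `gff_names`, `missing_from_gff`) are hypothetical. It assumes a tab-delimited GFF with 'key=value' attributes separated by ';' in the 9th field.

```python
def fasta_ids(fasta_text):
    """Collect identifiers: text between '>' and the first whitespace on each defline."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_names(gff_text):
    """Collect the values of 'Name=' from the 9th field of each GFF feature line."""
    names = set()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "Name" and value:
                names.add(value)
    return names


def missing_from_gff(fasta_text, gff_text):
    """Protein ids from the FASTA that have no matching Name= in the GFF."""
    return sorted(fasta_ids(fasta_text) - gff_names(gff_text))


if __name__ == "__main__":
    fasta = ">prot1 example protein\nMKV\n>prot2\nMAA\n"
    gff = "##gff-version 3\ncontig1\tsrc\tCDS\t1\t9\t.\t+\t0\tID=cds1;Name=prot1\n"
    print(missing_from_gff(fasta, gff))
```

Any identifier reported by `missing_from_gff` would be expected to trigger the error above, assuming the GFF is in the default format rather than one that needs a format-specific option such as --pgap.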

Known Issues

Circular contigs

AMRFinderPlus does not have the concept of circular contigs, so genes that cross the break in circular contigs may be detected only as fragments. By default AMRFinderPlus has a length cutoff of 50% of the full gene length, so one side or the other should be detected at least as a PARTIAL_CONTIG_END or BLAST hit. Depending on the annotation system, proteins may be annotated crossing the contig boundary in circular contigs (NCBI's PGAP does this). These full-length proteins will be analyzed by AMRFinderPlus. Note that the stop coordinate in this case will extend past the contig boundary as described by the GFF specification.
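
Downstream scripts that assume Stop never exceeds the contig length may need to handle such hits explicitly. The sketch below, in Python, flags them and maps the stop coordinate back onto the contig. It assumes the GFF3 convention for features that cross the origin of a circular sequence (end = true end position + sequence length) and that the contig length is supplied from your own assembly; the function names are hypothetical and not part of AMRFinderPlus.

```python
def spans_contig_break(stop, contig_length):
    """True when a reported stop coordinate runs past the end of the contig."""
    return stop > contig_length


def wrapped_stop(stop, contig_length):
    """Map a stop coordinate that runs past the contig end back onto the contig.

    Assumes the GFF3 convention for circular features, where the reported
    end is the true end position plus the contig length.
    """
    if spans_contig_break(stop, contig_length):
        return stop - contig_length
    return stop


if __name__ == "__main__":
    # A hypothetical 5000 bp circular contig with a hit reported at 4900..5200.
    print(spans_contig_break(5200, 5000), wrapped_stop(5200, 5000))
```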

GFF file formats are not all standard

GFF files are used to identify coordinates for protein sequences, but the association between the identifiers in the GFF and FASTA files is not part of the standard. See Input file formats for details of how AMRFinderPlus associates FASTA file entries with GFF file entries. If you have issues getting your GFF files to work please file an issue or email us at [email protected] including the files you are trying to use.

If you find bugs or have other questions or comments, please email us at [email protected] or create a GitHub issue.
