
Taxon IDs

Damiano Oldoni edited this page Oct 17, 2019 · 7 revisions

taxonID is a required Darwin Core term for checklists. It acts as the link between the taxon core and extensions (i.e. it is the foreign key in the extensions) and is defined as:

An identifier for the set of taxon information (data associated with the Taxon class). May be a global unique identifier or an identifier specific to the data set.

GBIF uses the taxonID to assess whether a (re)published taxon is new or one it already has. We therefore strongly recommend choosing a taxonID that is as globally unique and stable as possible. If the source data contain no such identifier, or if the identifiers used there can easily change over time (e.g. row numbers), we recommend generating taxonIDs in the mapping script.

Generating taxonIDs

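taxonIDs can be generated using the hash function digest::digest(). It creates a seemingly random code of fixed length from an input value. The code is deterministic: a given input value always produces the same code, and different input values produce different codes for all practical purposes (collisions are theoretically possible but extremely unlikely).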
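As a minimal sketch of this behaviour, assuming the digest package is installed, the snippet below hashes the same string twice and checks the length of the result. The input string is only an illustration.

```r
library(digest)

# Illustrative input: scientific name and kingdom pasted together
hash_input <- "Hyalopsora polypodii (Dietel) Magnus Fungi"

# Hashing the same input twice gives the same code
id_1 <- digest(hash_input, algo = "md5")
id_2 <- digest(hash_input, algo = "md5")
identical(id_1, id_2)

# An MD5 code is 32 hexadecimal characters, whatever the input length
nchar(id_1)
nchar(digest("x", algo = "md5"))
```

Note that digest() by default hashes the serialized R object rather than the raw string, so the code will generally differ from an MD5 sum computed on the same text by other tools.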

For taxonIDs specifically, you should use the scientific name and the kingdom as input values. That way, each scientific name gets a unique code and you don't create the same taxonID for hemihomonyms (identical names used in different kingdoms). We also advise prepending the code with my_dataset_shortname:taxon: so the taxonID is more likely to be globally unique and its origin is more easily identifiable.

library(digest)   # digest()
library(dplyr)    # mutate()
library(magrittr) # %<>% (assignment pipe)

# digest() hashes its whole input as one object, so a vector yields a single hash.
# Vectorize() makes it return one hash per element instead.
vdigest <- Vectorize(digest)

# Generate taxonID as a combination of my_dataset_shortname + taxon + hash
input_data %<>% mutate(taxon_id = paste(
  "my_dataset_shortname", # e.g. "alien-fishes-checklist"
  "taxon",
  vdigest(paste(scientific_name, kingdom), algo = "md5"), # base hash on scientific_name and kingdom
  sep = ":"
))

For example, for Hyalopsora polypodii (Dietel) Magnus in the kingdom Fungi in the dataset uredinales-belgium-checklist, this will create the taxonID:

uredinales-belgium-checklist:taxon:260455576301d23f512a9650f9936ef9

This taxonID will remain stable as long as the scientific name, the kingdom and the dataset shortname do not change. If any of them does change, it makes sense that the taxonID changes as well.
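To illustrate why the kingdom is part of the hash input, here is a small sketch, assuming the digest package is installed. The data frame is hypothetical: Morus is used as an illustrative hemihomonym (a plant genus and a bird genus), and vapply() is used as an alternative to Vectorize(), not as the method of the mapping script above.

```r
library(digest)

# Hypothetical input: the same name in two kingdoms (a hemihomonym)
input_data <- data.frame(
  scientific_name = c("Morus", "Morus"),
  kingdom = c("Plantae", "Animalia"),
  stringsAsFactors = FALSE
)

# Hash each "scientific_name kingdom" string separately
hash_input <- paste(input_data$scientific_name, input_data$kingdom)
hashes <- vapply(hash_input, digest, character(1), algo = "md5", USE.NAMES = FALSE)

# Build the taxonID as my_dataset_shortname + taxon + hash
input_data$taxon_id <- paste("my_dataset_shortname", "taxon", hashes, sep = ":")

# Stop with an error if any taxonID occurs more than once
stopifnot(anyDuplicated(input_data$taxon_id) == 0)
```

A duplicate check like the last line can be worth keeping in a mapping script: it fails early if two rows share the same scientific name and kingdom.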
