Taxon IDs
taxonID is a required Darwin Core term for checklists. It acts as the link between the taxon core and extensions (i.e. it is the foreign key in the extensions) and is defined as:
An identifier for the set of taxon information (data associated with the Taxon class). May be a global unique identifier or an identifier specific to the data set.
GBIF uses the taxonID to assess whether a (re)published taxon is new or one they already have. We therefore strongly recommend choosing a taxonID that is as globally unique and stable as possible. If such a taxon identifier is not present in the source data, or the identifiers used there can easily change over time (e.g. numbered rows), we recommend generating taxonIDs in the mapping script.
taxonIDs can be generated using the hash function digest::digest(). It creates a code of fixed length from an input value. The code looks random, but it is deterministic: for a given input value, the code will always be the same, and two different input values are extremely unlikely to produce the same code.
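As a quick check of the behaviour described above, here is a minimal sketch that hashes the same input twice and a slightly different input once. The input strings are only illustrative, and the sketch assumes the digest package is installed.

```r
library(digest)

# Hash the same (illustrative) input value twice
hash_1 <- digest("Hyalopsora polypodii (Dietel) Magnus Fungi", algo = "md5")
hash_2 <- digest("Hyalopsora polypodii (Dietel) Magnus Fungi", algo = "md5")

# Hash an input value that differs by a single character
hash_3 <- digest("Hyalopsora polypodii (Dietel) Magnus Fung", algo = "md5")

identical(hash_1, hash_2) # same input, same code
nchar(hash_1)             # an md5 hash is 32 hexadecimal characters long
identical(hash_1, hash_3) # different input: a collision is possible in theory, but very unlikely
```

Note that digest() serializes the input object by default before hashing it, so the resulting code can differ from an md5 hash of the raw string computed with another tool.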
For taxonIDs specifically, you should use the scientific name and the kingdom as input values. That way, each scientific name gets its own code and you don't create the same taxonID for hemihomonyms (identical names used for taxa in different kingdoms). We also advise prepending the code with my_dataset_shortname:taxon: so the taxonID is more likely to be globally unique and its origin is more easily identifiable.
library(digest)   # digest()
library(dplyr)    # mutate()
library(magrittr) # %<>% (assignment pipe)

# digest() doesn't work with vectors (it just returns a single value),
# but you can fix this by applying Vectorize()
vdigest <- Vectorize(digest)

# Generate taxonID as a combination of my_dataset_shortname + taxon + hash
input_data %<>% mutate(taxon_id = paste(
  "my_dataset_shortname", # e.g. "alien-fishes-checklist"
  "taxon",
  vdigest(paste(scientific_name, kingdom), algo = "md5"), # base hash on scientific_name and kingdom
  sep = ":"
))
For example, for Hyalopsora polypodii (Dietel) Magnus in the kingdom Fungi in the dataset uredinales-belgium-checklist, this will create the taxonID:
uredinales-belgium-checklist:taxon:260455576301d23f512a9650f9936ef9
This taxonID will remain stable as long as the scientific name, the kingdom and the dataset shortname do not change. If any of them does change, it makes sense for the taxonID to change as well.
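To illustrate why the kingdom is part of the hash input, the sketch below builds a small, made-up input_data data frame in which the genus name Ficus occurs both in Plantae (figs) and in Animalia (sea snails). It then generates the taxonIDs and checks that none are duplicated. The data frame and the my_dataset_shortname placeholder are assumptions for illustration only, and the sketch uses base R instead of mutate() to stay self-contained.

```r
library(digest)

# Made-up input data: "Ficus" is a genus name in both Plantae and Animalia
input_data <- data.frame(
  scientific_name = c("Ficus", "Ficus", "Hyalopsora polypodii (Dietel) Magnus"),
  kingdom = c("Plantae", "Animalia", "Fungi"),
  stringsAsFactors = FALSE
)

# Vectorized version of digest(), so every row gets its own hash
vdigest <- Vectorize(digest)

# Base the hash on scientific name + kingdom, then prepend the dataset prefix
input_data$taxon_id <- paste(
  "my_dataset_shortname", # placeholder, replace with your dataset shortname
  "taxon",
  vdigest(paste(input_data$scientific_name, input_data$kingdom), algo = "md5"),
  sep = ":"
)

# Check that every row received a distinct taxonID
any(duplicated(input_data$taxon_id))
```

Running a duplicate check like this at the end of the mapping script is a cheap way to catch rows that share both scientific name and kingdom, which would otherwise end up with the same taxonID.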