Descriptive Metadata Tasks

This is documentation for the microservice that handles files uploaded via "Load Metadata" (staff tool).

Processing stages

The tool has two stages:

  • Split uploaded metadata file into individual records
  • Upload approved records

When splitting, four values need to be determined for each record:

  • An identifier, which should be a slug but may be missing a prefix (old records)
  • A label, which will be stored in a different location than the XML record
  • The XML record, which is then validated against the appropriate XML schema
  • A CMR JSON record, the results of the crosswalk, so staff can preview to ensure all is well

When storing:

  • The identifier determines where the record is stored. It must already exist (this tool does not create new identifiers)
  • The XML is stored in the location where crosswalks will later load it
  • The label is stored as an IIIF label, which can later be edited in "Edit in Access" (staff tool)

Determining Identifiers and Labels

Identifiers and labels are determined based on the "Metadata File Type" set when uploading the file.

Dublin Core CSV

  • Mandatory identifier must be in an "objid" CSV column
  • Optional "label" column. If this column doesn't exist, the first "dc:title" (or "dc.title", which is equivalent) value is used
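
As an illustration only, the rules above could be sketched in Python as follows. The service itself is written in Perl, so this is not its code. The function name `split_dc_csv` and the sample values are hypothetical, and treating "first value" as the first non-empty title cell is an assumption.

```python
import csv
import io


def split_dc_csv(text):
    """Return a list of (identifier, label) pairs, one per data row."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    if "objid" not in header:
        raise ValueError('mandatory "objid" column is missing')

    results = []
    for row in reader:
        # Pair each cell with its column name; this keeps repeated columns,
        # which csv.DictReader would collapse into one.
        cells = list(zip(header, row))
        identifier = next((v for name, v in cells if name == "objid"), "")
        if "label" in header:
            label = next((v for name, v in cells if name == "label"), "")
        else:
            # Assumption: "first value" means the first non-empty title cell.
            label = next(
                (v for name, v in cells
                 if name in ("dc:title", "dc.title") and v.strip()),
                None,
            )
        results.append((identifier, label))
    return results
```

With a file that has no "label" column but has both title spellings, the first title column supplies the label; when a "label" column is present, it takes precedence over any title.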

MARC - ID in 490

  • Label extracted from first 245$a, and set to "[unknown]" if missing.
  • Identifier extracted from 490
    • If $3 exists, join $3 and $v, separated by "_"
    • If $3 does not exist, use $v
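
A minimal Python sketch of these rules, for illustration rather than as the service's actual Perl code. It assumes a MARC field is represented as a list of `(code, value)` subfield pairs; that representation, the helper names, and the sample values in the usage note are all hypothetical.

```python
def first_subfield(subfields, code):
    """Return the first value with the given subfield code, or None."""
    return next((value for c, value in subfields if c == code), None)


def label_from_245(subfields_245):
    """Label is the first 245$a, or "[unknown]" when it is missing."""
    return first_subfield(subfields_245, "a") or "[unknown]"


def id_from_490(subfields_490):
    """Join $3 and $v with "_" when $3 exists; otherwise use $v alone."""
    prefix = first_subfield(subfields_490, "3")
    volume = first_subfield(subfields_490, "v")
    if volume is None:
        return None  # Assumption: no $v means no identifier can be built.
    if prefix is not None:
        return f"{prefix}_{volume}"
    return volume
```

For example, `id_from_490([("3", "example.8_00001"), ("v", "12")])` yields `"example.8_00001_12"`, while a field with only `$v` returns that value unchanged.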

MARC - ID in oocihm interpretation

  • Label extracted from first 245$a, and set to "[unknown]" if missing.
  • Identifier extracted from 490
    • If $3 exists, join $3 and $v, separated by "_"
    • If $3 does not exist, use $v, but convert any "-" to "_" and strip out anything that isn't a number or "_"
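
The extra clean-up step in this interpretation might look like the following Python sketch. It is an illustration under assumptions, not the service's Perl code: subfields are modelled as `(code, value)` pairs, the function name is hypothetical, and "number" is taken to mean the ASCII digits 0 to 9.

```python
import re


def id_from_490_oocihm(subfields_490):
    """Build an identifier from a 490 field, oocihm interpretation."""
    prefix = next((v for c, v in subfields_490 if c == "3"), None)
    volume = next((v for c, v in subfields_490 if c == "v"), None)
    if volume is None:
        return None  # Assumption: no $v means no identifier can be built.
    if prefix is not None:
        # Same as the plain 490 rule: join $3 and $v with "_".
        return f"{prefix}_{volume}"
    # Without $3: convert "-" to "_", then keep only digits and "_".
    normalized = volume.replace("-", "_")
    return re.sub(r"[^0-9_]", "", normalized)
```

A hypothetical `$v` of `"no. 8-00001"` would become `"8_00001"` under this sketch.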

MARC - ID in ooe interpretation

  • Label extracted from first 245$a, and set to "[unknown]" if missing.
  • Identifier extracted from 035$a, skipping the first character.
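
As a brief illustration, the 035 rule amounts to dropping one leading character. In this Python sketch the `(code, value)` subfield representation, the function name, and the sample value are hypothetical.

```python
def id_from_035(subfields_035):
    """Return 035$a without its first character, or None if $a is missing."""
    value = next((v for c, v in subfields_035 if c == "a"), None)
    if not value:
        return None
    return value[1:]
```

A made-up `$a` of `"x12345"` would give the identifier `"12345"`.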

MARC - ID in 856 URI

  • Label extracted from first 245$a, and set to "[unknown]" if missing.
  • Loop through all 856$u subfields, looking for ".canadiana.ca/view/{slug}" pattern
    • First one found, the {slug} is used as the identifier.
    • Warn if more than one match is found, in case the extra match was a mistake
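
The 856 lookup could be sketched in Python as below. This is a hedged illustration, not the service's Perl code: the function name is hypothetical, the input is assumed to be a plain list of `$u` strings, and the set of characters allowed in a slug (anything up to the next "/", "?", "#", or whitespace) is an assumption.

```python
import re
import warnings

# Assumption: a slug runs until the next "/", "?", "#", or whitespace.
SLUG_PATTERN = re.compile(r"\.canadiana\.ca/view/([^/?#\s]+)")


def id_from_856(urls):
    """Return the slug from the first matching 856$u value, or None."""
    matches = []
    for url in urls:
        found = SLUG_PATTERN.search(url)
        if found:
            matches.append(found.group(1))
    if not matches:
        return None
    if len(matches) > 1:
        # Extra matches may be a cataloguing mistake, so flag them.
        warnings.warn(f"multiple 856$u matches found: {matches}")
    return matches[0]
```

Only the first match is used as the identifier; later matches trigger a warning rather than an error, so staff can still review the record.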

See also: DMDtask Source code (perl)