nfdi4objects/lido-analysis

Analysis and transformation of LIDO data

Collect, analyze, validate and transform LIDO from various sources

This repository contains resources for integrating LIDO data sources as part of NFDI. The eventual goal is an ETL process from LIDO (in different application profiles) to a knowledge graph (with an RDF data model yet to be decided), and possibly a conversion from RDF to a JSON format.

Installation

Install required tools:

  • metha OAI-PMH client
  • pigz for faster decompression
  • xsltproc, xmllint and xmlstarlet for XML processing
  • rapper from raptor-utils for RDF processing

On Ubuntu you can run sudo ./install.sh to install these dependencies.
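
On other systems it can help to check which of the tools are already available before installing anything. The following is a minimal sketch, not part of the repository: the function name missing_tools is made up for illustration, and the command names are assumed to match the tools listed above (unpigz is assumed to ship with pigz, and xmlstarlet may be installed under a different command name on some distributions).

```shell
#!/bin/sh
# Print every given command name that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names assumed from the tool list above.
missing=$(missing_tools metha-sync metha-cat pigz unpigz xsltproc xmllint xmlstarlet rapper)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are printed on one line.
    echo "Missing tools:" $missing >&2
else
    echo "All required tools found."
fi
```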

Usage

Collect

LIDO records are either harvested via OAI-PMH or provided manually as files.

Don't commit actual LIDO records to this repository, except for unit tests!

For instance, all LIDO records from KENOM can be downloaded this way:

metha-sync -format lido https://www.kenom.de/oai/

This will take a long time, so it is better to set a -from date and/or a maximum number of requests (-max), e.g.:

metha-sync -from 2023-06-01 -max 10 -format lido https://www.kenom.de/oai/

The records are stored in a cache directory that can be shown this way:

metha-sync -dir -format lido https://www.kenom.de/oai/

You can then copy the harvested records into a single XML file:

metha-cat -format lido https://www.kenom.de/oai/ > example.xml

Extract the LIDO records from their OAI-PMH envelope:

xsltproc oaiextract.xsl example.xml > example.lido.xml
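
A quick sanity check after extraction is to compare the number of LIDO records before and after. The sketch below is an illustration only: count_lido_records is a hypothetical helper, and it assumes that oaiextract.xsl copies every lido:lido element unchanged, in which case both counts should normally match.

```shell
#!/bin/sh
# Print the number of lido:lido elements in an XML file.
count_lido_records() {
    xmlstarlet sel -N lido=http://www.lido-schema.org \
        -t -v "count(//lido:lido)" -n "$1"
}

# Usage (file names as in the commands above):
#   count_lido_records example.xml
#   count_lido_records example.lido.xml
```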

Alternatively, list all files and process them sequentially:

find "$(metha-sync -dir -format lido https://www.kenom.de/oai/)" -name "*.gz" | xargs unpigz -c
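
If each harvested file should be handled on its own rather than as one concatenated stream, a loop is one option. This is a sketch under assumptions: each_record_file is a made-up helper name, the cache directory is assumed to contain one gzip-compressed XML file per harvested response, and gzip -dc is used as a fallback when unpigz is not installed.

```shell
#!/bin/sh
# Decompress each *.gz file below DIR in turn and pipe it to a command.
# Usage: each_record_file DIR COMMAND [ARGS...]
each_record_file() {
    dir=$1
    shift
    find "$dir" -name '*.gz' | sort | while IFS= read -r file; do
        # Prefer unpigz when available; gzip -dc produces the same output.
        if command -v unpigz >/dev/null 2>&1; then
            unpigz -c "$file"
        else
            gzip -dc "$file"
        fi | "$@"
    done
}

# Usage, e.g. to print the size in bytes of every decompressed file:
#   each_record_file "$(metha-sync -dir -format lido https://www.kenom.de/oai/)" wc -c
```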

Analyze

Statistics and inspection

Count XML paths:

xmlstarlet el example.lido.xml | sed 's/^.*lido:lido\///' | sort | uniq -c
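
To see what the counting step does without harvesting anything, the same kind of pipeline can be run on a few sample lines. The paths below are made up and shortened; they only imitate the one-path-per-element output of xmlstarlet el. The helper names sample_paths and count_paths are hypothetical, and the extra sort -rn (most frequent path first) is an addition to the pipeline above.

```shell
#!/bin/sh
# Made-up sample of element paths, one line per element, two records.
sample_paths() {
    cat <<'EOF'
lido:lidoWrap
lido:lidoWrap/lido:lido
lido:lidoWrap/lido:lido/lido:lidoRecID
lido:lidoWrap/lido:lido/lido:descriptiveMetadata
lido:lidoWrap/lido:lido
lido:lidoWrap/lido:lido/lido:lidoRecID
lido:lidoWrap/lido:lido/lido:lidoRecID
lido:lidoWrap/lido:lido/lido:descriptiveMetadata
EOF
}

# Strip everything up to and including "lido:lido/", then count each
# remaining path and list the most frequent ones first.
count_paths() {
    sed 's|^.*lido:lido/||' | sort | uniq -c | sort -rn
}

sample_paths | count_paths
```

Lines that do not contain "lido:lido/" (the wrapper and the record element itself) are left unchanged by sed and are counted under their full path.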

Extract some XML elements:

xmlstarlet sel -N lido=http://www.lido-schema.org -t -c "//lido:descriptiveMetadata/lido:objectClassificationWrap" example.lido.xml 
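
Instead of copying whole elements, xmlstarlet can also print text values, which combines well with sort and uniq. The following sketch lists classification terms by frequency. It assumes that terms are found at lido:classification/lido:term inside the classification wrapper; classification_terms is a hypothetical helper name.

```shell
#!/bin/sh
# List classification terms with their frequency, most common first.
classification_terms() {
    xmlstarlet sel -N lido=http://www.lido-schema.org -t \
        -m "//lido:classification/lido:term" \
        -v "normalize-space(.)" -n "$1" |
        sort | uniq -c | sort -rn
}

# Usage:
#   classification_terms example.lido.xml
```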

Validate

TODO (#2)

Transform

LIDO can be used as such but transformation to other formats and models makes sense for both data integration and analysis. Two basic forms of target structures exist:

  • Flat data for simplified reuse and indexing in a search index (probably JSON)
  • Graph data for knowledge graphs (probably RDF)

Convert to RDF

A minimal XSLT script to convert LIDO to RDF/XML is included:

xsltproc lido2rdf.xsl example.lido.xml > example.rdf

It is better to use another RDF serialization, at least N-Triples:

rapper -i rdfxml example.rdf > example.nt
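
One practical advantage of N-Triples is that each triple sits on its own line, so standard text tools are enough for a first inspection. As a minimal sketch (predicate_counts is a made-up helper name, and the file is assumed to be plain N-Triples as written by rapper), the predicates used in the result can be counted like this:

```shell
#!/bin/sh
# Count how often each predicate occurs in an N-Triples file.
# The predicate is the second whitespace-separated field of a triple;
# empty lines and comment lines are skipped.
predicate_counts() {
    awk 'NF && $1 !~ /^#/ { print $2 }' "$1" | sort | uniq -c | sort -rn
}

# Usage:
#   predicate_counts example.nt
```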

Alternative: A conversion script that transforms KENOM LIDO to the Nomisma data model can be found at https://github.com/AmericanNumismaticSociety/migration_scripts/blob/master/kenom/process-oai-pmh.php (Apache License).

References

Related projects and applications

Related Publications
