Generates JSON-LD metadata for various types of CSV files. It adopts the vocabulary provided by the W3C at CSVW to describe the structure and the information within, and uses the QUDT units ontology to look up and describe units. It can segment complex CSV files containing multiple tables and annotations without further input. There is also an option to output the complete serialized content of the CSV in the CSVW standard output format through the RDF API endpoint.
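To give a feeling for what a CSVW description looks like, here is a minimal sketch that builds one by hand in Python. It only uses core CSVW terms (`@context`, `url`, `tableSchema`, `columns`). The helper `minimal_csvw_metadata`, the file name `example.csv` and the column names are hypothetical, and the output of this tool will likely contain more detail (for example unit information) than shown here.

```python
import json


def minimal_csvw_metadata(csv_url, columns):
    """Build a minimal CSVW table description (hypothetical helper).

    `columns` is a list of (name, title, datatype) tuples.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": title, "datatype": datatype}
                for name, title, datatype in columns
            ]
        },
    }


# Example values are made up for illustration only.
metadata = minimal_csvw_metadata(
    "example.csv",
    [("time", "Time", "number"), ("force", "Force", "number")],
)
print(json.dumps(metadata, indent=2))
```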
Situations in which the annotation will fail:
- If numbers are used as column names
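A quick pre-flight check for this case could look like the sketch below. It assumes the header is the first row of the file and that the delimiter is known; the function name `numeric_column_names` is hypothetical and not part of this project.

```python
import csv
import io


def numeric_column_names(csv_text, delimiter=","):
    """Return the header cells that parse as numbers (hypothetical helper).

    Assumes the first row is the header row.
    Note: float() also accepts strings such as "nan" or "inf".
    """
    header = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter), [])
    numeric = []
    for cell in header:
        try:
            float(cell.strip())
        except ValueError:
            continue  # not a number, so the column name is fine
        numeric.append(cell)
    return numeric


# A non-empty result means the annotation is likely to fail.
print(numeric_column_names("1,2,name\n4,5,6\n"))
```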
APP_PORT=<80>
ADMIN_MAIL=<email_of_admin>
SSL_VERIFY=<True or False> #default is True
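As an illustration of how these variables could be interpreted, here is a minimal Python sketch. It is not the application's actual configuration code. The function `load_settings` is hypothetical, and the fallback values for `APP_PORT` (80) and `ADMIN_MAIL` (empty) are assumptions; only the `SSL_VERIFY` default of True is stated above.

```python
import os


def load_settings(env=None):
    """Read the three settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    ssl_raw = env.get("SSL_VERIFY", "True")  # documented default is True
    return {
        "app_port": int(env.get("APP_PORT", "80")),  # assumed fallback
        "admin_mail": env.get("ADMIN_MAIL", ""),      # assumed fallback
        # Anything other than "true" (case-insensitive) disables verification.
        "ssl_verify": ssl_raw.strip().lower() == "true",
    }


print(load_settings({"APP_PORT": "8080", "SSL_VERIFY": "False"}))
```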
Pull the Docker image from the GitHub Container Registry:
docker pull ghcr.io/mat-o-lab/csvtocsvw:latest
Clone the repo with
git clone https://github.com/Mat-O-Lab/CSVToCSVW
cd into the cloned folder
cd CSVToCSVW
Build and start the container.
docker-compose up
A simple UI can be found at the index page '/'. The API documentation is available at 'api/docs'.
- Open the notebook in any Jupyter instance.
- Run the first cell of the notebook. It will install the necessary Python packages and definitions.
- Run the second cell.
- Upload a CSV file or paste in a URL pointing at one in the provided widgets.
- Click the process button. It will try to determine the encoding and column separator automatically. If that fails, choose appropriate values from the drop-downs in the widgets and press the process button again.
- If successful, the JSON-LD created will be printed to the cell as output. Click the download button to download the file under the proper file name according to https://www.w3.org/ns/csvw.
- Place the file in the same folder as the CSV it describes.
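The automatic detection and the naming step above can be sketched in Python as follows. This is not the notebook's actual code: the candidate encodings, the delimiter set and both function names are assumptions, and the detection uses the standard library's `csv.Sniffer`. The file name follows the CSVW default lookup location `<csv file name>-metadata.json`. Whether the download button produces exactly this name is also an assumption.

```python
import csv
from pathlib import Path

# Assumption: a short list of encodings to try, in order.
CANDIDATE_ENCODINGS = ("utf-8", "utf-16", "cp1252")


def guess_encoding_and_delimiter(raw, delimiters=",;\t|"):
    """Return (encoding, delimiter) for raw CSV bytes, or None if unsure."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong encoding, try the next one
        try:
            dialect = csv.Sniffer().sniff(text[:4096], delimiters=delimiters)
        except csv.Error:
            continue  # no consistent delimiter found with this decoding
        return encoding, dialect.delimiter
    return None  # caller should fall back to manual selection


def metadata_path_for(csv_path):
    """Return the metadata file path next to the CSV it describes."""
    csv_path = Path(csv_path)
    return csv_path.with_name(csv_path.name + "-metadata.json")


sample = b"a;b;c\n1;2;3\n4;5;6\n"
print(guess_encoding_and_delimiter(sample))
print(metadata_path_for("data/run1.csv"))
```

If detection returns `None`, that corresponds to choosing the encoding and separator by hand in the widgets.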
The authors would like to thank the Federal Government and the Heads of Government of the Länder for their funding and support within the framework of the Platform Material Digital consortium. Funded by the German Federal Ministry of Education and Research (BMBF) through the MaterialDigital Call in Project KupferDigital - project id 13XP5119.