MusicBrainz RDB-to-RDF mappings for the Semantika OBDA system. The mappings are currently written in Semantika's native mapping language, TERMAL/XML. In the future the system will also be able to read R2RML syntax.
The purpose of this project is to show Semantika's capability to export data from database tuples to RDF triples, and to show how existing R2RML mappings can be reused for Semantika. I have included a sample dataset that you can use immediately; no data import is required.
Credit goes to Barry Norton, who provided the R2RML mappings in his repository LinkedBrainz / MusicBrainz-R2RML. This work is a direct translation of those mappings.
New Tutorial: How to create a simple SPARQL endpoint for the MusicBrainz database.
- To run the examples, you need the Semantika CLI project. Download its latest release.
- Extract the downloaded ZIP file into a local directory.
- Download the mapping files from termalxml/ and place them in the directory where you extracted the CLI. Note that this download may already include the configuration file (mbzdb.cfg.xml).
- Download the sample dataset from sample-data/ and extract the ZIP into your local directory. The ZIP file contains a self-contained H2 database.
- Browse the H2 home folder and copy-and-paste a JAR file called to
in CLI home folder.
- Run the H2 server by executing
if you're in Windows) from its home directory.
$ cd $H2_HOME
$ ./
- Run the Semantika CLI tool from its home directory.
$ cd $CLI_HOME
$ ./semantika materialize --config=mbzdb.cfg.xml --output=output.n3 -f N3
(Other output formats are available, e.g., Turtle, RDF/XML and JSON-LD; see semantika --help.)
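The materialize step above expects the launcher and the configuration file to sit together in the CLI home folder. As a minimal sketch, a small helper can check for both before running. The function name `check_cli_home` is hypothetical and not part of Semantika; only the file names `semantika` and `mbzdb.cfg.xml` come from the steps above.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Semantika): verifies that the
# files named in the steps above are present in the CLI home folder.
# Usage: check_cli_home /path/to/cli-home
check_cli_home() {
    cli_home="$1"
    missing=0
    # Both names are taken from the run command in this README.
    for f in semantika mbzdb.cfg.xml; do
        if [ ! -e "$cli_home/$f" ]; then
            echo "missing: $cli_home/$f" >&2
            missing=1
        fi
    done
    # Returns 0 when everything is in place, 1 otherwise.
    return "$missing"
}
```

It could be chained in front of the actual run, for example `check_cli_home "$CLI_HOME" && cd "$CLI_HOME" && ./semantika materialize --config=mbzdb.cfg.xml --output=output.n3 -f N3`.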
All the mappings have been tested against the provided dataset and all of them ran successfully. The run produced 1,045,226 triples in total.
Note that some mappings are commented out. Each of them carries a comment explaining the reason.
The dataset is a subset of the MusicBrainz Database as of March 1, 2014. It contains all US artists/groups that started between 1970 and 1989 and are still active. In summary, the dataset consists of 54 tables with 481,401 tuples (~52 MB). The extraction filter is defined as follows:
where begin_date_year < 1990 and begin_date_year >= 1970
and area = 222
and (type = 1 or type = 2)
and ended = false
- The dataset is released under the same license as its parent database, the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license.