This is a benchmark dataset for ontology matching created by Felix Kraus (KIT). It is based on an archaeology test case of the DH benchmark dataset and focuses on evaluating matcher performance across different languages. This repository contains the general information needed to use the dataset at the OAEI.
For further information on the OAEI 2024 see also here.
This benchmark dataset facilitates the development of ontology matching systems for datasets in different languages in the Digital Humanities, using archaeology as an example. Such systems face particular obstacles, which this dataset addresses at least in part:
- ontologies using only a single language that is not English
- domain-specific terms, at times used only by a small research community
- use of a data model suitable for easily creating knowledge organization systems
The dataset includes several test cases all based on the archaeology test case "idai-pactols" from the DH benchmark dataset. Each test case consists of a monolingual source ontology, a monolingual target ontology and a manually compiled reference alignment. Only equivalent relations ("=") are targeted.
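Because only equivalence correspondences are targeted, a matcher's output can be compared with the reference alignment as a plain set of (source, target) pairs. The following is a minimal sketch of the usual precision, recall and F1 computation. It is not the official OAEI evaluation code, and the function name `evaluate` and the placeholder identifiers are hypothetical.

```python
def evaluate(system, reference):
    """Compute precision, recall and F1 over sets of (source, target) pairs.

    Every pair is treated as an equivalence ("=") correspondence.
    """
    system, reference = set(system), set(reference)
    true_positives = len(system & reference)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Placeholder identifiers for illustration only (not taken from the dataset).
reference = {("s1", "t1"), ("s2", "t2")}
system = {("s1", "t1"), ("s3", "t3")}

precision, recall, f1 = evaluate(system, reference)
print(precision, recall, f1)
```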
Two vocabularies were used for the ten test cases:
- PACTOLS [1]
  - Adapted version: Only narrower terms and direct ancestors of the concept "archaeological site" were used
  - About 70 terms
  - Languages: Arabic, Dutch, English, French, German, Italian, Spanish
- iDAI.world [2]
  - Adapted version: Only narrower terms and direct ancestors of the concept "material things" were used
  - About 2600 terms
  - Major languages: Arabic, English, French, German, Italian
For all test cases, the source is iDAI.world and the target is PACTOLS. The following combinations of monolingual ontologies exist:
- de-de
- de-en
- de-fr
- de-it
- en-en
- en-fr
- en-it
- fr-fr
- fr-it
- it-it
The two ontologies have four languages in common: English, French, German, Italian. To create the different test cases, all languages but one were removed from each ontology. This was done using a script developed for this purpose by the track authors. This yields 10 different combinations of monolingual ontologies.
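The count of 10 follows from taking unordered pairs of the four common languages, with same-language pairs allowed: 6 mixed pairs plus 4 same-language pairs. A short sketch that reproduces the list above; the function name `language_pairs` is hypothetical and this is not the track authors' script.

```python
from itertools import combinations_with_replacement

# The four languages shared by both vocabularies (ISO 639-1 codes).
COMMON_LANGUAGES = ["de", "en", "fr", "it"]


def language_pairs(languages):
    """Return unordered language pairs, including same-language pairs."""
    ordered = sorted(languages)
    return [f"{a}-{b}" for a, b in combinations_with_replacement(ordered, 2)]


pairs = language_pairs(COMMON_LANGUAGES)
print(len(pairs))
print(pairs)
```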
The reference alignment was also taken from the DH benchmark dataset. Terms that do not exist in the language of a monolingual ontology were removed from the reference accordingly.
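The two reduction steps can be illustrated on a toy in-memory representation. This is only a sketch under stated assumptions, not the script used by the track authors: the dictionary layout (concept URI mapped to labels by language), the function names `keep_language` and `prune_reference`, and the `example.org` URIs and labels are all invented for illustration. It also assumes that a term "exists" in a language exactly when it has a label in that language.

```python
def keep_language(ontology, lang):
    """Keep only concepts that have a label in `lang`, and only that label."""
    return {
        uri: {lang: labels[lang]}
        for uri, labels in ontology.items()
        if lang in labels
    }


def prune_reference(reference, source, target):
    """Keep correspondences whose two ends both exist in the reduced ontologies."""
    return [(s, t) for s, t in reference if s in source and t in target]


# Invented example data (assumption): concept URI -> {language code: label}.
source = {
    "http://example.org/idai/1": {"de": "Siedlung", "en": "settlement"},
    "http://example.org/idai/2": {"de": "Grab"},
}
target = {
    "http://example.org/pactols/a": {"fr": "habitat", "en": "settlement"},
    "http://example.org/pactols/b": {"fr": "tombe"},
}
reference = [
    ("http://example.org/idai/1", "http://example.org/pactols/a"),
    ("http://example.org/idai/2", "http://example.org/pactols/b"),
]

# Build a de-en test case: German-only source, English-only target.
source_de = keep_language(source, "de")
target_en = keep_language(target, "en")
reference_de_en = prune_reference(reference, source_de, target_en)
print(reference_de_en)
```

In this toy case the second correspondence is dropped from the de-en reference because its target concept has no English label.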
See the corresponding section of the DH benchmark dataset.
[1]: PACTOLS (adapted) [ODbL v1.0; Creators: Groupe PACTOLS/FRANTIQ]
[2]: iDAI.world (adapted) [CC BY 4.0; Creators: Annika Kirscheneder, Camilla Colombi, Elenore Pape, Gabriele Rasbach, Henriette Senst, Lena Vitt, Matthias Block, Nina Dworschak, Reinhard Förtsch, Sabine Thänert]
- Creator: Felix Kraus
- Email (substitute accordingly): firstname.lastname (at) kit (dot) edu
- License owner: Karlsruhe Institute of Technology (KIT)
Development of this software product was funded by the research program “Engineering Digital Futures” of the Helmholtz Association of German Research Centers.