How to map a single dataset containing multiple sources to itself? #255
Comments
Thanks to @MSherif I had partial success with |
We added the new |
Unfortunately it doesn't seem to work for me. Did I make a mistake with the combined metric? I don't really understand the documentation on what exactly MINUS, MAX and LESS_THAN output.
Despite saying that c1.cat should be less than c2.cat, the resulting catalogue-close.ttl still contains symmetric pairs:
Plz try |
Thank you for the detailed explanation, this is extremely helpful! Could you add this to the official documentation at http://dice-group.github.io/LIMES/#/user_manual/configuration_file/defining_link_specifications?id=boolean-operations? I know what minimum, maximum and set difference are, but their interaction with the thresholds was not clear to me. What I still don't know is: what is the similarity score output of the MINUS operator? Is it the score from the first parameter? And what happens if something is below the threshold?
Unfortunately,
However this should not be possible, because for example http://hitontology.eu/ontology/EhrSfmSupportForHealthMaintenancePreventativeCareAndWellness only has one catalogue, and this cannot be smaller than itself, as specified in Output of LIMES
Actually, the |
Done updating the LIMES docs.
Is it possible to use LIMES with more than two sources which are all included in the same file?
The sources should be mapped to each other but of course I don't want to map a source to itself and I also don't want to have duplicate pairs (A,B) and (B,A).
To clarify with an example, let's say I have a class :Country with many instances and each country has a population of individuals.
All of this data is in the same file countries.ttl.
Now I want to find out which individuals live in more than one country.
This can be done in the following manner, declaring source and target alike:
However, this will generate a false match of every person to itself, and it will also match each pair twice, once in each direction.
I would like to add a restriction like "STR(?x) < STR(?y)", but it seems that one cannot reference variables from the source in the restriction of the target.
A workaround is to throw away all matches with a score of exactly 1.0, but this wastes resources and also discards correct matches that happen to be exactly equal.
Also, this will map people in a country to others in the same country which is not intended.
Another way is to perform postprocessing to remove all duplicate and self matches but that seems to be inefficient in both developer and execution time.
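For illustration, the postprocessing option could look roughly like the following Python sketch. It assumes the links have already been parsed into (source, target, score) tuples; the function name `canonical_links` and the tuple layout are hypothetical and not part of LIMES. Ordering each pair by string comparison mirrors the "STR(?x) < STR(?y)" idea, applied after the fact.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory representation of one link: (source URI, target URI, score).
Link = Tuple[str, str, float]


def canonical_links(links: Iterable[Link]) -> List[Link]:
    """Drop self-matches and keep a single link per unordered pair."""
    best: Dict[Tuple[str, str], float] = {}
    for source, target, score in links:
        if source == target:
            continue  # self-match (x, x): discard
        # Order the pair so that (A, B) and (B, A) share one key.
        key = (source, target) if source < target else (target, source)
        # Keep the highest score seen for this unordered pair.
        if score > best.get(key, float("-inf")):
            best[key] = score
    return [(a, b, s) for (a, b), s in sorted(best.items())]


links = [
    ("ex:a", "ex:a", 1.0),  # self-match
    ("ex:a", "ex:b", 0.9),
    ("ex:b", "ex:a", 0.9),  # symmetric duplicate
    ("ex:c", "ex:b", 0.8),
]
print(canonical_links(links))
```

This removes the unwanted links but does not avoid computing them, so the matching itself still does the redundant work.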
Lastly, I could write a script which would enumerate all n*(n-1)/2 unique non-self-matching pairs of sources and generate as many LIMES configuration files, but that has its own problems.
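As a rough sketch of that enumeration step, assuming the sources are identified by plain strings (the names below are made up), the unordered pairs can be produced with `itertools.combinations`. Generating the actual configuration file for each pair is left out here, since it depends on the configuration template in use.

```python
from itertools import combinations
from typing import List, Tuple


def unique_source_pairs(sources: List[str]) -> List[Tuple[str, str]]:
    """Return every unordered pair of distinct sources exactly once."""
    # combinations() never pairs an element with itself and never
    # yields both (A, B) and (B, A), giving n*(n-1)/2 pairs in total.
    return list(combinations(sources, 2))


# Hypothetical source identifiers, one per country.
sources = ["CountryA", "CountryB", "CountryC", "CountryD"]
for left, right in unique_source_pairs(sources):
    # Each pair would parametrize one configuration file.
    print(left, right)
```

With four sources this yields 4*3/2 = 6 pairs, so the number of configuration files grows quadratically with the number of sources.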
Is there any way to solve this task efficiently using LIMES or do I need to use one of the mentioned imperfect options?