Uberon relation reductions in our pipeline #227
Actually, my idea mentioned above seems incorrect: our pipeline code does exactly the check I thought it was not doing (class InsertUberon, lines 734 to 781). I need to log the edges produced from uterus to investigate further.
The problem comes from insertion of Uberon in our database. Some direct relations can be seen as indirect, because we sometimes have classes with such relations:
(no idea why relation 1 is needed, but this creates an indirect relation going from UBERON_0010011 to UBERON_0010009 in Primata) Or:
(the latter states that, in Danio, all UBERON_0011362 are UBERON_0003496, and it is a cause of cycles (see obophenotype/uberon#651), but that is handled in another piece of code; it creates an indirect relation going from UBERON_0001532 to UBERON_0003496 in Danio). In these cases, we would also get the correct direct relation (e.g., UBERON_0001532 SubClassOf UBERON_0003496), with a corresponding direct outgoing edge, but indirect relations have priority in the method |
Now that I think about it:
=> for consistent call propagation between Bgee and topGO, we need to be sure that all indirect relations can be retrieved through a chain of direct relations. Maybe, instead of using |
We noticed the following issue: from the term UBERON:0001295 endometrium, in platypus we can reach the following ancestors through indirect relations:
But we do not manage to reach the following structures by following the chain of direct relations stored in the database for platypus:
They are all ancestors of uterus, which does not exist in platypus. Hmm... See the chain of direct relations in the database for platypus:
The direct/indirect relations are retrieved in our pipeline, from our custom version of Uberon in generated_files/uberon/custom_composite.obo, in the method `org.bgee.pipeline.uberon.InsertUberon.generateRelationTOsFirstPass(Map, Map, Uberon, Set, Collection)` of bgee_pipeline of BgeeDB/bgee_apps. This code uses `OWLGraphWrapper`. First, it retrieves all relations, with chains of object properties packed if possible, with the method `OWLGraphWrapper.getOutgoingEdgesNamedClosureOverSupPropsWithGCI(OWLClass)`. Then it also retrieves direct relations with the method `OWLGraphWrapper.getOutgoingEdgesWithGCI(OWLClass)`; that's how the distinction between direct and indirect relations is done.
=> It means that `OWLGraphWrapper` has inferred by relation reduction a set of indirect relations that we do not retrieve just by following the direct relations we have stored in the Bgee database. Need to investigate how the relations returned by `getOutgoingEdgesNamedClosureOverSupPropsWithGCI` are produced. Is it a bug, or is it all good?
IDEA: maybe the GCI relations do not consider taxon constraints on OWLClasses. We take them into account for insertion into the database. Maybe `getOutgoingEdgesNamedClosureOverSupPropsWithGCI` and `getOutgoingEdgesWithGCI` have indeed retrieved relations between endometrium and uterus in platypus, but we have removed them at time of insertion in the database because uterus does not exist in platypus. But then, we still have kept the relations that had been inferred thanks to the relations incoming/outgoing from uterus.
The fix would be to be able to discard the relations returned by `getOutgoingEdgesNamedClosureOverSupPropsWithGCI` if they go through an OWLClass that does not exist in the requested species.
And if it is really the source of the problem, after the fix we can add a check after insertion in the database, walking the path of direct relations to check whether we retrieve exactly the same terms reached by indirect relations.
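To make the proposed fix concrete, here is a minimal, language-neutral sketch in Python. It is not the pipeline's Java code and does not use `OWLGraphWrapper`. It assumes relations can be flattened to plain `(source, target)` pairs, and the names `reachable`, `filter_indirect`, and `ancestor_of_uterus` are hypothetical, introduced only for illustration. The idea: keep an indirect relation only if its target can still be reached through direct relations whose two ends both exist in the species.

```python
from collections import defaultdict


def reachable(direct_edges, start):
    """Return every term reachable from `start` by following direct edges."""
    graph = defaultdict(set)
    for source, target in direct_edges:
        graph[source].add(target)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent in graph[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def filter_indirect(direct_edges, indirect_edges, classes_in_species):
    """Keep indirect edges that do not depend on a class absent from the species."""
    # Drop direct edges touching a class that does not exist in the species.
    valid_direct = [
        (s, t) for s, t in direct_edges
        if s in classes_in_species and t in classes_in_species
    ]
    kept = set()
    for source, target in indirect_edges:
        if target in reachable(valid_direct, source):
            kept.add((source, target))
    return kept


# Toy data (hypothetical): uterus is not a valid class in platypus.
direct = [("endometrium", "uterus"), ("uterus", "ancestor_of_uterus")]
indirect = [("endometrium", "ancestor_of_uterus")]
platypus_classes = {"endometrium", "ancestor_of_uterus"}

kept_in_platypus = filter_indirect(direct, indirect, platypus_classes)
kept_with_uterus = filter_indirect(direct, indirect, platypus_classes | {"uterus"})
```

In the toy data, the indirect relation from endometrium is discarded for platypus because its only supporting path goes through uterus, and it is kept once uterus is a valid class. Recomputing reachability per edge is wasteful on a full ontology; a real implementation would cache the closure per source term.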
But if it is not a bug and these ancestors are really reachable through relation reduction, we need to think about how to provide relations to topGO: we provide it only with the direct relations, so it cannot reach the terms we reach in Bgee through inferred indirect relations.
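Either way, the post-insertion check described above also tells us what topGO would miss. The following is a rough sketch, assuming the stored relations for one species can be loaded as `(source, target)` pairs; `missing_ancestors` and the toy term names other than endometrium are hypothetical and not part of the pipeline.

```python
from collections import deque


def missing_ancestors(direct_edges, indirect_edges, term):
    """Return ancestors of `term` stored as indirect relations that cannot be
    reached by walking the chain of direct relations from `term`."""
    parents = {}
    for source, target in direct_edges:
        parents.setdefault(source, set()).add(target)

    # Breadth-first walk over direct relations only (what topGO would see).
    reached, queue = set(), deque([term])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in reached:
                reached.add(parent)
                queue.append(parent)

    # Ancestors recorded in the database as indirect relations of `term`.
    expected = {t for s, t in indirect_edges if s == term}
    return expected - reached


# Toy data (hypothetical) mimicking the platypus case.
stored_direct = [
    ("endometrium", "reachable_parent"),
    ("reachable_parent", "reachable_grandparent"),
]
stored_indirect = [
    ("endometrium", "reachable_grandparent"),
    ("endometrium", "ancestor_of_uterus"),
]

unreachable = missing_ancestors(stored_direct, stored_indirect, "endometrium")
```

An empty result for every term would mean that calls propagate identically in Bgee and in topGO; any non-empty result lists the ancestors reachable only through inferred indirect relations.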