All environmental samples have an exact synonym "environmental samples" #56
Comments
Issue arises from another project |
That repo must be private, because I get a 404.
That's a repo we use for some internal text mining projects, where we may have corpora that people don't want shared yet. There isn't much context there, other than that any time a text says "environmental sample" it produces false-positive matches against hundreds of "taxa", because that is the exact synonym that was assigned. This would mark the first time we don't have a 100% isomorphic translation from the source, so it brings up various questions, which are raised here: In this case I would argue that an exception is justified,
but we need a process for making these kinds of decisions for this ontology, I will send an email to obo-taxonomy and gather other feedback |
Agree that 'environmental samples' should be an exception. That's not even a synonym for any actual taxon. If deleting it altogether is too much, then perhaps the information could be captured in a comment (or other mechanism). Another alternative is to change it from exact to related. I come across the same issue (for actual synonyms) in my automated processing of UniProtKB for PRO. For these, I examine the whole of what's going to be imported, detect synonyms that duplicate either other synonyms or other labels, and mark them accordingly (can use related or broad; not sure which works best for the problem being resolved by the change in algorithm). |
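The duplicate check described in the comment above could be sketched roughly as follows. This is only an illustration, not the actual PRO import code: the `terms` dictionary layout, the function name `demote_duplicate_synonyms`, and the choice of "related" over "broad" are all assumptions made for the example.

```python
from collections import defaultdict


def demote_duplicate_synonyms(terms):
    """Mark synonyms that collide with another term's label or synonym.

    `terms` is a hypothetical structure: term_id -> {"label": str, "synonyms": [str]}.
    Returns term_id -> list of (synonym, scope) pairs, where scope is
    "exact" for unique strings and "related" for strings shared by several terms.
    """
    # Record which terms use each string, either as a label or as a synonym.
    owners = defaultdict(set)
    for term_id, term in terms.items():
        owners[term["label"].lower()].add(term_id)
        for synonym in term["synonyms"]:
            owners[synonym.lower()].add(term_id)

    scoped = {}
    for term_id, term in terms.items():
        scoped[term_id] = [
            # A string used by more than one term cannot be an exact synonym.
            (synonym, "exact" if len(owners[synonym.lower()]) == 1 else "related")
            for synonym in term["synonyms"]
        ]
    return scoped


# Hypothetical input: two terms that share the same synonym string.
example_terms = {
    "A": {"label": "Alpha group", "synonyms": ["environmental samples", "alpha"]},
    "B": {"label": "Beta group", "synonyms": ["environmental samples"]},
}
example_scoped = demote_duplicate_synonyms(example_terms)
```

Here the shared string is demoted for both terms while the unique synonym "alpha" stays exact; whether "related" or "broad" is the better target scope is left open, as in the comment.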
I'd prefer that they fix this upstream. If that's not in the cards, then I'm fine with us excluding this particular synonym in the OWL we generate. |
I support demoting this to either an annotation or better yet a comment. @bpeters42 has a point about synonyms that aren't exact, but that's a different issue - human is a sloppy synonym for H. sapiens, but it is still a synonym. "Environmental sample" isn't a synonym or even really about the taxon as a whole. It's something else, perhaps metadata about individual(s) in the taxon or collection events. |
@pmidford , I actually have zero problems with 'human' being an exact synonym of 'homo sapiens'; my problem is that, at the same time, the parent taxon 'homo' has the exact synonym 'humans'. Which conflates singular vs. plural with a class hierarchy, and leads to craziness (Homo heidelbergensis being a kind of humans, but not a kind of human?) Completely agree with you that 'environmental sample' is even worse. I was trying to point out that the 'exact synonyms' are not only problematic for 'environmental samples' and the like which are at the edges of what the NCBI taxonomy cares about, but also for organisms at the core of classical taxonomy, like homo sapiens. And to be a bit more precise with what I mean by 'more loose', I thought 'alternative label'. |
Thanks for your comments. Let's start with an NCBI request - I nominate @bpeters42 or @fbastian since you both have existing relationships. @hrshdhgd can go ahead and make a PR, but we will hold off on merging it until we are sure that NCBI won't remove it. |
We looked more closely into this, and the problem is probably on our end. We get our data from https://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip. One of the tables in there is
Here's an example row from
We want our rdfs:labels to be unique, so we use the unique name if it's present. But then we also create a synonym from the 'name_txt', and that might not be the right thing to do. This is the relevant bit of code: https://github.com/obophenotype/ncbitaxon/blob/master/src/ncbitaxon.py#L258 I'm too tired right now, but I'll come back to this tomorrow. |
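A minimal sketch of the label and synonym logic described above, with a filter for the problematic name. It is not the code at the linked line of ncbitaxon.py. It assumes rows with four fields (tax_id, name_txt, unique name, name class) separated by tab-pipe-tab, as in the NCBI taxdump names table; the helper names, the `EXCLUDED_SYNONYMS` set, and the first example row are hypothetical.

```python
# Hypothetical exclusion list; only the string discussed in this issue.
EXCLUDED_SYNONYMS = {"environmental samples"}


def parse_row(line):
    """Split one taxdump-style row into (tax_id, name_txt, unique_name, name_class)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    tax_id, name_txt, unique_name, name_class = line.split("\t|\t")
    return tax_id, name_txt, unique_name, name_class


def label_and_synonyms(name_txt, unique_name):
    """Prefer the unique name as the label; add name_txt as a synonym only if allowed."""
    label = unique_name or name_txt
    synonyms = []
    if unique_name and name_txt != unique_name:
        # Skip strings that are not really names of the taxon.
        if name_txt.lower() not in EXCLUDED_SYNONYMS:
            synonyms.append(name_txt)
    return label, synonyms


# Hypothetical row: the tax_id and the unique name are made up for illustration.
row = "99999999\t|\tenvironmental samples\t|\tenvironmental samples <hypothetical group>\t|\tscientific name\t|\n"
tax_id, name_txt, unique_name, name_class = parse_row(row)
label, synonyms = label_and_synonyms(name_txt, unique_name)
```

With this filter the taxon keeps its unique label but no longer receives "environmental samples" as a synonym, while other rows with a unique name still get their `name_txt` as a synonym.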
No synonyms added if name = environmental samples #56
This issue was fixed by @hrshdhgd 2 years ago, am closing |
E.g. http://purl.obolibrary.org/obo/NCBITaxon_743727
Even though our transform generally does not alter the source, I think in this case we need to filter this.