Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate sssom validation to new LinkML validation framework #532

Merged
merged 14 commits into from
Jul 26, 2024

Conversation

matentzn
Copy link
Collaborator

@matentzn matentzn commented May 16, 2024

Blocked by:

Note to future self, only thing that stands in the way of this PR now is #2164, if that is dealt with

  • change print_linkml_report(report, False) to print_linkml_report(report, fail_on_error)
# TODO fail_on_error: False because of https://github.com/linkml/linkml/issues/2164
    print_linkml_report(report, False)

tests/data/mondo-nando.sssom.tsv Show resolved Hide resolved
tests/data/mondo-nando.sssom.tsv Show resolved Hide resolved
tests/test_validate.py Outdated Show resolved Hide resolved
tests/test_validate.py Show resolved Hide resolved
src/sssom/validators.py Outdated Show resolved Hide resolved
src/sssom/validators.py Outdated Show resolved Hide resolved
src/sssom/validators.py Outdated Show resolved Hide resolved
@matentzn matentzn changed the title This test file should not pass validation, but does Migrate sssom validation to new LinkML validation framework May 17, 2024
@matentzn
Copy link
Collaborator Author

@pkalita-lbl Thanks a ton for your help, I made issues for everything and will resume work here once they are resolved. Thank you!

matentzn added 6 commits June 15, 2024 08:56
Using the latest version of LinkML to get latest validator features.
In this commit we

1. Add a new parameter, fail_on_error, to all validation methods (if true, throws and exception when errors are encountered)
2. We generate ValidationResult instances fall all custom checks (reusing the LinkML validation framework)
3. Add a few missing docstrings
4. Add a new validation check, check_strict_curie_format, which looks for curies with a | character in it
src/sssom/validators.py Outdated Show resolved Hide resolved
matentzn added 2 commits July 4, 2024 15:59
Previously I used dataclasses. asdict to turn a dataclass instance to a dictionary to pass to the validation framework. @pkalita-lbl explained to me why this was not a good idea (essentially, dataclass instances dont look like what the validation framework expects), and recommended the use of linkml_runtime.dumpers.json_dumper instead. Tests work fine!
.. to ensure that it recognises the argument fail_on_error.
Copy link

@pkalita-lbl pkalita-lbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation-related changes look good to me!

@matentzn matentzn requested a review from hrshdhgd July 23, 2024 09:37
@matentzn
Copy link
Collaborator Author

@hrshdhgd I assigned this to you for review. I think it is done, but I am a bit scared of how the changes will impact people relying on validation downstream, so I prefer to have your eyes on it. Thank you! Ideally this should be a careful review beyond "lgtm" as I feel I cant afford messing this up!

Copy link
Contributor

@hrshdhgd hrshdhgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea here is if validation encounters a SSSOM tsv file does not contain (say) a valid column value, the row is dropped and warning is logged. Correct?

It will never error out by the presence of invalid elements?

@matentzn
Copy link
Collaborator Author

@hrshdhgd just to be sure I need a positive approval from you for this PR, and real certainty that all is understood and correct. Can you comment on the exact pieces of code you have questions about?

@hrshdhgd
Copy link
Contributor

real certainty that all is understood and correct.

That is exactly what I'm trying to do as per my previous comment.

@matentzn
Copy link
Collaborator Author

So the idea here is if validation encounters a SSSOM tsv file does not contain (say) a valid column value, the row is dropped and warning is logged. Correct?

I hope this PR is only affecting the sssom validate command, and no other functionality. Did you see any possibility of this PR affecting any other methods, like sssom parse? That would make me even more cautious :D

Copy link
Contributor

@hrshdhgd hrshdhgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems like parse would have no impact with this change of code.

@matentzn matentzn merged commit 0c74f8d into master Jul 26, 2024
6 checks passed
@matentzn matentzn deleted the nando-test branch July 26, 2024 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants