-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New method for semsim ingestion #215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very, very nice looking code. I would prefer if you could get a @yaseminbridges review as well!
I have no concerns, just a few questions for my own learnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These .bkp files are used during the test execution. After ingesting data into the H2 database, we need to copy this backup and restore the database's state after the tests are over.
testdata/phenotype/2302_phenotype/2302_phenotype_test.trace.db.bkp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It generally looks good, I just have some comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I just had one comment on your _format_row()
method. There was a TODO note but I'm not sure if this is something you planned to change in a future PR
data (_type_): row data | ||
""" | ||
# TODO:Improve string escaping. Replace this code with parametrised query | ||
return f"""({mapping_id}, '{data['subject_id']}', '{data['subject_label'].replace("'", "")}', '{data['object_id']}', '{data['object_label'].replace("'", "")}', {data['jaccard_similarity']}, {data['ancestor_information_content']}, {data['phenodigm_score']}, '{data['ancestor_id'].split(",")[0]}', '{data['ancestor_label'].replace("'", "")}')""" # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be changed so that the code looks a bit neater, this is a bit confusing to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be changed so that the code looks a bit neater, this is a bit confusing to read
Yes, I'd like to use an ORM for escaping SQL, but it will be implemented in a future PR…
Fixes #214