Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing header lines where values are containing square bracket lists #107

Merged
merged 5 commits into from
Dec 8, 2024

Conversation

cmdcolin
Copy link
Contributor

@cmdcolin cmdcolin commented Dec 8, 2024

Two examples from the VCF 4.5 spec that demonstrate complex to 'parse' examples include

##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>


##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">

both of these show that it is a little tricky to parse correctly, as you cannot just split the text within the <...> by comma, and then split by =...

To fix this, I used claude AI to help create the parseMetaString function. I first prompted the claude AI with a regex from here https://stackoverflow.com/a/18893443/2129219 which is quite complex already and told it to add support for the square brakets. It generated a non-working regex in the next step, but then i told it it didn't work and went into an infinite loop and it said, you're right, here is an alternative approach which is just a simple function that parses it out more precisely than the regex.

image

pretty smart. I credited the claude AI in the function

Copy link

codecov bot commented Dec 8, 2024

Codecov Report

Attention: Patch coverage is 98.46154% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.72%. Comparing base (8d5c951) to head (dc4e328).
Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
src/parse.ts 93.75% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #107      +/-   ##
==========================================
- Coverage   96.33%   94.72%   -1.61%     
==========================================
  Files           3        4       +1     
  Lines         218      683     +465     
  Branches       87      102      +15     
==========================================
+ Hits          210      647     +437     
- Misses          8       36      +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cmdcolin cmdcolin force-pushed the parsing_fixes branch 3 times, most recently from dc4e328 to 9fef28a Compare December 8, 2024 20:35
@cmdcolin cmdcolin merged commit 5950b41 into master Dec 8, 2024
1 check passed
@cmdcolin cmdcolin changed the title Fix parsing header lines where values are containing commas Fix parsing header lines where values are containing square bracket lists Dec 8, 2024
@cmdcolin cmdcolin deleted the parsing_fixes branch December 8, 2024 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant