Proposal: Function for optimized retrieval of GT from FORMAT field #105
Fully parsing the "FORMAT" field requires building complex nested memory structures (an Object containing Objects containing arrays of strings or numbers, and so on). This can cause out-of-memory errors and heavy garbage collection when parsing large, 1000 Genomes-type data.
This PR proposes a simplified representation called parseGenotypesOptimized.
This PR is connected to the effort in GMOD/jbrowse-components#4511, which parses large regions of 1000 Genomes-type data.
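To illustrate the general idea of reading only GT instead of building the full nested FORMAT structure, here is a rough sketch. The helper name `extractGenotypes`, its parameters, and its return shape are assumptions made for illustration; they are not taken from this PR's diff and may differ from what parseGenotypesOptimized actually does.

```typescript
// Hypothetical helper (not this library's API): returns only the GT string
// per sample, keyed by sample name, without parsing the other FORMAT keys.
function extractGenotypes(
  format: string, // FORMAT column, e.g. "GT:DP:GQ"
  sampleColumns: string[], // one raw column per sample, e.g. "0|1:12:99"
  sampleNames: string[], // sample names from the #CHROM header line
): Record<string, string> {
  const genotypes: Record<string, string> = {}
  const gtIndex = format.split(':').indexOf('GT')
  if (gtIndex === -1) {
    // No GT key in FORMAT: nothing to extract
    return genotypes
  }
  sampleNames.forEach((name, i) => {
    const column = sampleColumns[i] ?? '.'
    if (gtIndex === 0) {
      // Common case: GT is the first key, so take the prefix up to the
      // first ':' and avoid allocating an array for the whole column
      const end = column.indexOf(':')
      genotypes[name] = end === -1 ? column : column.slice(0, end)
    } else {
      // Fallback: split the column and pick the GT position
      genotypes[name] = column.split(':')[gtIndex] ?? '.'
    }
  })
  return genotypes
}
```

The result is one flat object of short strings per variant, rather than an object of objects of arrays, which is the kind of reduction in allocations the description above is aiming for.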
It also makes
Footnote: this probably supersedes #94. In #94, I was very committed to preserving the existing Variant class as it was, but this PR changes it. As a result, it's a major version bump.
Fixes #98