Create custom code to parse Measles strain names #14

kimandrews · 2024-02-07T18:37:08Z

As discussed, the WHO requires measles strain names to include the sampling date and geographic location. In some cases, the strain names could be used to recover dates and/or geographic locations for samples that have empty or ambiguous values for these attributes in the NCBI Datasets output. However, the WHO-formatted strain names do not always appear in the NCBI Datasets output: some GenBank submitters report strain names in the "isolate" field whereas others use the "strain" field, and NCBI Datasets currently pulls only the "isolate" field. The NCBI Datasets team plans to add the "strain" field sometime this year. Once that is done, custom code could be written to parse the NCBI Datasets output and do the following for each sample:

- Determine whether the WHO-formatted strain name is in the "isolate" or "strain" field
- Parse the date and geographic location from the WHO-formatted strain name when these attributes are otherwise empty or ambiguous
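
For discussion, here is a rough sketch of what the two steps above might look like in Python. It is not an implementation plan, and several things in it are assumptions rather than anything confirmed in this issue: the strain-name layout (`MVs/` or `MVi/`, then `<place>.<3-letter country code>`, then `<epi week>.<year>`, with an optional trailing number and bracketed genotype), the record keys `isolate`, `strain`, `date`, and `country`, the `who_name_field` output key, the `XXXX-XX-XX` placeholder for an ambiguous date, and the cutoff used to expand two-digit years. Only the year is written back, because converting an epidemiological week to a calendar date would need a convention that has not been decided here.

```python
import re

# ASSUMED layout: MVs|MVi / <place>.<ISO3 country> / <epi week>.<year> [/ n] [ [genotype] ]
WHO_NAME = re.compile(
    r"^MV[is]/"
    r"(?P<place>[^/]+)\.(?P<country>[A-Z]{3})/"
    r"(?P<week>\d{1,2})\.(?P<year>\d{4}|\d{2})"
    r"(?:/\d*)?"                 # optional trailing slash and isolate number
    r"(?:\s*\[[^\]]+\])?\s*$"    # optional bracketed genotype
)

# ASSUMPTION: values treated as "empty or ambiguous" for the date attribute.
EMPTY_DATES = {"", "XXXX-XX-XX"}
# ASSUMPTION: two-digit years 00-49 are read as 20xx, 50-99 as 19xx.
CENTURY_PIVOT = 49


def parse_who_name(name):
    """Return place, country, week, and year from a WHO-style name, or None."""
    match = WHO_NAME.match((name or "").strip())
    if match is None:
        return None
    week, year = int(match["week"]), int(match["year"])
    if not 1 <= week <= 53:
        return None
    if len(match["year"]) == 2:
        year += 2000 if year <= CENTURY_PIVOT else 1900
    return {
        "place": match["place"],
        "country": match["country"],
        "week": week,
        "year": year,
    }


def fill_missing(record):
    """Copy the record, filling an empty date or country from the strain name.

    The "isolate" field is checked first, then "strain" (hypothetical keys).
    """
    filled = dict(record)
    for field in ("isolate", "strain"):
        parsed = parse_who_name(record.get(field))
        if parsed is None:
            continue
        filled["who_name_field"] = field
        if (record.get("date") or "") in EMPTY_DATES:
            filled["date"] = f"{parsed['year']}-XX-XX"  # year-only, ambiguous date
        if not record.get("country"):
            filled["country"] = parsed["country"]
        break
    return filled
```

Existing, non-empty values are left untouched, so the strain name only fills gaps and never overrides what NCBI Datasets reports.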

This custom code may have minimal impact on the current measles workflow outputs, because very few samples that meet the minimum length requirement (5000 bp) have missing dates that could be recovered by this approach. However, if we eventually create gene-specific phylogenies, more samples would be affected. In addition, this code would recover WHO-formatted strain names for many samples (because many samples have their strain names in the "strain" field), and there is value in having these strain names present in the metadata retrieved for all samples.

kimandrews mentioned this issue Feb 8, 2024

Add ingest #10

Merged

kimandrews mentioned this issue Apr 12, 2024

Include vaccine strains #23

Merged

joverlee521 mentioned this issue Jun 12, 2024

Fixup: Add date annotations for rare genotypes #38

Open

1 task
