Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixup: Remove time information from Nextclade dataset tree #34

Merged
merged 3 commits into from
Jun 7, 2024

Conversation

kimandrews
Copy link
Contributor

@kimandrews kimandrews commented Jun 3, 2024

Description of proposed changes

The goal of this PR is to remove time information from the Nextclade dataset tree, since Nextclade doesn't use this information. This change was originally recommended in nextstrain/nextclade_data#202 (comment)

Initial efforts to accomplish this goal involved exporting the Nextclade tree using branch lengths in tree.nwk (rather than in branch_lengths.json), under the assumption that these were divergence branch lengths, but actually they are time branch lengths. Therefore we need a different approach to remove time information.

This PR first reverts the changes from the initial approach, and then removes all node attributes except divergence branch lengths (mutation_length attribute) from branch_lengths.json before exporting the Nextclade tree.

Related issue(s)

nextstrain/nextclade_data#202 (comment)
#33

Checklist

  • Checks pass

This reverts commit 4051dc0.
That commit was made under the assumption that the branch lengths in tree.nwk
were divergence lengths, but actually they are time lengths. So we need a 
different approach for removing time information from the Nextclade dataset tree.
Remove all node attributes except divergence branch lengths (which are listed in 
the `mutation_length` attribute) from branch_lengths.json
@kimandrews kimandrews requested a review from a team June 3, 2024 21:30
Date does not need to be included in auspice_config.json since only a divergence tree is generated.
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge to update the measles Nextclade dataset tree.

I still think this is a bit of a "hacky" way to remove time info from the tree, but I won't block here. We will continue to run into similar issues as we develop more Nextclade datasets, so I think we should discuss a standard way to accomplish this in the long term. -> nextstrain/augur#1479

rule timeout:
input: "results/branch_lengths.json"
output: "results/branch_lengths_div_only.json"
run:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the Nextstrain Snakemake style guide advises to avoid run blocks.

Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just spoke with Kim and the latest nextclade tree was build using this code. LGTM

@kimandrews kimandrews merged commit 8d5e4ae into main Jun 7, 2024
32 checks passed
@kimandrews kimandrews deleted the fixup-remove-time-from-nextclade-tree branch June 7, 2024 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants