-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixup: Remove time information from Nextclade dataset tree #34
Conversation
This reverts commit 4051dc0. That commit was made under the assumption that the branch lengths in tree.nwk were divergence lengths, but actually they are time lengths. So we need a different approach for removing time information from the Nextclade dataset tree.
Remove all node attributes except divergence branch lengths (which are listed in the `mutation_length` attribute) from branch_lengths.json
Date does not need to be included in auspice_config.json since only a divergence tree is generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to merge to update the measles Nextclade dataset tree.
I still think this is a bit of a "hacky" way to remove time info from the tree, but I won't block here. We will continue to run into similar issues as we develop more Nextclade datasets, so I think we should discuss a standard way to accomplish this in the long term. -> nextstrain/augur#1479
rule timeout: | ||
input: "results/branch_lengths.json" | ||
output: "results/branch_lengths_div_only.json" | ||
run: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the Nextstrain Snakemake style guide advises to avoid run blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just spoke with Kim and the latest nextclade tree was build using this code. LGTM
Description of proposed changes
The goal of this PR is to remove time information from the Nextclade dataset tree, since Nextclade doesn't use this information. This change was originally recommended in nextstrain/nextclade_data#202 (comment)
Initial efforts to accomplish this goal involved exporting the Nextclade tree using branch lengths in tree.nwk (rather than in branch_lengths.json), under the assumption that these were divergence branch lengths, but actually they are time branch lengths. Therefore we need a different approach to remove time information.
This PR first reverts the changes from the initial approach, and then removes all node attributes except divergence branch lengths (
mutation_length
attribute) from branch_lengths.json before exporting the Nextclade tree.Related issue(s)
nextstrain/nextclade_data#202 (comment)
#33
Checklist