-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve / standardize the way that we pass through default parameters and build-specific parameters #1131
Comments
Per rationale in issue 1131 (#1131), switch to following a three step look up for setting build-specific parameters of colors:clade_recency.
Voting for strategy 2. Reasoning: In theory, allowing arbitrary build names under keys such as Practically speaking, there may be little chance that a user would want to name a build using one of the reserved names. But it'd be easier to keep the exception list small by using |
I would also vote to keep strategy 2. I think having an explicit Using the Without the frequencies:
min_date: 2020-01-01
max_date: 2020-05-10
pivot_interval: 4
pivot_interval: 1
pivot_interval_units: "weeks"
narrow_bandwidth: 0.041
proportion_wide: 0.0 With the frequencies:
default:
min_date: 2020-01-01
max_date: 2020-05-10
pivot_interval: 4
pivot_interval: 1
pivot_interval_units: "weeks"
narrow_bandwidth: 0.041
proportion_wide: 0.0 |
Good example! I would go one step further and make sure that all parameters can be made build-specific: frequencies:
default:
min_date: 2020-01-01
max_date: 2020-05-10
pivot_interval: 4
pivot_interval: 1
pivot_interval_units: "weeks"
narrow_bandwidth: 0.041
proportion_wide: 0.0
# every other entry is a build Or if there is good reason to prevent them from being set on the build level, clearly mark them as applying to all builds via a separate namespace: frequencies:
default:
min_date: 2020-01-01
max_date: 2020-05-10
all-builds:
pivot_interval: 4
pivot_interval: 1
pivot_interval_units: "weeks"
narrow_bandwidth: 0.041
proportion_wide: 0.0
# every other entry is a build |
This is essentially the solution I came to for clock_rates:
FALLBACK: # anything not specified by a subtype/time combination
pb2: '' # falls back to FALLBACK, and the empty string means no
pb1: '' # supplied clock rate, i.e. infer the clock
...
h5nx:
2y:
pb2: [0.00287, *clock_std_dev]
pb1: [0.00267, *clock_std_dev]
... Solution 3 is the same, it just defines default values and build names in the same namespace/dictionary. I'd be happy with consistent use of either (2) or (3).
We should really prevent this from happening. Schemas are a (the?) way to identify this as an invalid config file and error out early. I don't think we've written any schema files for Snakemake, but we make extensive use of them for augur/auspice JSONs. |
Per rationale in issue 1131 (#1131), switch to following a three step look up for setting build-specific parameters of colors:clade_recency.
Thanks for the feedback here @joverlee521, @victorlin and @jameshadfield. Based on your preferences I updated |
This follows recommendation in issue #1131 to use the pattern frequencies: default: min_date: "6M" narrow_bandwidth: 0.038 This use of "default" extends just to min_date, max_date and narrow_bandwidth. This behavior is now documented in parameters.yaml as well as workflow-config-file.rst. The specification of frequencies parameters in builds.yaml now follows this pattern.
This follows recommendation in issue #1131 to use the pattern frequencies: default: min_date: "6M" narrow_bandwidth: 0.038 This use of "default" extends just to min_date, max_date and narrow_bandwidth. This behavior is now documented in parameters.yaml as well as workflow-config-file.rst. The specification of frequencies parameters in builds.yaml now follows this pattern.
This follows recommendation in issue #1131 to use the pattern frequencies: default: min_date: "6M" narrow_bandwidth: 0.038 This use of "default" extends just to min_date, max_date and narrow_bandwidth. This behavior is now documented in parameters.yaml as well as workflow-config-file.rst. The specification of frequencies parameters in builds.yaml now follows this pattern.
Per rationale in issue 1131 (#1131), switch to following a three step look up for setting build-specific parameters of colors:clade_recency.
We should absolutely be making greater use of schema-based validation for our workflow configs. The only example of us doing so that I'm aware of is ncov, which provides a very basic, non-comprehensive schema (better than nothing, but not great). |
Context
Currently, the workflow is a bit of mess when it comes to how one can specify parameters. There seems to be 3 different approaches:
1 – parameters.yaml only
For the
frequencies
rule, the parameterpivot_interval
is retrieved viaconfig["frequencies"]["pivot_interval"]
. This makes it grab exactly what's inparameters.yaml
underfrequencies:pivot_interval
. If someone tries to specify a 4-week build-specificpivot_interval
viaThe workflow will secretly just keep using the
pivot_interval
specified inparamaters.yaml
.Most parameters in the workflow work like this.
2 – builds.yaml override with necessity of
default
For the
traits
rule, the parametersampling_bias_correction
is retrieved via_get_sampling_bias_correction_for_wildcards
which is defined asIe it first looks for a build-specific list of
traits:{build_name}:sampling_bias_correction
and if doesn't find it, it expectstraits:default:sampling_bias_correction
inbuilds.yaml
orparameters.yaml
. This is described in the docs asThis works, but requires
parameters.yaml
to look like:with an extra
default
key compared to other entries inparameters.yaml
.This strategy is only used for the
traits
rule.3 – builds.yaml override without
default
For the
frequencies
rule, the parametermin_date
is retrieved via_get_min_date_for_frequencies
which is defined asIe it starts with trying to grab build-specific
frequencies:{build_name}:min_date
frombuilds.yaml
. If this doesn't exist, it looks forfrequencies:min_date
inbuilds.yaml
orparameters.yaml
and if this doesn't exist it directly returns a sensible default.This strategy is only used for the
frequencies
rule.Description
I believe that we should replace the strategy 2 above used only in
traits
rule with a setup like 3 above. This is what I did when I realized we weren't collectingnarrow_bandwidth
properly. In PR #1130 I followed strategy 3 to specifynarrow_bandwidth
as:We have the issue that if we swap
_get_sampling_bias_correction_for_wildcards
to use strategy 3, we'll need to updateparameters.yaml
to readThis will break custom profiles that external users are running. We could provide backwards compatibility however by looking first for
traits:sampling_bias_correction
and then fortraits:defaults:sampling_bias_correction
.Does this seem reasonable?
The text was updated successfully, but these errors were encountered: