ingest/ci: Use custom rules to manage example data
Keeps the default rules simpler by hiding the use of example data
in the custom rules for CI.

This matches how we manage example data in the phylogenetic workflow.
joverlee521 committed Jul 17, 2024
1 parent 567bccf commit b9386ad
Showing 4 changed files with 18 additions and 14 deletions.
4 changes: 2 additions & 2 deletions ingest/build-configs/ci/config.yaml
@@ -1,5 +1,5 @@
-# Use cached ncbi datasets package to speed up tests and isolate from ncbi servers
-mock_fetch: true
+custom_rules:
+  - "build-configs/ci/copy_example_data.smk"

# Snakemake requires at least one top level key in a config file, so including
# a bogus key here that should not be used anywhere in the Snakemake workflow
14 changes: 14 additions & 0 deletions ingest/build-configs/ci/copy_example_data.smk
@@ -0,0 +1,14 @@
+rule copy_example_data:
+    input:
+        ncbi_dataset="example_data/ncbi_dataset.zip"
+    output:
+        ncbi_dataset=temp("data/ncbi_dataset.zip")
+    shell:
+        """
+        cp -f {input.ncbi_dataset} {output.ncbi_dataset}
+        """
+
+# Add a Snakemake ruleorder directive here if you need to resolve ambiguous rules
+# that have the same output as the copy_example_data rule.
+
+ruleorder: copy_example_data > fetch_ncbi_dataset_package
File renamed without changes.
14 changes: 2 additions & 12 deletions ingest/rules/fetch_from_ncbi.smk
@@ -45,18 +45,9 @@ rule fetch_ncbi_dataset_package:
"""


-def get_ncbi_dataset_package_path():
-    """
-    Use cached data package in ci to isolate from ncbi server
-    """
-    if config.get("mock_fetch", False):
-        return "test_data/ncbi_dataset.zip"
-    return "data/ncbi_dataset.zip"
-
-
 rule extract_ncbi_dataset_sequences:
     input:
-        dataset_package=get_ncbi_dataset_package_path(),
+        dataset_package="data/ncbi_dataset.zip",
     output:
         ncbi_dataset_sequences=temp("data/ncbi_dataset_sequences.fasta"),
     benchmark:
@@ -70,7 +61,7 @@ rule extract_ncbi_dataset_sequences:

 rule format_ncbi_dataset_report:
     input:
-        dataset_package=get_ncbi_dataset_package_path(),
+        dataset_package="data/ncbi_dataset.zip",
     output:
         ncbi_dataset_tsv=temp("data/ncbi_dataset_report.tsv"),
     params:
@@ -116,4 +107,3 @@ rule format_ncbi_datasets_ndjson:
--duplicate-reporting warn \
2> {log} > {output.ndjson}
"""

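Note on the `ruleorder` directive in `copy_example_data.smk`: once the CI custom rules are included, both `copy_example_data` and `fetch_ncbi_dataset_package` appear able to produce `data/ncbi_dataset.zip`, so the directive tells Snakemake which rule to prefer. The snippet below is a toy Python model of that tie-break, not Snakemake's implementation; `pick_rule`, `producers`, and `order` are hypothetical names used only for illustration.

```python
def pick_rule(candidates, ruleorder):
    """Return the candidate rule that ranks highest in ruleorder.

    candidates: names of rules that can all produce the same output file.
    ruleorder: rule names from highest to lowest priority, mirroring
    a directive of the form `ruleorder: a > b`.
    """
    for name in ruleorder:
        if name in candidates:
            return name
    # No listed rule matches, so the ambiguity cannot be resolved.
    raise ValueError(f"ambiguous rules, none listed in ruleorder: {sorted(candidates)}")


# Assumption: with the CI config, both rules can produce data/ncbi_dataset.zip.
producers = {"fetch_ncbi_dataset_package", "copy_example_data"}

# Mirrors `ruleorder: copy_example_data > fetch_ncbi_dataset_package`.
order = ["copy_example_data", "fetch_ncbi_dataset_package"]

chosen = pick_rule(producers, order)
print(chosen)
```

Under this model the CI run takes the example data copy rather than the NCBI download. Without the custom rules file, only `fetch_ncbi_dataset_package` is a candidate, so the default workflow is unaffected.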