Add phylogenetic directory #18

Merged (10 commits) on Mar 1, 2024
Changes from 9 commits
23 changes: 21 additions & 2 deletions .github/workflows/ci.yaml
@@ -5,5 +5,24 @@ on:
   - pull_request
 
 jobs:
-  ci:
-    uses: nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@master
+  pathogen-ci:
+    strategy:
+      matrix:
+        runtime: [docker, conda]
+    permissions:
+      id-token: write
+    uses: nextstrain/.github/.github/workflows/pathogen-repo-build.yaml@master
+    secrets: inherit
+    with:
+      runtime: ${{ matrix.runtime }}
+      run: |
+        nextstrain build \
+          phylogenetic \
+            --configfile build-configs/ci/config.yaml
+      artifact-name: output-${{ matrix.runtime }}
+      artifact-paths: |
+        phylogenetic/auspice/
+        phylogenetic/results/
+        phylogenetic/benchmarks/
+        phylogenetic/logs/
+        phylogenetic/.snakemake/log/
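
To make the effect of the `runtime: [docker, conda]` matrix concrete, here is a minimal Python sketch that enumerates the jobs it would produce and the artifact name each one uploads. It only mirrors the `${{ matrix.runtime }}` substitution visible in the workflow change; the helper name `expand_matrix` is hypothetical, and nothing here calls GitHub Actions.

```python
# Illustrative only: mimics how `${{ matrix.runtime }}` is substituted
# for each entry of the matrix in the workflow change.
RUNTIMES = ["docker", "conda"]

# The command passed to the reusable workflow via `run:`, on one line.
BUILD_COMMAND = (
    "nextstrain build phylogenetic "
    "--configfile build-configs/ci/config.yaml"
)


def expand_matrix(runtimes):
    """Return one job description per runtime (hypothetical helper)."""
    return [
        {
            "runtime": runtime,
            "run": BUILD_COMMAND,
            "artifact-name": f"output-{runtime}",
        }
        for runtime in runtimes
    ]


jobs = expand_matrix(RUNTIMES)
for job in jobs:
    print(job["runtime"], "->", job["artifact-name"])
```

Each runtime gets its own artifact name, so the two matrix jobs should not overwrite each other's uploaded outputs.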
74 changes: 16 additions & 58 deletions README.md
@@ -1,67 +1,25 @@
# nextstrain.org/measles
# Nextstrain repository for measles virus

This is the [Nextstrain](https://nextstrain.org) build for measles virus, visible at
[nextstrain.org/measles](https://nextstrain.org/measles).
This repository contains two workflows for the analysis of measles virus data:

The build encompasses fetching data, preparing it for analysis, doing quality
control, performing analyses, and saving the results in a format suitable for
visualization (with [auspice][]). This involves running components of
Nextstrain such as [augur][].
- [`ingest/`](./ingest) - Download data from GenBank, clean and curate it
- [`phylogenetic/`](./phylogenetic) - Filter sequences, align, construct phylogeny and export for visualization

All measles-specific steps and functionality for the Nextstrain pipeline should be
housed in this repository.
Each folder contains a README.md with more information. The results of running both workflows are publicly visible at [nextstrain.org/measles](https://nextstrain.org/measles).

[![Build Status](https://github.com/nextstrain/measles/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/measles/actions/workflows/ci.yaml)
## Installation

## Usage
Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.

If you're unfamiliar with Nextstrain builds, you may want to follow our
[quickstart guide][] first and then come back here.
## Quickstart

The easiest way to run this pathogen build is using the [Nextstrain
command-line tool][nextstrain-cli]:
Run the default phylogenetic workflow via:
```
cd phylogenetic/
nextstrain build .
nextstrain view .
```

nextstrain build .
## Documentation

See the [nextstrain-cli README][] for how to install the `nextstrain` command.

Alternatively, you should be able to run the build using `snakemake` within a
suitably-configured local environment. Details of setting that up are not yet
well-documented, but will be in the future.

Build output goes into the directories `data/`, `results/` and `auspice/`.

Once you've run the build, you can view the results in auspice:

nextstrain view auspice/


## Configuration

Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
specifies its file inputs and output and also its parameters. There is little redirection and each
rule should be able to be reasoned with on its own.

<!--
### fauna / RethinkDB credentials

This build starts by pulling sequences from our live [fauna][] database (a RethinkDB instance). This
requires environment variables `RETHINK_HOST` and `RETHINK_AUTH_KEY` to be set.
-->

If you don't have access to our https endpoints, you can run the build using the
example data provided in this repository. Before running the build, copy the
example sequences into the `data/` directory like so:

mkdir -p data/
cp example_data/* data/.


[Nextstrain]: https://nextstrain.org
<!-- [fauna]: https://github.com/nextstrain/fauna -->
[augur]: https://github.com/nextstrain/augur
[auspice]: https://github.com/nextstrain/auspice
[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
[nextstrain-cli]: https://github.com/nextstrain/cli
[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
- [Running a pathogen workflow](https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html)
199 changes: 0 additions & 199 deletions Snakefile

This file was deleted.

5 changes: 5 additions & 0 deletions nextstrain-pathogen.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# This is currently an empty file to indicate the top level pathogen repo.
# The inclusion of this file allows the Nextstrain CLI to run the
# `nextstrain build` from any directory regardless of runtime.
#
# See https://github.com/nextstrain/cli/releases/tag/8.2.0 for more details.
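
The comment above says that the presence of this file lets `nextstrain build` run from any directory. One plausible way a tool could use such a marker file is to walk upward from the current directory until it finds it. The sketch below illustrates that idea only; it is an assumption for illustration, not the Nextstrain CLI's actual implementation, and `find_pathogen_root` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# File name taken from the diff; the upward search itself is an assumption.
MARKER = "nextstrain-pathogen.yaml"


def find_pathogen_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains MARKER."""
    start = start.resolve()
    for directory in [start, *start.parents]:
        if (directory / MARKER).is_file():
            return directory
    return None  # no marker found anywhere above `start`


# Example: look for the repository root from the current working directory.
print(find_pathogen_root(Path.cwd()))
```

Under this assumption, running from `phylogenetic/` or any nested directory would resolve to the same top-level repository directory.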
50 changes: 50 additions & 0 deletions phylogenetic/README.md
@@ -0,0 +1,50 @@
# nextstrain.org/measles

This is the [Nextstrain](https://nextstrain.org) build for measles, visible at
[nextstrain.org/measles](https://nextstrain.org/measles).

## Software requirements

Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html)
for Nextstrain's suite of software tools.

## Usage

If you're unfamiliar with Nextstrain builds, you may want to follow our
[Running a Pathogen Workflow guide](https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html) first and then come back here.

The easiest way to run this pathogen build is using the Nextstrain
command-line tool from within the `phylogenetic/` directory:

    cd phylogenetic/
    nextstrain build .

Build output goes into the directories `data/`, `results/` and `auspice/`.

Once you've run the build, you can view the results with:

    nextstrain view .

## Configuration

Configuration takes place entirely with the `Snakefile`. This can be read
top-to-bottom, each rule specifies its file inputs and output and also its
parameters. There is little redirection and each rule should be able to be
reasoned with on its own.

### Using GenBank data

This build starts by pulling preprocessed sequence and metadata files from:

* https://data.nextstrain.org/files/measles/sequences.fasta.zst
* https://data.nextstrain.org/files/measles/metadata.tsv.zst

The above datasets have been preprocessed and cleaned from GenBank.

### Using example data

Alternatively, you can run the build using the
example data provided in this repository. To run the build by copying the
example sequences into the `data/` directory, use the following:

    nextstrain build . --configfile profiles/ci/profiles_config.yaml
24 changes: 24 additions & 0 deletions phylogenetic/Snakefile
@@ -0,0 +1,24 @@
configfile: "defaults/config.yaml"

rule all:
    input:
        auspice_json = "auspice/measles.json",

include: "rules/prepare_sequences.smk"
include: "rules/construct_phylogeny.smk"
include: "rules/annotate_phylogeny.smk"
include: "rules/export.smk"

# Include custom rules defined in the config.
if "custom_rules" in config:
    for rule_file in config["custom_rules"]:

        include: rule_file

rule clean:
    """Removing directories: {params}"""
    params:
        "results ",
        "auspice"
    shell:
        "rm -rfv {params}"
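
Two pieces of the new `phylogenetic/Snakefile` are easier to follow in plain Python: the optional `custom_rules` loop and the command built by `rule clean`. The sketch below is not Snakemake code. It assumes that `{params}` expands to the positional params joined by single spaces, and `rules/custom.smk` is a hypothetical file name used only for illustration.

```python
# Plain-Python sketch of two pieces of the Snakefile logic (not Snakemake itself).


def custom_rule_files(config):
    """Return the rule files the `custom_rules` loop would include."""
    # Mirrors: if "custom_rules" in config: for rule_file in config["custom_rules"]
    if "custom_rules" in config:
        return list(config["custom_rules"])
    return []


def clean_command(params):
    """Build the shell command, assuming params are joined with single spaces."""
    return "rm -rfv " + " ".join(params)


# "rules/custom.smk" is a hypothetical custom rule file.
print(custom_rule_files({"custom_rules": ["rules/custom.smk"]}))
print(custom_rule_files({}))  # key absent: nothing extra is included
print(clean_command(["results ", "auspice"]))
```

Under that joining assumption, the trailing space in `"results "` produces a double space in the command, which a shell treats the same as a single separator.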