Merge branch 'dsl2-add-sharding-of-fastqs-before-alignment' of https://github.com/nf-core/eager into dsl2-add-sharding-of-fastqs-before-alignment
shyama-mama committed Aug 25, 2023
2 parents d7a711d + fe408ea commit 29a3b3a
Showing 40 changed files with 848 additions and 719 deletions.
1 change: 0 additions & 1 deletion .github/CONTRIBUTING.md
@@ -116,4 +116,3 @@ To get started:
Devcontainer specs:

- [DevContainer config](.devcontainer/devcontainer.json)
- [Dockerfile](.devcontainer/Dockerfile)
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.yml
@@ -42,7 +42,7 @@ body:
attributes:
label: System information
description: |
* Nextflow version _(eg. 22.10.1)_
* Nextflow version _(eg. 23.04.0)_
* Hardware _(eg. HPC, Desktop, Cloud)_
* Executor _(eg. slurm, local, awsbatch)_
* Container engine: _(e.g. Docker, Singularity, Conda, Podman, Shifter, Charliecloud, or Apptainer)_
11 changes: 8 additions & 3 deletions .github/workflows/awsfulltest.yml
@@ -14,21 +14,26 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Launch workflow via tower
uses: seqeralabs/action-tower-launch@v1
uses: seqeralabs/action-tower-launch@v2
# TODO nf-core: You can customise AWS full pipeline tests as required
# Add full size test data (but still relatively small datasets for few samples)
# on the `test_full.config` test runs with only one set of parameters
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
revision: ${{ github.sha }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/eager/work-${{ github.sha }}
parameters: |
{
"hook_url": "${{ secrets.MEGATESTS_ALERTS_SLACK_HOOK_URL }}",
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/eager/results-${{ github.sha }}"
}
profiles: test_full,aws_tower
profiles: test_full

- uses: actions/upload-artifact@v3
with:
name: Tower debug log file
path: tower_action_*.log
path: |
tower_action_*.log
tower_action_*.json
10 changes: 7 additions & 3 deletions .github/workflows/awstest.yml
@@ -12,18 +12,22 @@ jobs:
steps:
# Launch workflow using Tower CLI tool action
- name: Launch workflow via tower
uses: seqeralabs/action-tower-launch@v1
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
revision: ${{ github.sha }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/eager/work-${{ github.sha }}
parameters: |
{
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/eager/results-test-${{ github.sha }}"
}
profiles: test,aws_tower
profiles: test

- uses: actions/upload-artifact@v3
with:
name: Tower debug log file
path: tower_action_*.log
path: |
tower_action_*.log
tower_action_*.json
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -24,7 +24,7 @@ jobs:
strategy:
matrix:
NXF_VER:
- "22.10.1"
- "23.04.0"
- "latest-everything"
PARAMS:
- "-profile test,docker --preprocessing_tool fastp --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/fastp/adapters.fasta'"
5 changes: 5 additions & 0 deletions .gitpod.yml
@@ -1,4 +1,9 @@
image: nfcore/gitpod:latest
tasks:
- name: Update Nextflow and setup pre-commit
command: |
pre-commit install --install-hooks
nextflow self-update
vscode:
extensions: # based on nf-core.nf-core-extensionpack
13 changes: 13 additions & 0 deletions CITATIONS.md
@@ -12,6 +12,8 @@

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
@@ -72,10 +74,18 @@

> Bushnell B, Rood J, Singer E (2017) BBMerge – Accurate paired shotgun read merging via overlap. PLOS ONE 12(10): e0185056. [https://doi.org/10.1371/journal.pone.0185056](https://doi.org/10.1371/journal.pone.0185056)
- [BEDTools](https://doi.org/10.1093/bioinformatics/btq033)

> Quinlan, A. R., & Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841–842. [https://doi.org/10.1093/bioinformatics/btq033](https://doi.org/10.1093/bioinformatics/btq033)
- [PreSeq](https://doi.org/10.1038/nmeth.2375)

> Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature Methods, 10(4), 325–327. doi: [10.1038/nmeth.2375](https://doi.org/10.1038/nmeth.2375)
- [endorS.py](https://doi.org/10.7717/peerj.10947)

> Fellows Yates JA, Lamnidis TC, Borry M, Valtueña Andrades A, Fagernäs Z, Clayton S, Garcia MU, Neukamm J, Peltzer A. 2021. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:e10947. doi: [10.7717/peerj.10947](https://doi.org/10.7717/peerj.10947)
- [mapDamage2](https://doi.org/10.1093/bioinformatics/btt193)

> Jónsson H, Ginolhac A, Schubert M, Johnson P, Orlando L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 2013. 23rd April 2013. doi: [10.1093/bioinformatics/btt193](https://doi.org/10.1093/bioinformatics/btt193)
@@ -107,5 +117,8 @@
- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

> Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.
- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
6 changes: 3 additions & 3 deletions README.md
@@ -4,7 +4,7 @@

[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/eager/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A522.10.1-23aa62.svg)](https://www.nextflow.io/)
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
@@ -117,11 +117,11 @@ nextflow run nf-core/eager \
> provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
For more details, please refer to the [usage documentation](https://nf-co.re/eager/usage) and the [parameter documentation](https://nf-co.re/eager/parameters).
For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/eager/usage) and the [parameter documentation](https://nf-co.re/eager/parameters).

## Pipeline output

To see the the results of a test run with a full size dataset refer to the [results](https://nf-co.re/eager/results) tab on the nf-core website pipeline page.
To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/eager/results) tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the
[output documentation](https://nf-co.re/eager/output).

12 changes: 8 additions & 4 deletions assets/methods_description_template.yml
@@ -3,17 +3,21 @@ description: "Suggested text and references to use when describing pipeline usag
section_name: "nf-core/eager Methods Description"
section_href: "https://github.com/nf-core/eager"
plot_type: "html"
## TODO nf-core: Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You inject any metadata in the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
<p>Data was processed using nf-core/eager v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>).</p>
<p>Data was processed using nf-core/eager v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
<p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
<pre><code>${workflow.commandLine}</code></pre>
<p>${tool_citations}</p>
<h4>References</h4>
<ul>
<li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. <a href="https://doi.org/10.1038/nbt.3820">https://doi.org/10.1038/nbt.3820</a></li>
<li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. <a href="https://doi.org/10.1038/s41587-020-0439-x">https://doi.org/10.1038/s41587-020-0439-x</a></li>
<li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: <a href="https://doi.org/10.1038/nbt.3820">10.1038/nbt.3820</a></li>
<li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: <a href="https://doi.org/10.1038/s41587-020-0439-x">10.1038/s41587-020-0439-x</a></li>
<li>Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: <a href="https://doi.org/10.1038/s41592-018-0046-7">10.1038/s41592-018-0046-7</a></li>
<li>da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: <a href="https://doi.org/10.1093/bioinformatics/btx192">10.1093/bioinformatics/btx192</a></li>
${tool_bibliography}
</ul>
<div class="alert alert-info">
<h5>Notes:</h5>
4 changes: 2 additions & 2 deletions assets/multiqc_config.yml
@@ -1,7 +1,7 @@
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/eager" target="_blank">nf-core/eager</a>
This report has been generated by the <a href="https://github.com/nf-core/eager/3.0.0dev" target="_blank">nf-core/eager</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/eager" target="_blank">documentation</a>.
<a href="https://nf-co.re/eager/3.0.0dev/output" target="_blank">documentation</a>.
report_section_order:
"nf-core-eager-methods-description":
order: -1000
Binary file modified assets/nf-core-eager_logo_light.png
2 changes: 1 addition & 1 deletion assets/slackreport.json
@@ -3,7 +3,7 @@
{
"fallback": "Plain-text summary of the attachment.",
"color": "<% if (success) { %>good<% } else { %>danger<%} %>",
"author_name": "sanger-tol/readmapping v${version} - ${runName}",
"author_name": "nf-core/eager v${version} - ${runName}",
"author_icon": "https://www.nextflow.io/docs/latest/_static/favicon.ico",
"text": "<% if (success) { %>Pipeline completed successfully!<% } else { %>Pipeline completed with errors<% } %>",
"fields": [
14 changes: 7 additions & 7 deletions bin/print_x_contamination.py
@@ -14,10 +14,10 @@
def make_float(x):
# print (x)
output = [None for i in range(len(x))]
## If value for an estimate/error is -nan, replace with "NA". JSON does not accept NaN as a valid field.
## If value for an estimate/error is -nan, replace with "None". JSON does not accept NaN as a valid field.
for i in range(len(x)):
if x[i] == "-nan" or x[i] == "nan":
output[i] = "N/A"
output[i] = None
continue
try:
output[i] = float(x[i])
@@ -45,11 +45,11 @@ def make_float(x):
file=output,
)
for fn in Input_files:
## For each file, reset the values to "N/A" so they don't carry over from last file.
mom1, err_mom1 = "N/A", "N/A"
ml1, err_ml1 = "N/A", "N/A"
mom2, err_mom2 = "N/A", "N/A"
ml2, err_ml2 = "N/A", "N/A"
## For each file, reset the values to "None" so they don't carry over from last file.
mom1, err_mom1 = None, None
ml1, err_ml1 = None, None
mom2, err_mom2 = None, None
ml2, err_ml2 = None, None
nSNPs = "0"
with open(fn, "r") as f:
Estimates = {}
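The `make_float()` change above swaps the `"N/A"` placeholder for Python's `None`. A minimal sketch of why, using only the standard-library `json` module (this is not the script's own code; its `try`/`except` branch is cut off in the diff): `json.dumps` writes `None` as `null`, whereas a float NaN is emitted as the non-standard token `NaN` by default and is rejected outright when `allow_nan=False`.

```python
import json


def make_float(values):
    """Convert estimate strings to floats; NaN tokens become None.

    JSON has no NaN literal, so None (serialised as null) is used instead.
    Simplified sketch: unparseable strings raise ValueError here.
    """
    return [None if v in ("-nan", "nan") else float(v) for v in values]


def nan_is_rejected_by_strict_json():
    """Return True if strict JSON serialisation refuses a float NaN."""
    try:
        json.dumps({"estimate": float("nan")}, allow_nan=False)
    except ValueError:
        return True
    return False


# Hypothetical estimate/error pair, as it might be parsed from an ANGSD log.
mom1, err_mom1 = make_float(["0.0123", "-nan"])
print(json.dumps({"mom1": mom1, "err_mom1": err_mom1}))
# prints {"mom1": 0.0123, "err_mom1": null}
```

Downstream consumers then see a proper JSON `null` rather than a string that merely looks like a missing value.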
57 changes: 56 additions & 1 deletion conf/modules.config
@@ -258,6 +258,20 @@ process {
]
}

//
// BAM INPUT
//
withName: 'SAMTOOLS_FLAGSTATS_BAM_INPUT' {
// TODO Once a lane-merging step is added for input BAMs, the lane should be dropped from this tag.
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}_L${meta.lane}" }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.lane}_${meta.reference}" }
publishDir = [
path: { "${params.outdir}/bam_input_stats/" },
mode: params.publish_dir_mode,
pattern: '*.flagstat'
]
}

//
// BAM FILTERING
//
@@ -381,6 +395,16 @@ process {
]
}

withName: ENDORSPY {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
publishDir = [
path: { "${params.outdir}/mapstats/endorspy" },
mode: params.publish_dir_mode,
pattern: '*.json'
]
}

withName: ".*MAP:FASTQ_ALIGN_BWAALN:SAMTOOLS_INDEX" {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}_L${meta.lane}" }
ext.args = { params.fasta_largeref ? "-c" : "" }
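The new `ENDORSPY` process publishes endorS.py's JSON summary. As a rough illustration of the kind of metric involved, the sketch below parses `samtools flagstat` text and computes the percentage of QC-passed reads that mapped. This is a simplification under stated assumptions: `parse_flagstat()` and `percent_endogenous()` are hypothetical helpers, not endorS.py's code, and endorS.py itself may report additional values (for example after quality filtering).

```python
import re

# Matches samtools flagstat lines such as "900 + 0 mapped (90.00% : N/A)".
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Return {'total': int, 'mapped': int} from samtools flagstat output.

    Only QC-passed counts (the first number on each line) are kept.
    """
    counts = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if not match:
            continue
        passed, description = int(match.group(1)), match.group(3)
        if description.startswith("in total"):
            counts["total"] = passed
        elif description.startswith("mapped ("):
            # Excludes "primary mapped" and "... mate mapped" lines.
            counts["mapped"] = passed
    return counts


def percent_endogenous(total, mapped):
    """Share of reads mapping to the reference, as a percentage."""
    return 0.0 if total == 0 else 100.0 * mapped / total


# Hypothetical excerpt of flagstat output.
example = """1000 + 0 in total (QC-passed reads + QC-failed reads)
1000 + 0 primary
900 + 0 mapped (90.00% : N/A)
900 + 0 primary mapped (90.00% : N/A)
"""
counts = parse_flagstat(example)
print(percent_endogenous(counts["total"], counts["mapped"]))
# prints 90.0
```

Ignoring the QC-failed column is a design choice of this sketch only.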
@@ -602,6 +626,37 @@ process {
]
}

//
// BEDTOOLS_COVERAGE
//
withName: SAMTOOLS_VIEW_GENOME {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
publishDir = [
enabled: false
]
}

withName: BEDTOOLS_COVERAGE_DEPTH {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = '-mean -nonamecheck'
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_depth" }
publishDir = [
path: { "${params.outdir}/mapstats/bedtools" },
mode: params.publish_dir_mode
]
}

withName: BEDTOOLS_COVERAGE_BREADTH {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = '-nonamecheck'
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}_breadth" }
publishDir = [
path: { "${params.outdir}/mapstats/bedtools" },
mode: params.publish_dir_mode
]
}


//
// DAMAGE MANIPULATION
//
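`BEDTOOLS_COVERAGE_DEPTH` passes `-mean` to `bedtools coverage`, while `BEDTOOLS_COVERAGE_BREADTH` uses the default report, which includes the fraction of each feature covered by at least one read. The sketch below computes both quantities for a single feature in pure Python, assuming half-open BED-style intervals on one chromosome. `coverage_stats()` is a hypothetical helper for illustration, not how bedtools is implemented, and it ignores strand and chromosome naming.

```python
def coverage_stats(feature, reads):
    """Return (mean_depth, breadth) for one feature.

    feature: (start, end) half-open interval, as in BED coordinates.
    reads:   iterable of (start, end) half-open alignment intervals on the
             same chromosome as the feature.
    """
    start, end = feature
    length = end - start
    if length <= 0:
        raise ValueError("feature must have positive length")

    # Per-base depth across the feature.
    depth = [0] * length
    for read_start, read_end in reads:
        # Only count the part of each read that overlaps the feature.
        for pos in range(max(start, read_start), min(end, read_end)):
            depth[pos - start] += 1

    mean_depth = sum(depth) / length          # average depth over all bases
    covered = sum(1 for d in depth if d > 0)  # bases with non-zero depth
    breadth = covered / length                # fraction of the feature covered
    return mean_depth, breadth


# Hypothetical 10 bp feature and three aligned reads.
print(coverage_stats((100, 110), [(95, 104), (102, 106), (108, 120)]))
# prints (1.0, 0.8)
```

The example shows why both metrics are useful: the mean depth is 1.0, yet 20% of the feature has no reads at all.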
@@ -704,14 +759,14 @@ process {
// CONTAMINATION ESTIMATION
//
withName: ANGSD_DOCOUNTS {
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = [
"-iCounts 1",
"-r ${params.contamination_estimation_angsd_chrom_name}:${params.contamination_estimation_angsd_range_from}-${params.contamination_estimation_angsd_range_to}",
"-minMapQ ${params.contamination_estimation_angsd_mapq}",
"-minQ ${params.contamination_estimation_angsd_minq}"
].join(' ').trim()
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
publishDir = [
enabled: false
]
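The `ANGSD_DOCOUNTS` block builds `ext.args` from a list of option strings that is joined with spaces and trimmed. A Python equivalent of that assembly, handy for checking what the resulting argument string looks like, is sketched below. `angsd_docounts_args()` is a hypothetical helper; the chromosome name and range values are placeholders, while `mapq`/`minq` of 0 mirror `conf/test_humanbam.config`.

```python
def angsd_docounts_args(params):
    """Build the ANGSD argument string the same way the config does:
    a list of option strings joined with single spaces, then trimmed."""
    region = "{}:{}-{}".format(
        params["contamination_estimation_angsd_chrom_name"],
        params["contamination_estimation_angsd_range_from"],
        params["contamination_estimation_angsd_range_to"],
    )
    options = [
        "-iCounts 1",
        "-r " + region,
        "-minMapQ {}".format(params["contamination_estimation_angsd_mapq"]),
        "-minQ {}".format(params["contamination_estimation_angsd_minq"]),
    ]
    return " ".join(options).strip()


# Hypothetical chromosome name and range; mapq/minq match test_humanbam.config.
params = {
    "contamination_estimation_angsd_chrom_name": "X",
    "contamination_estimation_angsd_range_from": 5000000,
    "contamination_estimation_angsd_range_to": 154900000,
    "contamination_estimation_angsd_mapq": 0,
    "contamination_estimation_angsd_minq": 0,
}
args = angsd_docounts_args(params)
print(args)
# prints -iCounts 1 -r X:5000000-154900000 -minMapQ 0 -minQ 0
```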
4 changes: 4 additions & 0 deletions conf/test.config
@@ -36,6 +36,10 @@ params {
bamfiltering_minreadlength = 30
bamfiltering_mappingquality = 37

// Map Stats
run_bedtools_coverage = true
mapstats_bedtools_featurefile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.gff3'

// Metagenomic screening
run_metagenomicscreening = false

2 changes: 0 additions & 2 deletions conf/test_full.config
@@ -10,8 +10,6 @@
----------------------------------------------------------------------------------------
*/

cleanup = true

params {
config_profile_name = 'Full test profile'
config_profile_description = 'Full test dataset to check pipeline function'
6 changes: 5 additions & 1 deletion conf/test_humanbam.config
@@ -25,7 +25,11 @@ params {
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Human/human_design_bam_eager3.tsv'

// Genome references
fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/hs37d5_chr21.fasta'
fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/hs37d5_chr21-MT.fa.gz'

// Contamination estimation
contamination_estimation_angsd_mapq = 0
contamination_estimation_angsd_minq = 0

// TODO Reactivate sexDet and genotyping params when those steps get implemented.
// //Sex Determination