Reproducible reanalysis of a combined ChIP-Seq & RNA-Seq data set

DarwinAwardWinner/CD4-csaw

Re-analysis of a combined ChIP-Seq & RNA-Seq data set

This is the code for a re-analysis of a GEO dataset that I originally analyzed for this paper. The re-analysis uses statistical methods that were not yet available at the time, such as the csaw Bioconductor package, which provides a principled way to normalize windowed counts of ChIP-Seq reads and test them for differential binding. By contrast, the original paper analyzed binding only within pre-defined promoter regions. The RNA-seq analysis has also been improved using newer features of limma, such as quality weights.

This workflow downloads the sequence data and sample metadata from the public GEO/SRA release, so anyone can download and run this code to reproduce the full analysis.

Workflow

Rule Graph

Completed components

  • ChIP-seq
    • Mapping with bowtie2
    • Peak calling with MACS2 and Epic
    • Fetching of blacklists from UCSC
    • Generation of greylists from ChIP-Seq input samples
    • IDR analysis of blacklist-filtered peak calls
    • Computation of cross-correlation function for ChIP-Seq samples, excluding blacklisted regions
    • Counting in windows across the genome
  • RNA-seq
    • Mapping with STAR & HISAT2
    • Counting reads aligned to genes
    • Alignment-free bias-corrected transcript quantification using Salmon & Kallisto
    • Differential gene expression
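To make the "counting in windows" and blacklist-exclusion steps above more concrete, here is a minimal toy sketch in plain Python. It is not the pipeline's code: the actual counting is done by csaw in R, which works on aligned fragments rather than bare positions. The function names, the half-open interval convention, and the example coordinates are all assumptions made for illustration.

```python
def overlaps(start, end, intervals):
    """Return True if the half-open range [start, end) overlaps any
    half-open (s, e) interval in `intervals`."""
    return any(start < e and s < end for s, e in intervals)


def count_reads_in_windows(read_starts, chrom_length, width, spacing, blacklist=()):
    """Count read start positions in sliding windows along one chromosome.

    Windows are `width` bp wide and placed every `spacing` bp. Any window
    that overlaps a blacklisted interval is skipped entirely.
    (Hypothetical helper; a simplification of what csaw does.)
    """
    counts = {}
    for win_start in range(0, chrom_length - width + 1, spacing):
        win_end = win_start + width
        if overlaps(win_start, win_end, blacklist):
            continue
        counts[(win_start, win_end)] = sum(
            win_start <= pos < win_end for pos in read_starts
        )
    return counts


# Made-up example data: six read start positions on a 60 bp "chromosome",
# with one blacklisted region at [40, 50).
example_counts = count_reads_in_windows(
    read_starts=[5, 12, 14, 33, 47, 48],
    chrom_length=60,
    width=20,
    spacing=10,
    blacklist=[(40, 50)],
)
for window, n in example_counts.items():
    print(window, n)
```

Because the windows overlap (spacing smaller than width), a single read can contribute to more than one window. Differential binding tests on such windows need to account for that correlation, which is one of the things a dedicated package handles.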

Possible TODO components

TODO Code cleanup

  • Remove unnecessary library() calls
  • Put spaces around equals signs

TODO Other

  • Document how to run the pipeline
  • Provide an install script for R & Python packages

Dependencies

Command-line tools

Programming languages and packages

  • R, Bioconductor, and the following R packages:
    • From CRAN: assertthat, doParallel, dplyr, future, getopt, GGally, ggforce, ggfortify, ggplot2, ks, lazyeval, lubridate, magrittr, MASS, Matrix, openxlsx, optparse, parallel, purrr, RColorBrewer, readr, reshape2, rex, scales, stringi, stringr
    • From Bioconductor: annotate, Biobase, BiocParallel, BSgenome.Hsapiens.UCSC.hg19, BSgenome.Hsapiens.UCSC.hg38, ChIPQC, csaw, edgeR, GenomicFeatures, GenomicRanges, GEOquery, limma, org.Hs.eg.db, Rsamtools, Rsubread, rtracklayer, S4Vectors, SRAdb, SummarizedExperiment, TxDb.Hsapiens.UCSC.hg19.knownGene, tximport
    • Installed manually: sleuth, wasabi
  • Python 3 and the following Python packages: biopython, atomicwrites, numpy, pandas, plac, pysam, rpy2, snakemake
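Until an install script exists, a quick way to see which of the Python packages listed above are missing from the current environment is a check like the following. This is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the mapping from package name to import name is my assumption (notably, biopython is imported as `Bio`).

```python
import importlib.util

# Package name (as listed above) -> top-level module name used to import it.
# The mapping is an assumption; biopython is the only one that differs.
PYTHON_PACKAGES = {
    "biopython": "Bio",
    "atomicwrites": "atomicwrites",
    "numpy": "numpy",
    "pandas": "pandas",
    "plac": "plac",
    "pysam": "pysam",
    "rpy2": "rpy2",
    "snakemake": "snakemake",
}


def find_missing(packages):
    """Return the sorted names of packages whose module cannot be found.

    Uses importlib.util.find_spec, which locates a module without
    importing it.
    """
    return sorted(
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    )


missing = find_missing(PYTHON_PACKAGES)
if missing:
    print("Missing Python packages:", ", ".join(missing))
else:
    print("All listed Python packages were found.")
```

This only checks that each module can be located. It does not verify versions or confirm that the R packages and command-line tools are installed.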
