aDNA LD Public

The public-facing repository for exploring two-locus and haplotype-copying properties in models with temporal sampling.

Figure Directory

If you are interested in a particular figure from the manuscript, the relevant iPython notebook is listed below:

Main Figures

Supplementary Figures

Data

The files here are intermediate data sources used to generate the CSV files in results. They are typically tables of genetic map coordinates or sample names.

Results

The results directory houses all of the files needed to recreate the plots in both the main text and the supplementary materials. They are the final output of snakemake rules that either perform simulations or estimate parameters from the data. If you are interested in the raw data used to generate the plots, this is where to look.

Snakemake to recreate results

The files in the snakefiles directory are not needed to work with the pre-generated results, but can be used in conjunction with snakemake to rerun the entire analysis and fully replicate our simulation results.

To re-run the full analysis (rather than using the pre-generated results), run:

snakemake -s main.smk all_sim_results -j <number of cores>

Note that you will also want to change the tmpdir parameter in the config.yml file so that it points to a location where you can write XXX GB of data. Be warned that re-running all of the simulation analyses takes ~4-5 hours on a computing cluster with 200 parallel jobs, so it is likely to take longer on a single laptop.
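Before pointing tmpdir at a directory, it can help to confirm that the directory has enough free space. The following is a minimal sketch using only the Python standard library. The helper name `has_free_space` and the `required_gb` argument are hypothetical and are not part of this repository; the required size is left for you to fill in.

```python
import shutil
import tempfile


def has_free_space(directory, required_gb):
    """Return True if `directory` has at least `required_gb` gigabytes free.

    Hypothetical helper for illustration; not part of this repository.
    """
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= required_gb * 1e9


# Example: check the system temporary directory for 1 GB of free space.
# Replace the path and the size with the values that apply to your setup.
candidate = tempfile.gettempdir()
print(candidate, has_free_space(candidate, required_gb=1))
```

If the check returns False, choose a different location (for example, scratch space on your cluster) before setting tmpdir in config.yml.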

For our results on real ancient data, we have chosen not to store the data within this repository because it exceeds GitHub's file-size limits. Instead, we provide a fast snakemake rule that downloads the data from Dropbox and unpacks it (~6 GB of data):

snakemake -s main.smk download_data -j <number of cores>

If you are interested in re-creating the results CSV files with the newly downloaded ancient male X-chromosome data, run:

snakemake -s main.smk infer_jump_rates_real_data_all -j <number of cores>

This recreation of the haplotype-copying inference data will also generally take quite some time (~10 hours on a computing cluster with 200 parallel jobs).

Source Code

The src directory contains implementations of:

Coalescent simulations with serial sampling using msprime (including two-locus simulations)
A Python-based implementation of the Li-Stephens model (using numba)
Theoretical formulas for the correlation in tree length and tree height across two loci in scenarios with serial sampling (coal_cov.py)
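
To make the haplotype-copying model concrete, here is a minimal sketch of the scaled forward algorithm for a haploid Li-Stephens model in plain Python. It is not the numba implementation in src. The function name `li_stephens_forward`, its arguments, the uniform prior over reference haplotypes, and the single copying-error parameter `eps` are all assumptions made for illustration.

```python
import math


def li_stephens_forward(query, panel, jump_prob, eps):
    """Log-likelihood of `query` under a simple Li-Stephens copying model.

    Illustrative sketch only; names and parameterization are assumptions.

    query:     list of 0/1 alleles, one per site
    panel:     list of reference haplotypes, each a list of 0/1 alleles
    jump_prob: jump_prob[i] is the probability of a copying switch between
               site i-1 and site i (jump_prob[0] is unused)
    eps:       copying-error probability, strictly between 0 and 1
    """
    n_ref = len(panel)
    n_sites = len(query)

    def emission(k, i):
        # Probability of the observed allele when copying haplotype k at site i.
        return 1.0 - eps if panel[k][i] == query[i] else eps

    # Initialise with a uniform prior over which haplotype is copied.
    alpha = [emission(k, 0) / n_ref for k in range(n_ref)]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]
    loglik = math.log(scale)

    for i in range(1, n_sites):
        r = jump_prob[i]
        # alpha sums to 1 after scaling, so a jump contributes r / n_ref
        # to every state, while staying contributes (1 - r) * alpha[k].
        alpha = [
            emission(k, i) * ((1.0 - r) * alpha[k] + r / n_ref)
            for k in range(n_ref)
        ]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)

    return loglik


# Usage: a three-site query copied from a two-haplotype panel.
example_panel = [[0, 1, 0], [1, 0, 1]]
print(li_stephens_forward([0, 1, 0], example_panel, [0.0, 0.1, 0.1], 0.01))
```

Rescaling the forward variables at each site avoids numerical underflow on long haplotypes, and the sum of the log scaling factors gives the log-likelihood that inference of jump rates would maximize.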

Acknowledgements

Matthias Steinrücken
John Novembre
Novembre, Steinrücken, Berg Labs @ UChicago
NIH GRTG
