This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 115 changed files with 7,783 additions and 2 deletions.
@@ -0,0 +1,5 @@
# Auto-detect text files and perform LF normalization
* text=auto

# Exclude the tutorial Jupyter notebook from GitHub's language statistics
tutorial.ipynb linguist-vendored
@@ -0,0 +1,38 @@
name: test

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    name: test ${{ matrix.py }} on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        py:
          - "3.9"
          - "3.10"
          - "3.11"
        os:
          - ubuntu-latest
          - windows-latest
          - macos-latest
    steps:
      - name: Checkout sources
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.py }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install tox tox-gh-actions
      - name: Run test suite
        run: tox
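The `strategy.matrix` in this workflow runs one job per combination of its axes, so three Python versions on three operating systems give nine jobs. The following Python sketch only illustrates that combinatorics with `itertools.product`; the helper `expand_matrix` is hypothetical, and this is not how GitHub Actions itself expands or orders the matrix.

```python
from itertools import product

# Axes of the strategy matrix in the workflow above.
matrix = {
    "py": ["3.9", "3.10", "3.11"],
    "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
}


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*matrix.values())]


jobs = expand_matrix(matrix)
# Job names follow the workflow's template "test ${{ matrix.py }} on ${{ matrix.os }}".
names = [f"test {job['py']} on {job['os']}" for job in jobs]
```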
@@ -0,0 +1,6 @@
.ipynb_checkpoints
**/__pycache__/

dist/

.tox/
@@ -0,0 +1,23 @@
# See https://pre-commit.com for more information
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: check-toml
      - id: check-yaml
      - id: end-of-file-fixer
      - id: mixed-line-ending
  - repo: https://github.com/python-poetry/poetry
    rev: 1.6.1
    hooks:
      - id: poetry-check
      - id: poetry-lock
  - repo: https://github.com/psf/black
    rev: 23.9.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]
@@ -0,0 +1,11 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

---
---
@@ -1,2 +1,70 @@
# salamander
Salamander is a non-negative matrix factorization framework for signature analysis
# Salamander

[![Python versions supported][python-image]][python-url]
[![License][license-image]][license-url]
[![Code style][style-image]][style-url]

[python-image]: https://img.shields.io/badge/python-3.9%20|%203.10%20|%203.11-blue.svg
[python-url]: https://github.com/BeGeiger/CorrNMF
[license-image]: https://img.shields.io/badge/License-MIT-yellow.svg
[license-url]: https://github.com/BeGeiger/CorrNMF/blob/main/LICENSE
[style-image]: https://img.shields.io/badge/code%20style-black-000000.svg
[style-url]: https://github.com/psf/black

Salamander is a non-negative matrix factorization (NMF) framework for signature analysis.
It implements multiple NMF algorithms and common visualizations, and it can easily be customized and extended.

---

## Installation

Install from PyPI:
```bash
pip install salamander-learn
```

## Usage

The following example illustrates the basic syntax:

```python
import pandas as pd
import salamander as sal

# samples and features must be named (via the row index and the column header)
data_path = "..."
data = pd.read_csv(data_path, index_col=0)

# NMF with a Poisson noise model
model = sal.KLNMF(n_signatures=5)
model.fit(data)

# signatures as bar plots
model.plot_signatures()

# exposures as a stacked bar plot
model.plot_exposures()

# signature correlation
model.plot_correlation()

# sample correlation
model.plot_correlation(data="samples")

# dimensionality reduction of the exposures
# method: umap, pca or tsne
model.plot_embeddings(method="umap")
```

For examples of how to customize the NMF algorithms and the plots, see [the tutorial](). The following algorithms are currently available:
* [NMF with KL-divergence loss](https://proceedings.neurips.cc/paper_files/paper/2000/file/f9d1152547c0bde01830b7e8bd60024c-Paper.pdf)
* [minimum-volume NMF](https://browse.arxiv.org/pdf/1907.02404.pdf)
* [a variant of correlated NMF](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=87224164eef14589b137547a3fa81f06eef9bbf4)
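For orientation, the first algorithm in this list, NMF with a generalized Kullback-Leibler loss, can be fitted with the classic multiplicative updates of Lee and Seung. The sketch below is a minimal NumPy version of those textbook updates, not Salamander's `KLNMF` implementation; the function names `kl_nmf` and `kl_divergence`, the features x samples layout and the small `eps` guard are assumptions made for this example.

```python
import numpy as np


def kl_nmf(X, n_signatures, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix X (assumed features x samples) by
    W @ H using Lee-Seung multiplicative updates for the generalized KL loss."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Strictly positive random start (multiplicative updates keep zeros at zero).
    W = rng.random((n_features, n_signatures)) + eps
    H = rng.random((n_signatures, n_samples)) + eps
    for _ in range(n_iter):
        # H <- H * (W^T (X / WH)) / (column sums of W)
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
        # W <- W * ((X / WH) H^T) / (row sums of H)
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def kl_divergence(X, W, H, eps=1e-10):
    """Generalized KL divergence D(X || WH) = sum(X log(X / WH) - X + WH)."""
    WH = W @ H + eps
    return float(np.sum(X * np.log((X + eps) / WH) - X + WH))


# Usage on synthetic Poisson counts with a rank-3 structure.
rng = np.random.default_rng(1)
X = rng.poisson(50 * rng.random((20, 3)) @ rng.random((3, 15))).astype(float)
W, H = kl_nmf(X, n_signatures=3)
```

Up to terms that do not depend on `W` and `H`, the generalized KL divergence is the negative Poisson log-likelihood, which is why NMF with this loss corresponds to the "Poisson noise model" mentioned in the usage example.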

## License

MIT

## Changelog

Consult the [CHANGELOG](https://github.com/BeGeiger/CorrNMF/blob/main/CHANGELOG.md) file for the enhancements and fixes in each version.
@@ -0,0 +1,40 @@
[tool.poetry]
name = "salamander-learn"
version = "0.1.1"
description = "Salamander is a non-negative matrix factorization framework for signature analysis"
license = "MIT"
authors = ["Benedikt Geiger"]
maintainers = [
    "Benedikt Geiger <[email protected]>",
]
packages = [{ include = "salamander", from = "src" }]


readme = "README.md"

[tool.poetry.dependencies]
python = ">=3.9,<3.12"
fastcluster = "^1.2.6"
matplotlib = "^3.7.1"
numba = "^0.57"
numpy = "^1.24.3"
pandas = "^1.5.3"
scikit-learn = "^1.3.0"
scipy = "^1.10.1"
seaborn = "^0.13.0"
umap-learn = "^0.5.4"

[tool.poetry.group.dev.dependencies]
pytest = "^7.4.2"
pre-commit = "^3.4.0"
tox = "^4.11.3"

[tool.pytest.ini_options]
# /site-packages/umap/__init__.py:36: DeprecationWarning: pkg_resources is deprecated as an API.
filterwarnings = [
    "ignore::DeprecationWarning:umap.*:",
]

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
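The pytest `filterwarnings` entry above uses the same `action:message:category:module:lineno` layout as Python's own warning filters, with `umap.*` acting as a regular expression for the module name. As a rough standard-library analogue, the sketch below installs the equivalent filter and shows that a `DeprecationWarning` attributed to `umap` is suppressed while one from another module still gets through. The helper name `recorded_messages` is hypothetical, and the umap warning is simulated with `warnings.warn_explicit` rather than triggered by importing umap.

```python
import warnings


def recorded_messages():
    """Emit two DeprecationWarnings and return the messages of those that
    pass a filter equivalent to "ignore::DeprecationWarning:umap.*:"."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything not filtered below
        # action=ignore, any message, category=DeprecationWarning,
        # module matching the regex "umap.*" (inserted ahead of "always").
        warnings.filterwarnings("ignore", category=DeprecationWarning, module=r"umap.*")
        # Simulated warning attributed to umap, as in the pyproject comment.
        warnings.warn_explicit(
            "pkg_resources is deprecated as an API.",
            DeprecationWarning,
            "umap/__init__.py",
            36,
            module="umap",
        )
        # A DeprecationWarning from any other module is still recorded.
        warnings.warn_explicit(
            "some other deprecation", DeprecationWarning, "other.py", 1, module="other"
        )
    return [str(w.message) for w in caught]
```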
@@ -0,0 +1,11 @@
"""
Salamander: a non-negative matrix factorization framework for signature analysis
================================================================================
"""
from .nmf_framework.corrnmf_det import CorrNMFDet
from .nmf_framework.klnmf import KLNMF
from .nmf_framework.multimodal_corrnmf import MultimodalCorrNMF
from .nmf_framework.mvnmf import MvNMF

__version__ = "0.1.1"
__all__ = ["CorrNMFDet", "KLNMF", "MvNMF", "MultimodalCorrNMF"]
@@ -0,0 +1,88 @@
NUCLEOTIDES = ["A", "C", "G", "T"]

SBS_TYPES_6 = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
SBS_TYPES_96 = [
    f"{n1}[{sbs_6}]{n2}"
    for sbs_6 in SBS_TYPES_6
    for n1 in NUCLEOTIDES
    for n2 in NUCLEOTIDES
]

# fmt: off
INDEL_TYPES_83 = [
    "DEL.C.1.1", "DEL.C.1.2", "DEL.C.1.3", "DEL.C.1.4", "DEL.C.1.5", "DEL.C.1.6+",
    "DEL.T.1.1", "DEL.T.1.2", "DEL.T.1.3", "DEL.T.1.4", "DEL.T.1.5", "DEL.T.1.6+",
    "INS.C.1.0", "INS.C.1.1", "INS.C.1.2", "INS.C.1.3", "INS.C.1.4", "INS.C.1.5+",
    "INS.T.1.0", "INS.T.1.1", "INS.T.1.2", "INS.T.1.3", "INS.T.1.4", "INS.T.1.5+",
    "DEL.repeats.2.1", "DEL.repeats.2.2", "DEL.repeats.2.3",
    "DEL.repeats.2.4", "DEL.repeats.2.5", "DEL.repeats.2.6+",
    "DEL.repeats.3.1", "DEL.repeats.3.2", "DEL.repeats.3.3",
    "DEL.repeats.3.4", "DEL.repeats.3.5", "DEL.repeats.3.6+",
    "DEL.repeats.4.1", "DEL.repeats.4.2", "DEL.repeats.4.3",
    "DEL.repeats.4.4", "DEL.repeats.4.5", "DEL.repeats.4.6+",
    "DEL.repeats.5+.1", "DEL.repeats.5+.2", "DEL.repeats.5+.3",
    "DEL.repeats.5+.4", "DEL.repeats.5+.5", "DEL.repeats.5+.6+",
    "INS.repeats.2.0", "INS.repeats.2.1", "INS.repeats.2.2",
    "INS.repeats.2.3", "INS.repeats.2.4", "INS.repeats.2.5+",
    "INS.repeats.3.0", "INS.repeats.3.1", "INS.repeats.3.2",
    "INS.repeats.3.3", "INS.repeats.3.4", "INS.repeats.3.5+",
    "INS.repeats.4.0", "INS.repeats.4.1", "INS.repeats.4.2",
    "INS.repeats.4.3", "INS.repeats.4.4", "INS.repeats.4.5+",
    "INS.repeats.5+.0", "INS.repeats.5+.1", "INS.repeats.5+.2",
    "INS.repeats.5+.3", "INS.repeats.5+.4", "INS.repeats.5+.5+",
    "DEL.MH.2.1",
    "DEL.MH.3.1", "DEL.MH.3.2",
    "DEL.MH.4.1", "DEL.MH.4.2", "DEL.MH.4.3",
    "DEL.MH.5+.1", "DEL.MH.5+.2", "DEL.MH.5+.3", "DEL.MH.5+.4", "DEL.MH.5+.5+"
]
# fmt: on

# 10 colors
COLORS_MATHEMATICA = [
    (0.368417, 0.506779, 0.709798),
    (0.880722, 0.611041, 0.142051),
    (0.560181, 0.691569, 0.194885),
    (0.922526, 0.385626, 0.209179),
    (0.528288, 0.470624, 0.701351),
    (0.772079, 0.431554, 0.102387),
    (0.363898, 0.618501, 0.782349),
    (1.0, 0.75, 0.0),
    (0.280264, 0.715, 0.429209),
    (0.0, 0.0, 0.0),
]

# Trinucleotide colors for the 96 dimensional mutation spectrum
COLORS_TRINUCLEOTIDES = [
    (0.33, 0.75, 0.98),
    (0.0, 0.0, 0.0),
    (0.85, 0.25, 0.22),
    (0.78, 0.78, 0.78),
    (0.51, 0.79, 0.24),
    (0.89, 0.67, 0.72),
]

COLORS_SBS96 = [COLORS_TRINUCLEOTIDES[i // 16] for i in range(96)]

COLORS_INDEL = [
    "#FCBD6F",  # 1bp Del C
    "#FD8001",  # 1bp Del T
    "#B0DC8B",  # 1bp Ins C
    "#35A02E",  # 1bp Ins T
    "#FCC9B4",  # 2bp Del Repeats
    "#FC896B",  # 3bp Del Repeats
    "#F04432",  # 4bp Del Repeats
    "#BC1A1A",  # 5+ bp Del Repeats
    "#CFE0F0",  # 2bp Ins Repeats
    "#94C3DF",  # 3bp Ins Repeats
    "#4A98C8",  # 4bp Ins Repeats
    "#1665AA",  # 5+ bp Ins Repeats
    "#E1E0ED",  # 2bp Del MH
    "#B5B5D8",  # 3bp Del MH
    "#8683BC",  # 4bp Del MH
    "#624099",  # 5+ bp Del MH
]

# 12 * 6 + 11 = 83 colors
n_times = 12 * [6] + [1, 2, 3, 5]
COLORS_INDEL83 = [n * [col] for n, col in zip(n_times, COLORS_INDEL)]
COLORS_INDEL83 = [col for color_list in COLORS_INDEL83 for col in color_list]
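The `i // 16` indexing in `COLORS_SBS96` and the `n_times` expansion in `COLORS_INDEL83` both rely on how the label lists are ordered. The following self-contained sketch re-derives that ordering to show why colors and labels line up; it does not import the module (its path is not shown in this diff), and the names `expand`, `sbs_block` and `indel_group` are hypothetical, used only for illustration.

```python
from itertools import chain, repeat

NUCLEOTIDES = ["A", "C", "G", "T"]
SBS_TYPES_6 = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# Same construction as SBS_TYPES_96: the substitution type is the outer
# loop, so each of the 6 types fills a contiguous block of 16 labels.
sbs_types_96 = [
    f"{n1}[{sbs_6}]{n2}"
    for sbs_6 in SBS_TYPES_6
    for n1 in NUCLEOTIDES
    for n2 in NUCLEOTIDES
]

# Hence i // 16 is the substitution-type index of label i, which is what
# COLORS_SBS96 relies on when it picks COLORS_TRINUCLEOTIDES[i // 16].
sbs_block = [i // 16 for i in range(96)]

# Indel channels per color group: 12 groups of 6, followed by the
# microhomology deletions with 1, 2, 3 and 5 channels (72 + 11 = 83).
n_times = 12 * [6] + [1, 2, 3, 5]


def expand(values, counts):
    """Repeat values[k] counts[k] times and flatten, like COLORS_INDEL83."""
    return list(chain.from_iterable(repeat(v, n) for v, n in zip(values, counts)))


# Color-group index of each of the 83 indel channels.
indel_group = expand(range(len(n_times)), n_times)
```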
@@ -0,0 +1 @@
""