🛑 This repository was moved to https://github.com/pluskal-lab/DreaMS. 🛑
Source code for the paper "Emergence of molecular structures from repository-scale self-supervised learning on tandem mass spectra".
This GitHub repository is a work in progress. We are planning to transform it into a user-friendly Python package by the end of May 2024.
Run the following code from the command line.
git clone https://github.com/roman-bushuiev/DreaMS.git; cd DreaMS
. scripts/install.sh
. scripts/download_models.sh
The git clone command downloads this GitHub repository, and install.sh installs it. The download_models.sh script downloads pre-trained DreaMS models from Zenodo. The installation script creates a conda environment named dreams. If you are not familiar with conda or do not have it installed, please refer to the official documentation.
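Once installation finishes, a quick sanity check is to confirm that the package can be imported from the active environment. The sketch below assumes that the dreams conda environment has been activated and that install.sh exposes the package under the import name dreams (the name used in this README's usage example). The helper is_package_available is a hypothetical name introduced only for illustration.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    # find_spec returns None when no importable module of that name exists.
    return importlib.util.find_spec(name) is not None


# Assumption: the installation script makes the package importable as "dreams".
if is_package_available("dreams"):
    print("dreams is importable in this environment.")
else:
    print("dreams was not found; activate the dreams conda environment first.")
```

If the second message is printed, the most likely cause is that the shell is still using a different conda environment.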
To compute DreaMS representations for MS/MS spectra from an .mgf file, run the following Python code.
from dreams.api import compute_dreams_embeddings
dreams_embeddings = compute_dreams_embeddings('data/examples/example_5_spectra.mgf')
The resulting dreams_embeddings object is a matrix with 5 rows and 1024 columns: one 1024-dimensional DreaMS representation for each of the 5 input spectra stored in the .mgf file.
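One common way to use such a matrix is to compare spectra by the cosine similarity of their representations. The following is a minimal sketch of that idea, not part of the DreaMS API: it assumes dreams_embeddings can be converted to a NumPy array of shape (5, 1024), and it uses a random stand-in matrix so that it runs without the package installed. The function name cosine_similarity_matrix is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities between the rows of a 2-D array."""
    x = np.asarray(embeddings, dtype=float)
    # Normalize each row to unit length; the clip guards against zero rows.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)
    # Dot products of unit vectors are cosine similarities.
    return unit @ unit.T


# Stand-in with the documented shape (5 spectra, 1024 dimensions).
# Assumption: in practice this would be replaced by dreams_embeddings.
rng = np.random.default_rng(0)
stand_in_embeddings = rng.normal(size=(5, 1024))

similarities = cosine_similarity_matrix(stand_in_embeddings)
print(similarities.shape)
```

Entry (i, j) of the result compares spectrum i with spectrum j, so the matrix is symmetric and its diagonal is 1.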
- Wrap the repository into a pip package.
- Import utilities to matchms.
- DreaMS Atlas exploration demo.
- Upload weights of all models.
- Provide scripts to collect/download GeMS datasets.
- Extend dreams.api with more functionality (e.g. attention heads analysis).
- Add tutorial notebooks.
- Upload Murcko splits and detailed tutorial notebook.