Welcome to LookingGlass

LookingGlass is a general-purpose 'universal language of life' deep learning model for read-length biological sequences. It can be used for diverse downstream transfer learning tasks for biological data, some of which are described in the paper.

This is the main repository for these pretrained models. Static URLs for downloading these models are available in release v1 of this repo. See more detailed descriptions of the models contained in this repository below.

A Python package, fastBio, is also available for loading and manipulating biological data, training deep learning models on it, and using the pretrained models in this release.

If you find LookingGlass, LookingGlass-derived models, or fastBio helpful, please cite the paper:

Hoarfrost, A., Aptekmann, A., Farfañuk, G. et al. Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter. Nat Commun 13, 2606 (2022). https://doi.org/10.1038/s41467-022-30070-8.

Models in the most recent release

The most recent release of LookingGlass provides static URLs for available pretrained models.

  • LookingGlass

    LookingGlass is a 'universal language of life' model that produces contextually aware, functionally and evolutionarily relevant representations of short DNA reads. As a general-purpose 'biological language' representation model, it is broadly useful for training diverse downstream transfer learning tasks. A number of exports are available:

  • Functional Classifier

    The functional classifier can classify DNA reads into one of 1247 experimentally-validated functional annotations with 81.5% accuracy. The following exports are available:

  • Oxidoreductase Classifier

    The oxidoreductase classifier can classify whether a DNA read originates from an oxidoreductase (EC number 1.-.-.-) with 82.3% accuracy. The following exports are available:

  • Optimal Temp Classifier

    The optimal temperature classifier can identify whether a DNA read originates from an enzyme with a psychrophilic (<15 °C), mesophilic (20-40 °C), or thermophilic (>50 °C) optimal temperature, with 70.1% accuracy. The following exports are available:

  • Reading Frame Classifier

    The reading frame classifier identifies the correct reading frame start position (1, 2, 3, -1, -2, or -3) for DNA reads (and thus the true amino acid sequence). Note that it is currently intended only for prokaryotic sequences with low proportions of noncoding DNA. The following exports are available:
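
To make the six reading frame labels concrete, here is a minimal sketch of how a predicted frame could be used to split a read into codons. It assumes the common six-frame convention, in which 1, 2, 3 are offsets on the read as given and -1, -2, -3 are the same offsets on its reverse complement; the README does not spell out the convention, so treat that as an assumption. The helper names `reverse_complement` and `codons_in_frame` are hypothetical and not part of LookingGlass or fastBio.

```python
# Assumed convention: positive frames read the sequence as given,
# negative frames read its reverse complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def codons_in_frame(read, frame):
    """Split `read` into complete codons for frame 1, 2, 3, -1, -2, or -3."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError(f"frame must be one of 1, 2, 3, -1, -2, -3, got {frame}")
    seq = read.upper() if frame > 0 else reverse_complement(read)
    start = abs(frame) - 1  # frames are 1-based, Python indices are 0-based
    # Stop early enough that every slice is a full three-base codon.
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


print(codons_in_frame("ATGGCCTAA", 1))   # codons on the read as given
print(codons_in_frame("ATGGCCTAA", -1))  # codons on the reverse complement
```

Incomplete trailing bases are dropped, so a read whose length is not a multiple of three simply yields fewer codons.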

LookingGlass vocabulary

The vocabulary used for training LookingGlass and all LookingGlass-derived models above is also available for direct download:

  • ngs_vocab_k1_withspecial.npy - maintains the vocabulary token order for correct numericalization of DNA sequences for LookingGlass-derived models.
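
As an illustration of why token order matters, the sketch below maps each base of a read to its position in a vocabulary list. It assumes the `.npy` file holds a one-dimensional array of token strings and that "k1" means single-base tokens; both are inferences from the file name, not documented facts. `load_vocab` and `numericalize` are hypothetical helpers, and the toy vocabulary is a placeholder, not the real file contents.

```python
def load_vocab(path):
    """Load the token list from a .npy vocabulary file, preserving its order."""
    import numpy as np  # imported lazily so the rest of the sketch runs without NumPy

    # allow_pickle=True may be needed if the array stores Python objects;
    # only enable it for files from a source you trust.
    return [str(tok) for tok in np.load(path, allow_pickle=True)]


def numericalize(read, vocab, unk_token):
    """Map each base of `read` to its index in `vocab`; unknown bases map to `unk_token`."""
    index = {tok: i for i, tok in enumerate(vocab)}
    unk = index[unk_token]
    return [index.get(base, unk) for base in read.upper()]


# Toy vocabulary for illustration only; the real file defines the actual tokens and order.
toy_vocab = ["<unk>", "<pad>", "A", "C", "G", "T"]
ids = numericalize("ACGTN", toy_vocab, unk_token="<unk>")
print(ids)
```

With the real file you would call `load_vocab("ngs_vocab_k1_withspecial.npy")` and pass the result in place of `toy_vocab`, using whichever unknown token that vocabulary actually defines. Reordering the tokens would change every index and silently break a pretrained model, which is why the file's order should be kept as-is.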

Tutorial

The fastBio Python package can be used to directly create LookingGlass and LookingGlass-derived models (with or without pretrained weights). See the fastBio tutorial.

Alternatively, the models described above were saved with torch.save() and can be loaded directly into your own scripts with torch.load().
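
A minimal sketch of loading one of the exports in your own script might look like the following. It assumes a PyTorch version whose `torch.load` accepts `weights_only`, and it does not assume whether a given export is a full model or a state dict, so it handles both. `load_export` is a hypothetical helper and the file name in the usage comment is a placeholder.

```python
import torch


def load_export(path, device="cpu"):
    """Load an object written with torch.save() and prepare it for inference."""
    # weights_only=False lets torch.load unpickle arbitrary Python objects such
    # as a full model; only use it on files from a source you trust.
    obj = torch.load(path, map_location=device, weights_only=False)
    if isinstance(obj, torch.nn.Module):
        obj.eval()  # disable dropout and other training-only behaviour
    return obj


# Hypothetical usage; "LookingGlass.pth" is a placeholder file name.
# model = load_export("LookingGlass.pth")
```

Unpickling a full model requires the classes it was built from to be importable in your environment. If the loaded object turns out to be a state dict instead, you would need to construct the matching architecture first and then call `load_state_dict` on it.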