sankarmaheshr/Machine-learning-for-proteins

 
 

Papers on machine learning for proteins

Background

We recently released a review of machine learning methods in protein engineering, but the field moves so fast, and so many new papers appear, that any static document will inevitably miss important work. A collaboratively maintained list can keep up, and it also lets us broaden the scope beyond engineering-specific applications. We hope it will be a useful resource for people interested in the field.

To the best of our knowledge, this is the first public, collaborative list of machine learning papers on protein applications. We try to classify papers based on a combination of their applications and model type. If you have suggestions for other papers or categories, please make a pull request or issue!

Format

Within each category, papers are listed in reverse chronological order (newest first). Where possible, a link should be provided.

Categories

Reviews
Tools and datasets
Machine-learning guided directed evolution
Representation learning
Unsupervised variant prediction
Generative models
Biophysics
Predicting stability
Predicting structure from sequence
Predicting sequence from structure
Classification, annotation, search, and alignments
Predicting interactions with other molecules
Other supervised learning

Reviews

Harnessing Generative AI to Decode Enzyme Catalysis and Evolution for Enhanced Engineering.
Wen Jun Xie, Arieh Warshel.
Preprint, October 2023.
[10.1101/2023.10.10.561808]

Machine Learning-Guided Protein Engineering.
Petr Kouba, Pavel Kohout, Faraneh Haddadi, Anton Bushuiev, Raman Samusevich, Jiri Sedlar, Jiri Damborsky, Tomas Pluskal, Josef Sivic, and Stanislav Mazurenko.
ACS Catalysis, October 2023.
[10.1021/acscatal.3c02743]

Generative artificial intelligence for de novo protein design.
Adam Winnifrith, Carlos Outeiral, Brian Hie.
Preprint, October 2023.
[arxiv]

Growing ecosystem of deep learning methods for modeling protein–protein interactions.
Julia R. Rogers, Gergő Nikolényi, Mohammed AlQuraishi.
Preprint, October 2023.
[arxiv]

Exploring the Protein Sequence Space with Global Generative Models.
Sergio Romero-Romero, Sebastian Lindner, Noelia Ferruz.
Preprint, May 2023.
[arxiv]

Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action.
Zhiye Guo, Jian Liu, Yanli Wang, Mengrui Chen, Duolin Wang, Dong Xu, Jianlin Cheng.
Preprint, February 2023.
[arxiv]

From sequence to function through structure: deep learning for protein design.
Noelia Ferruz, Michael Heinzinger, Mehmet Akdel, Alexander Goncearenco, Luca Naef, Christian Dallago.
Preprint, September 2022.
[10.1101/2022.08.31.505981]

Computational protein design with evolutionary-based and physics-inspired modeling: current and future synergies.
Cyril Malbranke, David Bikard, Simona Cocco, Rémi Monasson, Jérôme Tubiana.
Preprint, August 2022.
[arxiv]

Deep learning approaches for conformational flexibility and switching properties in protein design.
Lucas S. P. Rudden, Mahdi Hijazi, Patrick Barth.
Front. Mol. Biosci., August 2022.
[10.3389/fmolb.2022.928534]

Controllable protein design with language models.
Noelia Ferruz, Birte Höcker.
Nature Machine Intelligence, June 2022.
[10.1038/s42256-022-00499-z]

The road to fully programmable protein catalysis.
Sarah L. Lovelock, Rebecca Crawshaw, Sophie Basler, Colin Levy, David Baker, Donald Hilvert, Anthony P. Green.
Nature, June 2022.
[10.1038/s41586-022-04456-z]

Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design.
Ben E. Clifton, Dan Kozome, and Paola Laurino.
Biochemistry, March 2022.
[10.1021/acs.biochem.1c00757]

Learning functional properties of proteins with language models.
Serbulent Unsal, Heval Atas, Muammer Albayrak, Kemal Turhan, Aybar C. Acar & Tunca Doğan.
Nature Machine Intelligence, March 2022.
[10.1038/s42256-022-00457-9]

Applications of artificial intelligence to enzyme and pathway design for metabolic engineering.
Woo Dae Jang, Gi Bae Kim, Yeji Kim, Sang Yup Lee.
Current Opinion in Biotechnology, February 2022.
[10.1016/j.copbio.2021.07.024]

Adaptive machine learning for protein engineering.
Brian L. Hie, Kevin K. Yang.
Current Opinion in Structural Biology, February 2022.
[10.1016/j.sbi.2021.11.002]

Protein sequence design with deep generative models.
Zachary Wu, Kadina E. Johnston, Frances H. Arnold, Kevin K. Yang.
Current Opinion in Chemical Biology, December 2021.
[10.1016/j.cbpa.2021.04.004]

AI challenges for predicting the impact of mutations on protein stability.
Fabrizio Pucci, Martin Schwersensky, Marianne Rooman.
Preprint, November 2021.
[arxiv]

Advances in machine learning for directed evolution.
Bruce J Wittmann, Kadina E Johnston, Zachary Wu, Frances H Arnold.
Current Opinion in Structural Biology, August 2021.
[10.1016/j.sbi.2021.01.008]

A Brief Review of Machine Learning Techniques for Protein Phosphorylation Sites Prediction.
Farzaneh Esmaili, Mahdi Pourmirzaei, Shahin Ramazi, Elham Yavari.
Preprint, August 2021.
[arxiv]

Learning the protein language: Evolution, structure, and function.
Tristan Bepler, Bonnie Berger.
Cell Systems, June 2021.
[10.1016/j.cels.2021.05.017]

Representation learning applications in biological sequence analysis.
Hitoshi Iuchi, Taro Matsutani, Keisuke Yamada, Natsuki Iwano, Shunsuke Sumi, Shion Hosoda, Shitao Zhao, Tsukasa Fukunaga, Michiaki Hamada.
Computational and Structural Biotechnology Journal, May 2021.
[10.1016/j.csbj.2021.05.039]

Data-driven computational protein design.
Vincent Frappier, Amy E. Keating.
Current Opinion in Structural Biology, May 2021.
[10.1016/j.sbi.2021.03.009]

Machine learning in protein structure prediction.
Mohammed AlQuraishi.
Current Opinion in Chemical Biology, May 2021.
[10.1016/j.cbpa.2021.04.005]

Protein sequence-to-structure learning: Is this the end(-to-end revolution)?.
Elodie Laine, Stephan Eismann, Arne Elofsson, Sergei Grudinin.
Preprint, May 2021.
[arxiv]

Revolutionizing enzyme engineering through artificial intelligence and machine learning.
Nitu Singh, Sunny Malik, Anvita Gupta, Kinshuk Raj Srivastava.
Emerging Topics in Life Sciences, April 2021.
[10.1042/ETLS20200257]

The language of proteins: NLP, machine learning & protein sequences.
Dan Ofer, Nadav Brandes, Michal Linial.
Computational and Structural Biotechnology Journal, January 2021.
[10.1016/j.csbj.2021.03.022]

Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition.
Sebastian Raschka, Benjamin Kaufman.
Preprint, January 2020.
[arXiv]

Machine Learning in Enzyme Engineering.
Stanislav Mazurenko, Zbynek Prokop, Jiri Damborsky.
ACS Catalysis, December 2019.
[10.1021/acscatal.9b04321]

Machine learning-guided directed evolution for protein engineering.
Kevin K. Yang, Zachary Wu, Frances H. Arnold.
Nature Methods, July 2019.
[10.1038/s41592-019-0496-6]
Preprint available on arxiv.

Evaluating Protein Transfer Learning with TAPE.
Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, Yun S. Song.
Preprint, June 2019.
[arxiv]

Can Machine Learning Revolutionize Directed Evolution of Selective Enzymes?
Guangyue Li, Yijie Dong, Manfred T. Reetz.
Advanced Synthesis & Catalysis, March 2019.
[10.1002/adsc.201900149]

Tools and datasets

Deep indel mutagenesis reveals the impact of insertions and deletions on protein stability and function.
Magdalena Topolska, Antoni Beltran, Ben Lehner.
Preprint, October 2023.
[10.1101/2023.10.06.561180]

OpenProteinSet: Training data for structural biology at scale.
Gustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Lukas Jarosch, Daniel Berenberg, Ian Fisk, Andrew M. Watkins, Stephen Ra, Richard Bonneau, Mohammed AlQuraishi.
Preprint, August 2023.
[arxiv]

Mega-scale experimental analysis of protein folding stability in biology and design.
Kotaro Tsuboyama, Justas Dauparas, Jonathan Chen, Elodie Laine, Yasser Mohseni Behbahani, Jonathan J. Weinstein, Niall M. Mangan, Sergey Ovchinnikov & Gabriel J. Rocklin.
Nature, July 2023.
[10.1038/s41586-023-06328-6]

FLOP: Tasks for Fitness Landscapes Of Protein wildtypes.
Peter Mørch Groth, Richard Michael, Jesper Salomon, Pengfei Tian, Wouter Boomsma.
Preprint, June 2023.
[10.1101/2023.06.21.545880]

Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks.
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, Kevin K. Yang.
Preprint, April 2023.
[10.1101/2023.03.04.531015]

PDBench: Evaluating Computational Methods for Protein-Sequence Design.
Leonardo V Castorina, Rokas Petrenas, Kartic Subr, Christopher W Wood.
Bioinformatics, January 2023.
[10.1093/bioinformatics/btad027]

The energetic and allosteric landscape for KRAS inhibition.
Chenchun Weng, Andre J. Faure, Ben Lehner.
Preprint, December 2022.
[10.1101/2022.12.06.519122]

ManyFold: an efficient and flexible library for training and validating protein folding models.
Amelia Villegas-Morcillo, Louis Robinson, Arthur Flajolet, Thomas D Barrett.
Bioinformatics, December 2022.
[10.1093/bioinformatics/btac773]

Mega-scale experimental analysis of protein folding stability in biology and protein design.
Kotaro Tsuboyama, Justas Dauparas, Jonathan Chen, Elodie Laine, Yasser Mohseni Behbahani, Jonathan J. Weinstein, Niall M. Mangan, Sergey Ovchinnikov, Gabriel J. Rocklin.
Preprint, December 2022.
[10.1101/2022.12.06.519132]

Tuned Fitness Landscapes for Benchmarking Model-Guided Protein Design.
Neil Thomas, Atish Agarwala, David Belanger, Yun S. Song, Lucy J. Colwell.
Preprint, October 2022.
[10.1101/2022.10.28.514293]

Deep mutational scanning and machine learning reveal structural and molecular rules governing allosteric hotspots in homologous proteins.
Megan Leander, Zhuang Liu, Qiang Cui, Srivatsan Raman.
eLife, October 2022.
[10.7554/eLife.79932]

Randomized gates eliminate bias in sort-seq assays.
Brian L. Trippe, Buwei Huang, Erika A. DeBenedictis, Brian Coventry, Nicholas Bhattacharya, Kevin K. Yang, David Baker, Lorin Crawford.
Protein Science, August 2022.
[10.1002/pro.4401]

Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold.
Ziyao Li, Xuyang Liu, Weijie Chen, Fan Shen, Hangrui Bi, Guolin Ke, Linfeng Zhang.
Preprint, August 2022.
[10.1101/2022.08.04.502811]

PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding.
Minghao Xu, Zuobai Zhang, Jiarui Lu, Zhaocheng Zhu, Yangtian Zhang, Chang Ma, Runcheng Liu, Jian Tang.
Preprint, June 2022.
[arxiv]

FLIP: Benchmark tasks in fitness landscape inference for proteins.
Christian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, Kevin K. Yang.
NeurIPS 2021 Datasets and Benchmarks Track, December 2021.
[10.1101/2021.11.09.467890]

evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library.
Bruce J. Wittmann, Kadina E. Johnston, Patrick J. Almhjell, Frances H. Arnold.
Preprint, November 2021.
[10.1101/2021.11.18.469179]

The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires.
Milena Pavlović, Lonneke Scheffer, Keshav Motwani, Chakravarthi Kanduri, Radmila Kompova, Nikolay Vazov, Knut Waagan, Fabian L. M. Bernal, Alexandre Almeida Costa, Brian Corrie, Rahmad Akbar, Ghadi S. Al Hajj, Gabriel Balaban, Todd M. Brusko, Maria Chernigovskaya, Scott Christley, Lindsay G. Cowell, Robert Frank, Ivar Grytten, Sveinung Gundersen, Ingrid Hobæk Haff, Eivind Hovig, Ping-Han Hsieh, Günter Klambauer, Marieke L. Kuijjer, Christin Lund-Andersen, Antonio Martini, Thomas Minotto, Johan Pensar, Knut Rand, Enrico Riccardi, Philippe A. Robert, Artur Rocha, Andrei Slabodkin, Igor Snapkov, Ludvig M. Sollid, Dmytro Titov, Cédric R. Weber, Michael Widrich, Gur Yaari, Victor Greiff & Geir Kjetil Sandve.
Nature Machine Intelligence, November 2021.
[10.1038/s42256-021-00413-z]

Learned embeddings from deep learning to visualize and predict protein sets.
Christian Dallago, Konstantin Schütze, Michael Heinzinger, Tobias Olenyi, Maria Littmann, Amy X Lu, Kevin K Yang, Seonwoo Min, Sungroh Yoon, James T Morton, Burkhard Rost.
Current Protocols, May 2021.
[10.1002/cpz1.113]

Population-Based Black-Box Optimization for Biological Sequence Design.
Christof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley.
ICML, July 2020.
[ICML]

Selene: a PyTorch-based deep learning library for sequence data.
Kathleen M. Chen, Evan M. Cofer, Jian Zhou, Olga G. Troyanskaya.
Nature Methods, March 2019.
[10.1038/s41592-019-0360-8]

Machine-learning guided directed evolution

Improving protein expression, stability, and function with ProteinMPNN.
Kiera H. Sumida, Reyes Núñez-Franco, Indrek Kalvet, Samuel J. Pellock, Basile I. M. Wicky, Lukas F. Milles, Justas Dauparas, Jue Wang, Yakov Kipnis, Noel Jameson, Alex Kang, Joshmyn De La Cruz, Banumathi Sankaran, Asim K. Bera, Gonzalo Jiménez-Osés, David Baker.
Preprint, October 2023.
[10.1101/2023.10.03.560713]

Deploying synthetic coevolution and machine learning to engineer protein-protein interactions.
Aerin Yang, Kevin M Jude, Ben Lai, Mason Minot, Anna M Kocyla, Caleb R Glassman, Daisuke Nishimiya, Yoon Seok Kim, Sai T Reddy, Aly A Khan, K Christopher Garcia.
Science, July 2023.
[10.1126/science.adh1720]

Bidirectional Learning for Offline Model-based Biological Sequence Design.
Can Chen, Yingxue Zhang, Xue Liu, Mark Coates.
Preprint, January 2023.
[arxiv]

Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC.
Patrick Emami, Aidan Perreault, Jeffrey Law, David Biagioni, Peter C. St. John.
Preprint, December 2022.
[arxiv]

Combinatorial assembly and design of enzymes.
Rosalie Lipsh-Sokolik, Olga Khersonsky, Sybrin P. Schröder, Casper de Boer, Shlomo-Yakir Hoch, Gideon J. Davies, Hermen S. Overkleeft, Sarel J. Fleishman.
Preprint, December 2022.
[10.1101/2022.09.17.508230]

Forecasting labels under distribution-shift for machine-guided sequence design.
Lauren Berk Wheelock, Stephen Malina, Jeffrey Gerold, Sam Sinai.
Preprint, November 2022.
[arxiv]

PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design.
Ji Won Park, Samuel Stanton, Saeed Saremi, Andrew Watkins, Henri Dwyer, Vladimir Gligorijevic, Richard Bonneau, Stephen Ra, Kyunghyun Cho.
Preprint, October 2022.
[arxiv]

Designed active-site library reveals thousands of functional GFP variants.
Jonathan Yaacov Weinstein, Carlos Marti Gomez Aldaravi, Rosalie Lipsh-Sokolik, Shlomo Yakir Hoch, Demian Liebermann, Reinat Nevo, Haim Weissman, Ekaterina Petrovich-Kopitman, David Margulies, Dmitry Ivankov, David McCandlish, Sarel Jacob Fleishman.
Preprint, October 2022.
[10.1101/2022.10.11.511732]

Accelerated rational PROTAC design via deep learning and molecular simulations.
Shuangjia Zheng, Youhai Tan, Zhenyu Wang, Chengtao Li, Zhiqing Zhang, Xu Sang, Hongming Chen & Yuedong Yang.
Nature Machine Intelligence, September 2022.
[10.1038/s42256-022-00527-y]

Inferring protein fitness landscapes from laboratory evolution experiments.
Sameer D’Costa, Emily C. Hinds, Chase R. Freschlin, Hyebin Song, Philip A. Romero.
Preprint, September 2022.
[10.1101/2022.09.01.506224]

Antibody optimization enabled by artificial intelligence predictions of binding affinity and naturalness.
Sharrol Bachas, Goran Rakocevic, David Spencer, Anand V. Sastry, Robel Haile, John M. Sutton, George Kasun, Andrew Stachyra, Jahir M. Gutierrez, Edriss Yassine, Borka Medjo, Vincent Blay, Christa Kohnert, Jennifer T. Stanton, Alexander Brown, Nebojsa Tijanic, Cailen McCloskey, Rebecca Viazzo, Rebecca Consbruck, Hayley Carter, Simon Levine, Shaheed Abdulhaqq, Jacob Shaul, Abigail B. Ventura, Randal S. Olson, Engin Yapici, Joshua Meier, Sean McClain, Matthew Weinstock, Gregory Hannum, Ariel Schwartz, Miles Gander, Roberto Spreafico.
Preprint, August 2022.
[10.1101/2022.08.16.504181]

Co-optimization of therapeutic antibody affinity and specificity using machine learning models that generalize to novel mutational space.
Emily K. Makowski, Patrick C. Kinnunen, Jie Huang, Lina Wu, Matthew D. Smith, Tiexin Wang, Alec A. Desai, Craig N. Streu, Yulei Zhang, Jennifer M. Zupancic, John S. Schardt, Jennifer J. Linderman, Peter M. Tessier.
Nature Communications, July 2022.
[10.1038/s41467-022-31457-3]

Heterogeneity of the GFP fitness landscape and data-driven protein design.
Louisa Gonzalez Somermeyer, Aubin Fleiss, Alexander S Mishin, Nina G Bozhanova, Anna A Igolkina, Jens Meiler, Maria-Elisenda Alaball Pujol, Ekaterina V Putintseva, Karen S Sarkisyan.
eLife, May 2022.
[10.7554/eLife.75842]

De novo protein design by deep network hallucination.
Ivan Anishchenko, Samuel J. Pellock, Tamuka M. Chidyausiku, Theresa A. Ramelot, Sergey Ovchinnikov, Jingzhou Hao, Khushboo Bafna, Christoffer Norn, Alex Kang, Asim K. Bera, Frank DiMaio, Lauren Carter, Cameron M. Chow, Gaetano T. Montelione & David Baker.
Nature, December 2021.
[10.1038/s41586-021-04184-w]

Informed training set design enables efficient machine learning-assisted directed protein evolution.
Bruce J. Wittmann, Yisong Yue, Frances H. Arnold.
Cell Systems, November 2021.
[10.1016/j.cels.2021.07.008]

Machine learning-based library design improves packaging and diversity of adeno-associated virus (AAV) libraries.
Danqing Zhu, David H. Brookes, Akosua Busia, Ana Carneiro, Clara Fannjiang, Galina Popova, David Shin, Edward F. Chang, Tomasz J. Nowakowski, Jennifer Listgarten, David V. Schaffer.
Preprint, November 2021.
[10.1101/2021.11.02.467003]

Optimal Design of Stochastic DNA Synthesis Protocols based on Generative Sequence Models.
Eli N. Weinstein, Alan N. Amin, Will Grathwohl, Daniel Kassler, Jean Disset, Debora S. Marks.
Preprint, October 2021.
[10.1101/2021.10.28.466307]

Unifying Likelihood-free Inference with Black-box Sequence Design and Beyond.
Dinghuai Zhang, Jie Fu, Yoshua Bengio, Aaron Courville.
Preprint, October 2021.
[arxiv]

Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity.
Eric J. Ma, Elina Siirola, Charles Moore, Arkadij Kummer, Markus Stoeckli, Michael Faller, Caroline Bouquet, Fabian Eggimann, Mathieu Ligibel, Dan Huynh, Geoffrey Cutler, Luca Siegrist, Richard A. Lewis, Anne-Christine Acker, Ernst Freund, Elke Koch, Markus Vogel, Holger Schlingensiepen, Edward J. Oakeley, and Radka Snajdrova.
ACS Catalysis, September 2021.
[10.1021/acscatal.1c02786]

Conservative Objective Models for Effective Offline Model-Based Optimization.
Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine.
Preprint, July 2021.
[arxiv]

Deep Extrapolation for Attribute-Enhanced Generation.
Alvin Chan, Ali Madani, Ben Krause, Nikhil Naik.
Preprint, July 2021.
[arxiv]

Effective Surrogate Models for Protein Design with Bayesian Optimization.
Nate Gruver, Samuel Stanton, Polina Kirichenko, Marc Finzi, Phillip Maffettone, Vivek Myers, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson.
2021 ICML Workshop on Computational Biology, July 2021.
[pdf]

Bayesian optimization with evolutionary and structure-based regularization for directed protein evolution.
Trevor S. Frisby, Christopher James Langmead.
Algorithms for Molecular Biology, July 2021.
[10.1186/s13015-021-00195-4]

Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design.
Adam Foster, Desi R. Ivanova, Ilyas Malik, Tom Rainforth.
Preprint, July 2021.
[arxiv]

In silico proof of principle of machine learning-based antibody design at unconstrained scale.
Rahmad Akbar, Philippe A. Robert, Cédric R. Weber, Michael Widrich, Robert Frank, Milena Pavlović, Lonneke Scheffer, Maria Chernigovskaya, Igor Snapkov, Andrei Slabodkin, Brij Bhushan Mehta, Enkelejda Miho, Fridtjof Lund-Johansen, Jan Terje Andersen, Sepp Hochreiter, Ingrid Hobæk Haff, Günter Klambauer, Geir Kjetil Sandve, Victor Greiff.
Preprint, July 2021.
[10.1101/2021.07.08.451480]

Deep diversification of an AAV capsid protein by machine learning.
Drew H. Bryant, Ali Bashir, Sam Sinai, Nina K. Jain, Pierce J. Ogden, Patrick F. Riley, George M. Church, Lucy J. Colwell & Eric D. Kelsic.
Nature Biotechnology, February 2021.
[10.1038/s41587-020-00793-4]

Deep Uncertainty and the Search for Proteins.
Zelda Mariet, Ghassen Jerfel, Zi Wang, Christof Angermüller, David Belanger, Suhani Vora, Maxwell Bileschi, Lucy Colwell, D Sculley, Dustin Tran, Jasper Snoek.
NeurIPS 2020 ML for Molecules Workshop, December 2020.
[pdf]

Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production.
Jonathan C. Greenhalgh, Sarah A. Fahlberg, Brian F. Pfleger, Philip A. Romero.
Preprint, May 2021.
[10.1101/2021.05.21.445192]

Large-scale design and refinement of stable proteins using sequence-only models.
Jedediah M. Singer, Scott Novotney, Devin Strickland, Hugh K. Haddox, Nicholas Leiby, Gabriel J. Rocklin, Cameron M. Chow, Anindya Roy, Asim K. Bera, Francis C. Motta, … Eric Klavins.
Preprint, March 2021.
[10.1101/2021.03.12.435185]

AdaLead: A simple and robust adaptive greedy search algorithm for sequence design.
Sam Sinai, Richard Wang, Alexander Whatley, Stewart Slocum, Elina Locane, Eric D. Kelsic.
Preprint, October 2020.
[arxiv]

The NK Landscape as a Versatile Benchmark for Machine Learning Driven Protein Engineering.
Adam C. Mater, Mahakaran Sandhu, Colin Jackson.
Preprint, October 2020.
[10.1101/2020.09.30.319780]

Learning with uncertainty for biological discovery and design.
Brian Hie, Bryan Bryson, Bonnie Berger.
Preprint, August 2020.
[10.1101/2020.08.11.247072]

Population-Based Black-Box Optimization for Biological Sequence Design.
Christof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley.
ICML, July 2020.
[ICML]

Autofocused oracles for model-based design.
Clara Fannjiang, Jennifer Listgarten.
Preprint, June 2020.
[arxiv]

Domain Extrapolation via Regret Minimization.
Wengong Jin, Regina Barzilay, Tommi Jaakkola.
Preprint, June 2020.
[arxiv]

Fast differentiable DNA and protein sequence optimization for molecular design.
Johannes Linder, Georg Seelig.
Preprint, May 2020.
[arxiv]

A Deep Dive into Machine Learning Models for Protein Engineering.
Yuting Xu, Deeptak Verma, Robert P Sheridan, Andy Liaw, Junshui Ma, Nicholas Marshall, John McIntosh, Edward C. Sherer, Vladimir Svetnik, Jennifer Johnston.
Journal of Chemical Information and Modeling, April 2020.
[10.1021/acs.jcim.0c00073]

Evolutionary context-integrated deep sequence modeling for protein engineering.
Yunan Luo, Lam Vo, Hantian Ding, Yufeng Su, Yang Liu, Wesley Wei Qian, Huimin Zhao, Jian Peng.
Preprint, January 2020.
[10.1101/2020.01.16.908509]

Biological Sequence Design using Batched Bayesian Optimization.
David Belanger, Suhani Vora, Zelda Mariet, Ramya Deshpande, David Dohan, Christof Angermueller, Kevin Murphy, Olivier Chapelle, Lucy Colwell.
NeurIPS Workshop on Machine Learning and the Physical Sciences, December 2019.
[ML4PS]

Model Inversion Networks for Model-Based Optimization.
Aviral Kumar, Sergey Levine.
Preprint, December 2019.
[arxiv]

Interpreting mutational effects predictions, one substitution at a time.
C. K. Sruthi, Meher K. Prakash.
bioRxiv, December 2019.
[10.1101/867812]

A structure-based deep learning framework for protein engineering.
Raghav Shroff, Austin W. Cole, Barrett R. Morrow, Daniel J. Diaz, Isaac Donnell, Jimmy Gollihar, Andrew D. Ellington, Ross Thyer.
Preprint, November 2019.
[10.1101/833905]

Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design.
Pierce J. Ogden, Eric D. Kelsic, Sam Sinai, George M. Church.
Science, November 2019.
[10.1126/science.aaw2900]

Machine learning-guided channelrhodopsin engineering enables minimally-invasive optogenetics.
Claire N. Bedbrook, Kevin K. Yang, J. Elliott Robinson, Viviana Gradinaru, Frances H Arnold.
Nature Methods, October 2019.
[10.1038/s41592-019-0583-8]
Preprint available on [bioRxiv].

Batched Stochastic Bayesian Optimization via Combinatorial Constraints Design.
Kevin K. Yang, Yuxin Chen, Alycia Lee, Yisong Yue.
International Conference on Artificial Intelligence and Statistics (AISTATS), April 2019.
[arxiv] [PMLR]

Machine learning-assisted directed protein evolution with combinatorial libraries.
Zachary Wu, S. B. Jennifer Kan, Russell D. Lewis, Bruce J. Wittmann, Frances H. Arnold.
PNAS, April 2019.
[10.1073/pnas.1901979116]

Conditioning by adaptive sampling for robust design.
David H. Brookes, Hahnbeom Park, Jennifer Listgarten.
Preprint, January 2019.
[arxiv]

A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes.
Frédéric Cadet, Nicolas Fontaine, Guangyue Li, Joaquin Sanchis, Matthieu Ng Fuk Chong, Rudy Pandjaitan, Iyanar Vetrivel, Bernard Offmann, Manfred T. Reetz.
Scientific Reports, November 2018.
[10.1038/s41598-018-35033-y]

Design by adaptive sampling.
David H. Brookes, Jennifer Listgarten.
Preprint, October 2018.
[arxiv]

Machine-Learning-Guided Mutagenesis for Directed Evolution of Fluorescent Proteins.
Yutaka Saito, Misaki Oikawa, Hikaru Nakazawa, Teppei Niide, Tomoshi Kameda, Koji Tsuda, and Mitsuo Umetsu.
ACS Synthetic Biology, August 2018.
[10.1021/acssynbio.8b00155]

Toward machine-guided design of proteins.
Surojit Biswas, Gleb Kuznetsov, Pierce J. Ogden, Nicholas J. Conway, Ryan P. Adams, George M. Church.
Preprint, June 2018.
[10.1101/337154] [bioRxiv]

Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions.
Anvita Gupta, James Zou.
Preprint, April 2018.
[arxiv]

Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization.
Claire N. Bedbrook, Kevin K. Yang, Austin J. Rice, Viviana Gradinaru, Frances H. Arnold.
PLOS Computational Biology, October 2017.
[10.1371/journal.pcbi.1005786]

Exploring sequence-function space of a poplar glutathione transferase using designed information-rich gene variants.
Yaman Musdal, Sridhar Govindarajan, Bengt Mannervik.
Protein Engineering, Design, and Selection, August 2017.
[10.1093/protein/gzx045]

Navigating the protein fitness landscape with Gaussian processes.
Philip A. Romero, Andreas Krause, Frances H. Arnold.
PNAS, January 2013.
[10.1073/pnas.1215251110]

Engineering proteinase K using machine learning and synthetic genes.
Jun Liao, Manfred K. Warmuth, Sridhar Govindarajan, Jon E. Ness, Rebecca P Wang, Claes Gustafsson, Jeremy Minshull.
BMC Biotechnology, March 2007.
[10.1186/1472-6750-7-16]

Improving catalytic function by ProSAR-driven enzyme evolution.
Richard J. Fox, S. Christopher Davis, Emily C. Mundorff, Lisa M. Newman, Vesna Gavrilovic, Steven K. Ma, Loleta M. Chung, Charlene Ching, Sarena Tam, Sheela Muley, John Grate, John Gruber, John C. Whitman, Roger A. Sheldon, Gjalt W. Huisman.
Nature Biotechnology, February 2007.
[Nature Biotechnology]

Representation learning

Two sequence- and two structure-based ML models have learned different aspects of protein biochemistry.
Anastasiya V. Kulikova, Daniel J. Diaz, Tianlong Chen, T. Jeffrey Cole, Andrew D. Ellington & Claus O.
Scientific Reports, August 2023.
[10.1038/s41598-023-40247-w]

Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations.
Nabil Ibtehaz, Yuki Kagaya, Daisuke Kihara.
Preprint, August 2023.
[10.1101/2023.08.23.554486]

Contextual protein and antibody encodings from equivariant graph transformers.
Sai Pooja Mahajan, Jeffrey A. Ruffolo, Jeffrey J. Gray.
Preprint, July 2023.
[10.1101/2023.07.15.549154]

Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling.
Ahmed Elnaggar, Hazem Essam, Wafaa Salah-Eldin, Walid Moustafa, Mohamed Elkerdawy, Charlotte Rochereau, Burkhard Rost.
Preprint, June 2023.
[arxiv]

Structure-aware protein self-supervised learning.
Can (Sam) Chen, Jingbo Zhou, Fan Wang, Xue Liu, Dejing Dou.
Bioinformatics, April 2023.
[10.1093/bioinformatics/btad189]

Lightweight Contrastive Protein Structure-Sequence Transformation.
Jiangbin Zheng, Ge Wang, Yufei Huang, Bozhen Hu, Siyuan Li, Cheng Tan, Xinwen Fan, Stan Z. Li.
Preprint, March 2023.
[arxiv]

A Systematic Study of Joint Representation Learning on Protein Sequences and Structures.
Zuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang.
Preprint, March 2023.
[arxiv]

Structure-informed Language Models Are Protein Designers.
Zaixiang Zheng, Yifan Deng, Dongyu Xue, Yi Zhou, Fei Ye, Quanquan Gu.
Preprint, February 2023.
[10.1101/2023.02.03.526917]

Retrieved Sequence Augmentation for Protein Representation Learning.
Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, Lingpeng Kong.
Preprint, February 2023.
[10.1101/2023.02.22.529597]

ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts.
Minghao Xu, Xinyu Yuan, Santiago Miret, Jian Tang.
Preprint, January 2023.
[arxiv]

Codon language embeddings provide strong signals for protein engineering.
Carlos Outeiral, Charlotte M. Deane.
Preprint, December 2022.
[10.1101/2022.12.15.519894]

When Geometric Deep Learning Meets Pretrained Protein Language Models.
Fang Wu, Yu Tao, Dragomir Radev, Jinbo Xu.
Preprint, December 2022.
[arxiv]

Contrastive learning of protein representations with graph neural networks for structural and functional annotations.
Jiaqi Luo, Yunan Luo.
Preprint, December 2022.
[10.1101/2022.11.29.518451]

Training self-supervised peptide sequence models on artificially chopped proteins.
Gil Sadeh, Zichen Wang, Jasleen Grewal, Huzefa Rangwala, Layne Price.
Preprint, November 2022.
[arxiv]

Language models of protein sequences at the scale of evolution enable accurate structure prediction.
Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
Preprint, July 2022.
[10.1101/2022.07.20.500902]

Advancing protein language models with linguistics: a roadmap for improved interpretability.
Mai Ha Vu, Rahmad Akbar, Philippe A. Robert, Bartlomiej Swiatczak, Victor Greiff, Geir Kjetil Sandve, Dag Trygve Truslew Haug.
Preprint, July 2022.
[arxiv]

Self-supervised deep learning encodes high-resolution features of protein subcellular localization.
Hirofumi Kobayashi, Keith C. Cheveralls, Manuel D. Leonetti & Loic A. Royer.
Nature Methods, July 2022.
[10.1038/s41592-022-01541-z]

COLLAPSE: A representation learning framework for identification and characterization of protein structural sites.
Alexander Derry, Russ B. Altman.
Preprint, July 2022.
[10.1101/2022.07.20.500713]

CoSP: Co-supervised pretraining of pocket and ligand.
Zhangyang Gao, Cheng Tan, Lirong Wu, Stan Z. Li.
Preprint, June 2022.
[arxiv]

Pre-training Protein Models with Molecular Dynamics Simulations for Drug Binding.
Wu F, Zhang Q, Radev D, Wang Y, Jin X, Jiang Y, Li SZ, Niu Z.
Preprint, June 2022.
[10.21203/rs.3.rs-1566483/v1]

Exploring evolution-based &-free protein language models as protein function predictors.
Mingyang Hu, Fajie Yuan, Kevin K. Yang, Fusong Ju, Jin Su, Hui Wang, Fei Yang, Qiuyang Ding.
Preprint, June 2022.
[arxiv]

Masked inverse folding with sequence transfer for protein representation learning.
Kevin K. Yang, Niccolò Zanichelli, Hugh Yeh.
Preprint, June 2022.
[10.1101/2022.05.25.493516]

Convolutions are competitive with transformers for protein sequence pretraining.
Kevin K. Yang, Alex X. Lu, Nicolo Fusi.
Preprint, June 2022.
[10.1101/2022.05.19.492714]

Evolutionary velocity with protein language models.
Brian L. Hie, Kevin K. Yang, Peter S. Kim.
Cell Systems, April 2022.
[10.1016/j.cels.2022.01.003]

Identification of Enzymatic Active Sites with Unsupervised Language Modeling.
Loïc Kwate Dassi, Matteo Manica, Daniel Probst, Philippe Schwaller, Yves Gaetan Nana Teukam, Teodoro Laino.
Preprint, November 2021.
[10.33774/chemrxiv-2021-m20gg]

Artificial Intelligence Guided Conformational Mining of Intrinsically Disordered Proteins.
Aayush Gupta, Souvik Dey, Huan-Xiang Zhou.
Preprint, November 2021.
[10.1101/2021.11.21.469457]

Deciphering the language of antibodies using self-supervised learning.
Jinwoo Leem, Laura S. Mitchell, James H.R. Farmery, Justin Barton, Jacob D. Galson.
Preprint, November 2021.
[10.1101/2021.11.10.468064]

Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model.
Liang He, Shizhuo Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, Yingce Xia, Jianwei Zhu, Pan Deng, Bin Shao, Tao Qin, Tie-Yan Liu.
Preprint, October 2021.
[arxiv]

Neural Distance Embeddings for Biological Sequences.
Gabriele Corso, Rex Ying, Michal Pándy, Petar Veličković, Jure Leskovec, Pietro Liò.
Preprint, September 2021.
[arxiv]

Biologically relevant transfer learning improves transcription factor binding prediction.
Gherman Novakovsky, Manu Saraswat, Oriol Fornes, Sara Mostafavi & Wyeth W. Wasserman.
Genome Biology, September 2021.
[10.1186/s13059-021-02499-5]

Toward More General Embeddings for Protein Design: Harnessing Joint Representations of Sequence and Structure.
Sanaa Mansoor, Minkyung Baek, Umesh Madan, Eric Horvitz.
Preprint, September 2021.
[10.1101/2021.09.01.458592]

Hydrogen bonds meet self-attention: all you need for general-purpose protein structure embedding.
Cheng Chen, Yuguo Zha, Daming Zhu, Kang Ning, Xuefeng Cui.
Preprint, August 2021.
[10.1101/2021.01.31.428935]

Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning.
Alex X Lu, Amy X Lu, Iva Pritišanac, Taraneh Zarin, Julie D Forman-Kay, Alan M Moses.
Preprint, July 2021.
[10.1101/2021.07.29.454330]

Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs.
Dan Rosenbaum, Marta Garnelo, Michal Zielinski, Charlie Beattie, Ellen Clancy, Andrea Huber, Pushmeet Kohli, Andrew W. Senior, John Jumper, Carl Doersch, S. M. Ali Eslami, Olaf Ronneberger, Jonas Adler.
Preprint, June 2021.
[arxiv]

Pretraining model for biological sequence data.
Bosheng Song, Zimeng Li, Xuan Lin, Jianmin Wang, Tian Wang, Xiangzheng Fu.
Briefings in Functional Genomics, May 2021.
[10.1093/bfgp/elab025]

ProteinBERT: A universal deep-learning model of protein sequence and function.
Nadav Brandes, Dan Ofer, Yam Peleg, Nadav Rappoport, Michal Linial.
Preprint, May 2021.
[10.1101/2021.05.24.445464]

Random Embeddings and Linear Regression can Predict Protein Function.
Tianyu Lu, Alex X. Lu, Alan M. Moses.
Preprint, April 2021.
[arxiv]

Combining evolutionary and assay-labelled data for protein fitness prediction.
Chloe Hsu, Hunter Nisonoff, Clara Fannjiang, Jennifer Listgarten.
Preprint, March 2021.
[10.1101/2021.03.28.437402]

MSA Transformer.
Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives.
Preprint, February 2021.
[10.1101/2021.02.12.430858]

Improving Generalizability of Protein Sequence Models with Data Augmentations.
Hongyu Shen, Layne C. Price, Taha Bahadori, Franziska Seeger.
Preprint, February 2021.
[10.1101/2021.02.18.431877]

Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures.
Damianos P. Melidis, Wolfgang Nejdl.
Algorithms, January 2021.
[10.3390/a14010028]

Adversarial Contrastive Pre-training for Protein Sequences.
Matthew B. A. McDermott, Brendan Yap, Harry Hsu, Di Jin, Peter Szolovits.
Preprint, January 2021.
[arxiv]

Fast end-to-end learning on protein surfaces.
Freyr Sverrisson, Jean Feydy, Bruno E. Correia, Michael M. Bronstein.
Preprint, December 2020.
[10.1101/2020.12.28.424589]

Transformer protein language models are unsupervised structure learners.
Roshan Rao, Sergey Ovchinnikov, Joshua Meier, Alexander Rives, Tom Sercu.
Preprint, December 2020.
[10.1101/2020.12.15.422761]

Self-Supervised Representation Learning of Protein Tertiary Structures (PtsRep): Protein Engineering as A Case Study.
Junwen Luo, Yi Cai, Jialin Wu, Hongmin Cai, Xiaofeng Yang, Zhanglin Lin.
Preprint, December 2020.
[10.1101/2020.12.22.423916]

What is a meaningful representation of protein sequences?
Nicki Skafte Detlefsen, Søren Hauberg, Wouter Boomsma.
Preprint, November 2020.
[arxiv]

Profile Prediction: An Alignment-Based Pre-Training Task for Protein Sequence Models.
Pascal Sturmfels, Jesse Vig, Ali Madani, Nazneen Fatema Rajani.
Preprint, November 2020.
[arxiv]

Fixed-Length Protein Embeddings using Contextual Lenses.
Amir Shanehsazzadeh, David Belanger, David Dohan.
Preprint, October 2020.
[arxiv]

Evaluation of Methods for Protein Representation Learning: A Quantitative Analysis.
Serbulent Unsal, Heval Ataş, Muammer Albayrak, Kemal Turhan, Aybar C. Acar, Tunca Doğan.
Preprint, October 2020.
[10.1101/2020.10.28.359828]

Self-Supervised Contrastive Learning of Protein Representations By Mutual Information Maximization.
Amy X. Lu, Haoran Zhang, Marzyeh Ghassemi, Alan Moses.
Preprint, September 2020.
[10.1101/2020.09.04.283929]

ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing.
Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, Burkhard Rost.
Preprint, July 2020.
[10.1101/2020.07.12.199554]

Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function.
Amelia Villegas-Morcillo, Stavros Makrodimitris, Roeland van Ham, Angel M. Gomez, Victoria Sanchez, Marcel Reinders.
Preprint, April 2020.
[10.1101/2020.04.07.028373]

Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites.
Arnab Bhadra, Kalidas Y.
Preprint, March 2020.
[arxiv]

Evolutionary context-integrated deep sequence modeling for protein engineering.
Yunan Luo, Lam Vo, Hantian Ding, Yufeng Su, Yang Liu, Wesley Wei Qian, Huimin Zhao, Jian Peng.
Preprint, January 2020.
[10.1101/2020.01.16.908509]

Sequence representations and their utility for predicting protein-protein interactions.
Dhananjay Kimothi, Pravesh Biyani, James M Hogan.
Preprint, December 2019.
[10.1101/2019.12.31.890699]

Language modelling for biological sequences – curated datasets and baselines.
Jose Juan Almagro Armenteros, Alexander Rosenberg Johansen, Ole Winther, Henrik Nielsen.
Preprint, December 2019.
[alrojo.github.io]

Deciphering protein evolution and fitness landscapes with latent space models.
Xinqiang Ding, Zhengting Zou, Charles L. Brooks III.
Nature Communications, December 2019.
[10.1038/s41467-019-13633-0]

End-to-end multitask learning, from protein language to protein features without alignments.
Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Burkhard Rost.
Preprint, December 2019.
[10.1101/864405]

Unified rational protein engineering with sequence-only deep representation learning.
Ethan C. Alley, Grigory Khimulya, Surojit Biswas, Mohammed AlQuraishi, George M. Church.
Nature Methods, October 2019.
[10.1038/s41592-019-0598-1]

Structure-Based Function Prediction using Graph Convolutional Networks.
Vladimir Gligorijevic, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Kyunghyun Cho, Tommi Vatanen, Daniel Berenberg, Bryn Taylor, Ian M. Fisk, Ramnik J. Xavier, Rob Knight, Richard Bonneau.
Preprint, October 2019.
[10.1101/786236]

Modeling the language of life – Deep Learning Protein Sequences.
Michael Heinzinger, Ahmed Elnaggar, Yu Wang, Christian Dallago, Dmitrii Nechaev, Florian Matthes, Burkhard Rost.
Preprint, September 2019.
[10.1101/614313]

Augmenting Protein Network Embeddings with Sequence Information.
Hassan Kane, Mohamed K. Coulibali, Pelkins Ajanoh, Ali Abdallah.
Preprint, August 2019.
[10.1101/730481]

Universal Deep Sequence Models for Protein Classification.
Nils Strodthoff, Patrick Wagner, Markus Wenzel, Wojciech Samek.
Preprint, July 2019.
[10.1101/704874]

DeepPrime2Sec: Deep Learning for Protein Secondary Structure Prediction from the Primary Sequences.
Ehsaneddin Asgari, Nina Poerner, Alice C. McHardy, Mohammad R.K. Mofrad.
Preprint, July 2019.
[10.1101/705426]

A Self-Consistent Sonification Method to Translate Amino Acid Sequences into Musical Compositions and Application in Protein Design Using Artificial Intelligence.
Chi-Hua Yu, Zhao Qin, Francisco J. Martin-Martinez, Markus J. Buehler.
ACS Nano, June 2019.
[10.1021/acsnano.9b02180]

Evaluating Protein Transfer Learning with TAPE.
Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, Yun S. Song.
Preprint, June 2019.
[arxiv]

Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins.
Julius Upmeier zu Belzen, Thore Bürgel, Stefan Holderbach, Felix Bubeck, Lukas Adam, Catharina Gandor, Marita Klein, Jan Mathony, Pauline Pfuderer, Lukas Platz, Moritz Przybilla, Max Schwendemann, Daniel Heid, Mareike Daniela Hoffmann, Michael Jendrusch, Carolin Schmelas, Max Waldhauer, Irina Lehmann, Dominik Niopek, Roland Eils.
Nature Machine Intelligence, May 2019.
[Nature Machine Intelligence]

Modeling the Language of Life – Deep Learning Protein Sequences.
Michael Heinzinger, Ahmed Elnaggar, Yu Wang, Christian Dallago, Dmitrii Nechaev, Florian Matthes, Burkhard Rost.
Preprint, May 2019.
[10.1101/614313] [bioRxiv]

Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences.
Alexander Rives, Siddharth Goyal, Joshua Meier, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, Rob Fergus.
Preprint, April 2019.
[10.1101/622803] [bioRxiv]

Learning protein constitutive motifs from sequence data.
Jérôme Tubiana, Simona Cocco, Rémi Monasson.
eLife, March 2019.
[10.7554/eLife.39397]

Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX).
Ehsaneddin Asgari, Alice C. McHardy, Mohammad R. K. Mofrad.
Scientific Reports, March 2019.
[10.1038/s41598-019-38746-w]

Learning protein sequence embeddings using information from structure.
Tristan Bepler, Bonnie Berger.
International Conference on Learning Representations, February 2019.
[ICLR]

Application of fourier transform and proteochemometrics principles to protein engineering.
Frédéric Cadet, Nicolas Fontaine, Iyanar Vetrivel, Matthieu Ng Fuk Chong, Olivier Savriama, Xavier Cadet, Philippe Charton.
BMC Bioinformatics, October 2018.
[10.1186/s12859-018-2407-8]

Learned protein embeddings for machine learning.
Kevin K Yang, Zachary Wu, Claire N Bedbrook, Frances H Arnold.
Bioinformatics, August 2018.
[10.1093/bioinformatics/bty178]

Deep Semantic Protein Representation for Annotation, Discovery, and Engineering.
Ariel S Schwartz, Gregory J Hannum, Zach R Dwiel, Michael E Smoot, Ana R Grant, Jason M Knight, Scott A Becker, Jonathan R Eads, Matthew C LaFave, Harini Eavani, Yinyin Liu, Arjun K Bansal, Toby H Richardson.
Preprint, July 2018.
[10.1101/365965]

Improved Descriptors for the Quantitative Structure–Activity Relationship Modeling of Peptides and Proteins.
Mark H. Barley, Nicholas J. Turner, Royston Goodacre.
Journal of Chemical Information and Modeling, January 2018.
[10.1021/acs.jcim.7b00488]

Variational auto-encoding of protein sequences.
Sam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak.
Preprint, December 2017.
[arxiv]

Predicting Protein Binding Affinity With Word Embeddings and Recurrent Neural Networks.
Carlo Mazzaferro.
Preprint, April 2017.
[10.1101/128223] [bioRxiv]

dna2vec: Consistent vector representations of variable-length k-mers.
Patrick Ng.
Preprint, January 2017.
[arxiv]

Distributed Representations for Biological Sequence Analysis.
Dhananjay Kimothi, Akshay Soni, Pravesh Biyani, James M. Hogan.
Preprint, August 2016.
[arxiv]

ProFET: Feature engineering captures high-level protein functions.
Dan Ofer, Michal Linial.
Bioinformatics, June 2015.
[10.1093/bioinformatics/btv345]

AAindex: amino acid index database, progress report 2008.
Shuichi Kawashima, Piotr Pokarowski, Maria Pokarowska, Andrzej Kolinski, Toshiaki Katayama, Minoru Kanehisa.
Nucleic Acids Research, January 2008.
[10.1093/nar/gkm998]

Unsupervised variant prediction

AlphaFold2 can predict single-mutation effects.
John M. McBride, Konstantin Polev, Amirbek Abdirasulov, Vladimir Reinharz, Bartosz A. Grzybowski, Tsvi Tlusty.
Nature, October 2023.
[10.1101/2022.04.14.488301]

Learning from prepandemic data to forecast viral escape.
Nicole N. Thadani, Sarah Gurev, Pascal Notin, Noor Youssef, Nathan J. Rollins, Daniel Ritter, Chris Sander, Yarin Gal & Debora S. Marks.
Nature, October 2023.
[10.1038/s41586-023-06617-0]

Genome-wide prediction of disease variant effects with a deep protein language model.
Nadav Brandes, Grant Goldman, Charlotte H. Wang, Chun Jimmie Ye & Vasilis Ntranos.
Nature Genetics, August 2023.
[10.1038/s41588-023-01465-0]

Protein Fitness Prediction is Impacted by the Interplay of Language Models, Ensemble Learning, and Sampling Methods.
Mehrsa Mardikoraem, Daniel Woldring.
Preprint, February 2023.
[10.1101/2023.02.09.527362]

Predicting Immune Escape with Pretrained Protein Language Model Embeddings.
Kyle Swanson, Howard Chang, James Zou.
Preprint, December 2022.
[10.1101/2022.11.30.518466]

Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes.
Onuralp Soylemez, Pablo Cordero.
Preprint, November 2022.
[arxiv]

Updated benchmarking of variant effect predictors using deep mutational scanning.
Benjamin J. Livesey, Joseph A. Marsh.
Preprint, November 2022.
[10.1101/2022.11.19.517196]

Accurate Mutation Effect Prediction using RoseTTAFold.
Sanaa Mansoor, Minkyung Baek, David Juergens, Joseph L. Watson, David Baker.
Preprint, November 2022.
[10.1101/2022.11.04.515218]

Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins.
Hideki Yamaguchi, Yutaka Saito.
Briefings in Bioinformatics, November 2021.
[10.1093/bib/bbab234]

Disease variant prediction with deep generative models of evolutionary data.
Jonathan Frazer, Pascal Notin, Mafalda Dias, Aidan Gomez, Joseph K Min, Kelly Brock, Yarin Gal, Debora S Marks.
Nature, November 2021.
[10.1038/s41586-021-04043-8]

Language models enable zero-shot prediction of the effects of mutations on protein function.
Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, Alexander Rives.
Preprint, July 2021.
[10.1101/2021.07.09.450648]

Unsupervised inference of protein fitness landscape from deep mutational scan.
Jorge Fernandez-de-Cossio-Diaz, Guido Uguzzoni, Andrea Pagnani.
Preprint, March 2020.
[10.1101/2020.03.18.996595]

Deep generative models of genetic variation capture the effects of mutations.
Adam J. Riesselman, John B. Ingraham, Debora S. Marks.
Nature Methods, September 2018.
[10.1038/s41592-018-0138-4]

Variational auto-encoding of protein sequences.
Sam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak.
Preprint, December 2017.
[arxiv]

Generative models

Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design.
Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola.
Preprint, November 2023.
[arxiv]

Fast protein backbone generation with SE(3) flow matching.
Jason Yim, Andrew Campbell, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Regina Barzilay, Tommi Jaakkola, Frank Noé.
Preprint, October 2023.
[arxiv]

SE(3)-Stochastic Flow Matching for Protein Backbone Generation.
Avishek Joey Bose, Tara Akhound-Sadegh, Kilian Fatras, Guillaume Huguet, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong.
Preprint, October 2023.
[arxiv]

Joint Design of Protein Sequence and Structure based on Motifs.
Zhenqiao Song, Yunlong Zhao, Yufei Song, Wenxian Shi, Yang Yang, Lei Li.
Preprint, October 2023.
[arxiv]

PepMLM: Target Sequence-Conditioned Generation of Peptide Binders via Masked Language Modeling.
Tianlai Chen, Sarah Pertsemlidis, Rio Watson, Venkata Srikar Kavirayuni, Ashley Hsu, Pranay Vure, Rishab Pulugurta, Sophia Vincoff, Lauren Hong, Tian Wang, Vivian Yudistyra, Elena Haarer, Lin Zhao, Pranam Chatterjee.
Preprint, October 2023.
[arxiv]

Enhancing Luciferase Activity and Stability through Generative Modeling of Natural Enzyme Sequences.
Wen Jun Xie, Dangliang Liu, Xiaoya Wang, Aoxuan Zhang, Qijia Wei, Ashim Nandi, Suwei Dong, Arieh Warshel.
Preprint, October 2023.
[10.1101/2023.09.18.558367]

Protein generation with evolutionary diffusion: sequence is all you need.
Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, Alex Xijie Lu, Nicolo Fusi, Ava Pardis Amini, Kevin K Yang.
Preprint, September 2023.
[10.1101/2023.09.11.556673]

Efficient and accurate sequence generation with small-scale protein language models.
Yaiza Serrano, Sergi Roda, Victor Guallar, Alexis Molina.
Preprint, August 2023.
[10.1101/2023.08.04.551626]

SE(3) diffusion model with application to protein backbone generation.
Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, Tommi Jaakkola.
ICML, July 2023.
[ACM]

De novo design of protein structure and function with RFdiffusion.
Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Sergey Ovchinnikov, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek & David Baker.
Nature, July 2023.
[10.1038/s41586-023-06415-8]

PoET: A generative model of protein families as sequences-of-sequences.
Timothy F. Truong Jr, Tristan Bepler.
Preprint, June 2023.
[arxiv]

Protein Sequence and Structure Co-Design with Equivariant Translation.
Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang.
ICLR, May 2023.
[arxiv]

Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model.
Bo Ni, David L. Kaplan, Markus J. Buehler.
Chem, April 2023.
[10.1016/j.chempr.2023.03.020]

ProtWave-VAE: Integrating autoregressive sampling with latent-based inference for data-driven protein design.
Niksa Praljak, Xinran Lian, Rama Ranganathan, Andrew L. Ferguson.
Preprint, April 2023.
[10.1101/2023.04.23.537971]

ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models.
Youhan Lee, Hasun Yu.
Preprint, March 2023.
[arxiv]

Extrapolative Controlled Sequence Generation via Iterative Refinement.
Vishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh.
Preprint, March 2023.
[arxiv]

ProteinVAE: Variational AutoEncoder for Translational Protein Design.
Suyue Lyu, Shahin Sowlati-Hashjin, Michael Garton.
Preprint, March 2023.
[10.1101/2023.03.04.531110]

Generative power of a protein language model trained on multiple sequence alignments.
Damiano Sgarbossa, Umberto Lupo, Anne-Florence Bitbol.
eLife, February 2023.
[10.7554/eLife.79854]

Evaluating Prompt Tuning for Conditional Protein Sequence Generation.
Andrea Nathansen, Kevin Klein, Bernhard Y. Renard, Melania Nowicka, Jakub M. Bartoszewicz.
Preprint, February 2023.
[10.1101/2023.02.28.530492]

De novo design of luciferases using deep learning.
Andy Hsien-Wei Yeh, Christoffer Norn, Yakov Kipnis, Doug Tischer, Samuel J. Pellock, Declan Evans, Pengchen Ma, Gyu Rie Lee, Jason Z. Zhang, Ivan Anishchenko, Brian Coventry, Longxing Cao, Justas Dauparas, Samer Halabiya, Michelle DeWitt, Lauren Carter, K. N. Houk & David Baker.
Nature, February 2023.
[10.1038/s41586-023-05696-3]

A Text-guided Protein Design Framework.
Shengchao Liu, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Anthony Gitter, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar.
Preprint, February 2023.
[arxiv]

Large language models generate functional protein sequences across diverse families.
Ali Madani, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr., Caiming Xiong, Zachary Z. Sun, Richard Socher, James S. Fraser & Nikhil Naik.
Nature Biotechnology, January 2023.
[10.1038/s41587-022-01618-2]

Unlocking de novo antibody design with generative artificial intelligence.
Amir Shanehsazzadeh, Sharrol Bachas, George Kasun, John M. Sutton, Andrea K. Steiger, Richard Shuai, Christa Kohnert, Alex Morehead, Amber Brown, Chelsea Chung, Breanna K. Luton, Nicolas Diaz, Matt McPartlon, Bailey Knight, Macey Radach, Katherine Bateman, David A. Spencer, Jovan Cejovic, Gaelin Kopec-Belliveau, Robel Haile, Edriss Yassine, Cailen McCloskey, Monica Natividad, Dalton Chapman, Luka Stojanovic, Goran Rakocevic, Gregory Hannum, Engin Yapici, Katherine Moran, Rodante Caguiat, Shaheed Abdulhaqq, Zheyuan Guo, Lillian R. Klug, Miles Gander, Joshua Meier.
Preprint, January 2023.
[10.1101/2023.01.08.523187]

De novo design of high-affinity protein binders to bioactive helical peptides.
Susana Vázquez Torres, Philip J. Y. Leung, Isaac D. Lutz, Preetham Venkatesh, Joseph L. Watson, Fabian Hink, Huu-Hien Huynh, Andy Hsien-Wei Yeh, David Juergens, Nathaniel R. Bennett, Andrew N. Hoofnagle, Eric Huang, Michael J MacCoss, Marc Expòsit, Gyu Rie Lee, Paul M. Levine, Xinting Li, Mila Lamb, Elif Nihal Korkmaz, Jeff Nivala, Lance Stewart, Joseph M. Rogers, David Baker.
Preprint, December 2022.
[10.1101/2022.12.10.519862]

Deep learning-enabled design of synthetic orthologs of a signaling protein.
Xinran Lian, Niksa Praljak, Subu K. Subramanian, Sarah Wasinger, Rama Ranganathan, Andrew L. Ferguson.
Preprint, December 2022.
[10.1101/2022.12.21.521443]

A high-level programming language for generative protein design.
Brian Hie, Salvatore Candido, Zeming Lin, Ori Kabeli, Roshan Rao, Nikita Smetanin, Tom Sercu, Alexander Rives.
Preprint, December 2022.
[10.1101/2022.12.21.521526]

Language models generalize beyond natural proteins.
Robert Verkuil, Ori Kabeli, Yilun Du, Basile I. M. Wicky, Lukas F. Milles, Justas Dauparas, David Baker, Sergey Ovchinnikov, Tom Sercu, Alexander Rives.
Preprint, December 2022.
[10.1101/2022.12.21.521521]

Deep Generative Design of Epitope-Specific Binding Proteins by Latent Conformation Optimization.
Raphael R. Eguchi, Christian A. Choe, Udit Parekh, Irene S. Khalek, Michael D. Ward, Neha Vithani, Gregory R. Bowman, Joseph G. Jardine, Po-Ssu Huang.
Preprint, December 2022.
[10.1101/2022.12.22.521698]

Illuminating protein space with a programmable generative model.
John Ingraham, Max Baranov, Zak Costello, Vincent Frappier, Ahmed Ismail, Shan Tie, Wujie Wang, Vincent Xue, Fritz Obermeyer, Andrew Beam, Gevorg Grigoryan.
Preprint, December 2022.
[10.1101/2022.12.01.518682]

Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models.
Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek, David Baker.
Preprint, December 2022.
[10.1101/2022.12.09.519842]

De novo PROTAC design using graph-based deep generative models.
Divya Nori, Connor W. Coley, Rocío Mercado.
Preprint, November 2022.
[arxiv]

Latent Space Diffusion Models of Cryo-EM Structures.
Karsten Kreis, Tim Dockhorn, Zihao Li, Ellen Zhong.
Preprint, November 2022.
[arxiv]

Protein Sequence and Structure Co-Design with Equivariant Translation.
Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang.
Preprint, October 2022.
[arxiv]

Protein structure generation via folding diffusion.
Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, James Y. Zou, Alex X. Lu, Ava P. Amini.
Preprint, September 2022.
[arxiv]

Deep Generative Models of Protein Structure Uncover Distant Relationships Across a Continuous Fold Space.
Eli J. Draizen, Stella Veretnik, Cameron Mura, Philip E. Bourne.
Preprint, August 2022.
[10.1101/2022.07.29.501943]

Neural Network-Derived Potts Models for Structure-Based Protein Design using Backbone Atomic Coordinates and Tertiary Motifs.
Alex J. Li, Mindren Lu, Israel Desta, Vikram Sundar, Gevorg Grigoryan, Amy E. Keating.
Preprint, August 2022.
[10.1101/2022.08.02.501736]

ProtGPT2 is a deep unsupervised language model for protein design.
Noelia Ferruz, Steffen Schmidt & Birte Höcker.
Nature Communications, July 2022.
[10.1038/s41467-022-32007-7]

ProteinSGM: Score-based generative modeling for de novo protein design.
Jin Sub Lee, Philip M. Kim.
Preprint, July 2022.
[10.1101/2022.07.13.499967]

Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models.
Shitong Luo, Yufeng Su, Xingang Peng, Sheng Wang, Jian Peng, Jianzhu Ma.
Preprint, July 2022.
[10.1101/2022.07.10.499510]

End-to-End deep structure generative model for protein design.
Boqiao Lai, Matthew McPartlon, Jinbo Xu.
Preprint, July 2022.
[10.1101/2022.07.09.499440]

Predicting the antigenic evolution of SARS-COV-2 with deep learning.
Wenkai Han, Ningning Chen, Shiwei Sun, Xin Gao.
Preprint, June 2022.
[10.1101/2022.06.23.497375]

Hallucinating protein assemblies.
B. I. M. Wicky, L. F. Milles, A. Courbet, R. J. Ragotte, J. Dauparas, E. Kinfu, S. Tipps, R. D. Kibler, M. Baek, F. DiMaio, X. Li, L. Carter, A. Kang, H. Nguyen, A. K. Bera, D. Baker.
Preprint, June 2022.
[10.1101/2022.06.09.493773]

ProGen2: Exploring the Boundaries of Protein Language Models.
Erik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani.
Preprint, June 2022.
[arxiv]

DiffMD: A Geometric Diffusion Model for Molecular Dynamics Simulations.
Fang Wu, Stan Z. Li.
Preprint, April 2022.
[arxiv]

Fragment-Based Ligand Generation Guided By Geometric Deep Learning On Protein-Ligand Structure.
Alexander S. Powers, Helen H. Yu, Patricia Suriana, Ron O. Dror.
Preprint, March 2022.
[10.1101/2022.03.17.484653]

Design in the DARK: Learning Deep Generative Models for De Novo Protein Design.
Lewis Moffat, Shaun M. Kandathil, David T. Jones.
Preprint, January 2022.
[10.1101/2022.01.27.478087]

Sampling the conformational landscapes of transporters and receptors with AlphaFold2.
Diego del Alamo, Davide Sala, Hassane S. Mchaourab, Jens Meiler.
Preprint, November 2021.
[10.1101/2021.11.22.469536]

Benchmarking deep generative models for diverse antibody sequence design.
Igor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano.
Preprint, November 2021.
[arxiv]

Efficient generative modeling of protein sequences using simple autoregressive models.
Jeanne Trinquier, Guido Uguzzoni, Andrea Pagnani, Francesco Zamponi & Martin Weigt.
Nature Communications, October 2021.
[10.1038/s41467-021-25756-4]

Navigating the amino acid sequence space between functional proteins using a deep learning framework.
Tristan Bitard-Feildel.
PeerJ Computer Science, September 2021.
[10.7717/peerj-cs.684]

BioPhi: A platform for antibody design, humanization and humanness evaluation based on natural antibody repertoires and deep learning.
David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil, Danny A. Bitton.
Preprint, August 2021.
[10.1101/2021.08.08.455394]

Ancestral Sequence Reconstruction for Co-evolutionary models.
Edwin Rodríguez Horta, Alejandro Lage-Castellanos, Roberto Mulet.
Preprint, August 2021.
[arxiv]

AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational approximated Landscape.
Luca Sesta, Guido Uguzzoni, Jorge Fernandez-de-Cossio Diaz, Andrea Pagnani.
International Journal of Molecular Sciences, August 2021.
[10.3390/ijms222010908]

Modeling sequence-space exploration and emergence of epistatic signals in protein evolution.
Matteo Bisardi, Juan Rodriguez-Rivas, Francesco Zamponi, Martin Weigt.
Preprint, June 2021.
[arxiv]

Generative AAV capsid diversification by latent interpolation.
Sam Sinai, Nina Jain, George M Church, Eric D Kelsic.
Preprint, April 2021.
[10.1101/2021.04.16.440236]

Protein design and variant prediction using autoregressive generative models.
Jung-Eun Shin, Adam Riesselman, Aaron W. Kollasch, Conor McMahon, Elana Simon, Chris Sander, Aashish Manglik, Andrew Kruse, Debora Marks.
Nature Communications, April 2021.
[10.1038/s41467-021-22732-w]

Expanding functional protein sequence spaces using generative adversarial networks.
Donatas Repecka, Vykintas Jauniskis, Laurynas Karpus, Elzbieta Rembeza, Jan Zrimec, Simona Poviloniene, Irmantas Rokaitis, Audrius Laurynenas, Wissam Abuajwa, Otto Savolainen, Rolandas Meskys, Martin K. M. Engqvist, Aleksej Zelezniak.
Nature Machine Intelligence, March 2021.
[10.1038/s42256-021-00310-5]

Generating functional protein variants with variational autoencoders.
Alex Hawkins-Hooker, Florence Depardieu, Sebastien Baur, Guillaume Couairon, Arthur Chen, David Bikard.
PLOS Computational Biology, February 2021.
[10.1371/journal.pcbi.1008736]

Generating novel protein sequences using Gibbs sampling of masked language models.
Sean R. Johnson, Sarah Monaco, Kenneth Massie, Zaid Syed.
Preprint, January 2021.
[10.1101/2021.01.26.428322]

The structure-fitness landscape of pairwise relations in generative sequence models.
Dylan Marshall, Haobo Wang, Michael Stiffler, Justas Dauparas, Peter Koo, Sergey Ovchinnikov.
Preprint, November 2020.
[10.1101/2020.11.29.402875]

De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks.
Mostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen.
Journal of Chemical Information and Modeling, September 2020.
[10.1021/acs.jcim.0c00593]

Deep learning enables the design of functional de novo antimicrobial proteins.
Javier Caceres-Delpiano, Roberto Ibañez, Patricio Alegre, Cynthia Sanhueza, Romualdo Paz-Fiblas, Simon Correa, Pedro Retamal, Juan Cristóbal Jiménez, Leonardo Álvarez.
Preprint, August 2020.
[10.1101/2020.08.26.266940]

Generative probabilistic biological sequence models that account for mutational variability.
Eli N. Weinstein, Debora S. Marks.
Preprint, August 2020.
[10.1101/2020.07.31.231381]

IG-VAE: Generative Modeling of Immunoglobulin Proteins by Direct 3D Coordinate Generation.
Raphael R. Eguchi, Namrata Anand, Christian A. Choe, Po-Ssu Huang.
Preprint, August 2020.
[10.1101/2020.08.07.242347]

A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences.
Johannes Linder, Nicholas Bogard, Alexander B. Rosenberg, Georg Seelig.
Cell Systems, July 2020.
[10.1016/j.cels.2020.05.007]

Signal Peptides Generated by Attention-Based Neural Networks.
Zachary Wu, Kevin Kaichuang Yang, Michael Liszka, Alycia Lee, Alina Batzilla, David Wernick, David P Weiner, Frances H Arnold.
ACS Synthetic Biology, July 2020.
[10.1021/acssynbio.0c00219]

Bio-informed Protein Sequence Generation for Multi-class Virus Mutation Prediction.
Yuyang Wang, Prakarsh Yadav, Rishikesh Magar, Amir Barati Farimani.
Preprint, June 2020.
[10.1101/2020.06.11.146167]

Designing Feature-Controlled Humanoid Antibody Discovery Libraries Using Generative Adversarial Networks.
Tileli Amimeur, Jeremy M. Shaver, Randal R. Ketchem, J. Alex Taylor, Rutilio H. Clark, Josh Smith, Danielle Van Citters, Christine C. Siska, Pauline Smidt, Megan Sprague, Bruce A. Kerwin, Dean Pettit.
Preprint, April 2020.
[10.1101/2020.04.12.024844]

ProGen: Language Modeling for Protein Generation.
Ali Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R. Eguchi, Po-Ssu Huang, Richard Socher.
Preprint, March 2020.
[10.1101/2020.03.07.982272]

De Novo Protein Design for Novel Folds using Guided Conditional Wasserstein Generative Adversarial Networks (gcWGAN).
Mostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen.
Preprint, September 2019.
[10.1101/769919]

Reconstructing continuous distributions of 3D protein structure from cryo-EM images.
Ellen D. Zhong, Tristan Bepler, Joseph H. Davis, Bonnie Berger.
Preprint, September 2019.
[arXiv]

Deep generative models for T cell receptor protein sequences.
Kristian Davidsen, Branden J. Olson, William S. DeWitt III, Jean Feng, Elias Harkins, Philip Bradley, Frederick A. Matsen IV.
eLife, September 2019.
[10.7554/eLife.46935.001]

Generative Models for Graph-Based Protein Design.
John Ingraham, Vikas K. Garg, Regina Barzilay, Tommi Jaakkola.
ICLR workshop on Deep Generative Models for Highly Structured Data, May 2019.
[OpenReview]

How to Hallucinate Functional Proteins.
Zak Costello, Hector Garcia Martin.
Preprint, March 2019.
[arxiv]

Conditioning by adaptive sampling for robust design.
David H. Brookes, Hahnbeom Park, Jennifer Listgarten.
Preprint, January 2019.
[arxiv]

Generative modeling for protein structures.
Namrata Anand, Po-Ssu Huang.
NeurIPS, December 2018.
[NeurIPS]

Design of metalloproteins and novel protein folds using variational autoencoders.
Joe G. Greener, Lewis Moffat, David T Jones.
Scientific Reports, November 2018.
[10.1038/s41598-018-34533-1]

Design by adaptive sampling.
David H. Brookes, Jennifer Listgarten.
Preprint, October 2018.
[arxiv]

Deep generative models of genetic variation capture the effects of mutations.
Adam J Riesselman, John B Ingraham, Debora S. Marks.
Nature Methods, September 2018.
[10.1038/s41592-018-0138-4]

Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions.
Anvita Gupta, James Zou.
Preprint, April 2018.
[arxiv]

Recurrent Neural Network Model for Constructive Peptide Design.
Alex T. Müller, Jan A. Hiss, and Gisbert Schneider.
Journal of Chemical Information and Modeling, January 2018.
[10.1021/acs.jcim.7b00414]

Variational auto-encoding of protein sequences.
Sam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak.
Preprint, December 2017.
[arxiv]

Biophysics

ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a protein language diffusion model.
Bo Ni, David L. Kaplan, Markus J. Buehler.
Preprint, October 2023.
[arxiv]

Chemically Transferable Generative Backmapping of Coarse-Grained Proteins.
Soojung Yang, Rafael Gómez-Bombarelli.
Preprint, March 2023.
[arxiv]

Direct generation of protein conformational ensembles via machine learning.
Giacomo Janson, Gilberto Valdes-Garcia, Lim Heo & Michael Feig.
Nature Communications, February 2023.
[10.1038/s41467-023-36443-x]

Machine Learning Coarse-Grained Potentials of Protein Thermodynamics.
Maciej Majewski, Adrià Pérez, Philipp Thölke, Stefan Doerr, Nicholas E. Charron, Toni Giorgino, Brooke E. Husic, Cecilia Clementi, Frank Noé, Gianni De Fabritiis.
Preprint, December 2022.
[arxiv]

Skipping the Replica Exchange Ladder with Normalizing Flows.
Michele Invernizzi, Andreas Krämer, Cecilia Clementi, Frank Noé.
Preprint, October 2022.
[arxiv]

From data to noise to data for mixing physics across temperatures with generative artificial intelligence.
Yihang Wang, Lukas Herron, and Pratyush Tiwary.
PNAS, August 2022.
[10.1073/pnas.2203656119]

Molecular dynamics without molecules: searching the conformational space of proteins with generative neural networks.
Gregory Schwing, Luigi L. Palese, Ariel Fernández, Loren Schwiebert, Domenico L. Gatti.
Preprint, June 2022.
[arxiv]

Predicting stability

The genetic architecture of protein stability.
Andre J. Faure, Aina Martí-Aranda, Cristina Hidalgo-Carcedo, Jörn M. Schmiedel, Ben Lehner.
Preprint, October 2023.
[10.1101/2023.10.27.564339]

New mega dataset combined with deep neural network makes a progress in predicting impact of mutation on protein stability.
Marina A Pak, Nikita V Dovidchenko, Satyarth Mishra Sharma, Dmitry N Ivankov.
Preprint, January 2023.
[10.1101/2022.12.31.522396]

PROSTATA: Protein Stability Assessment using Transformers.
Dmitriy Umerenkov, Tatiana I. Shashkova, Pavel V. Strashnov, Fedor Nikolaev, Maria Sindeeva, Nikita V. Ivanisenko, Olga L. Kardymon.
Preprint, December 2022.
[10.1101/2022.12.25.521875]

Rapid protein stability prediction using deep learning representations.
Lasse M. Blaabjerg, Maher M. Kassem, Lydia L. Good, Nicolas Jonsson, Matteo Cagiada, Kristoffer E. Johansson, Wouter Boomsma, Amelie Stein, Kresten Lindorff-Larsen.
Preprint, August 2022.
[10.1101/2022.07.14.500157]

Artificial Neural Network to Predict Structure-based Protein-protein Free Energy of Binding from Rosetta-calculated Properties.
Matheus Ferraz, José Neto, Roberto Lins, Erico Teixeira.
Preprint, August 2022.
[10.26434/chemrxiv-2022-zhd87]

Construction of a Deep Neural Network Energy Function for Protein Physics.
Huan Yang, Zhaoping Xiong, Francesco Zonta.
J. Chem. Theory Comput., August 2022.
[10.1021/acs.jctc.2c00069]

Towards generalizable prediction of antibody thermostability using machine learning on sequence and structure features.
Ameya Harmalkar, Roshan Rao, Jonas Honer, Wibke Deisting, Jonas Anlahr, Anja Hoenig, Julia Czwikla, Eva Sienz-Widmann, Doris Rau, Austin Rice, Timothy P. Riley, Danqing Li, Hannah B. Catterall, Christine E. Tinberg, Jeffrey J. Gray, Kathy Y. Wei.
Preprint, June 2022.
[10.1101/2022.06.03.494724]

Learning deep representations of enzyme thermal adaptation.
Gang Li, Filip Buric, Jan Zrimec, Sandra Viknander, Jens Nielsen, Aleksej Zelezniak, Martin K. M. Engqvist.
Preprint, March 2022.
[10.1101/2022.03.14.484272]

Evaluating Protein Engineering Thermostability Prediction Tools Using an Independently Generated Dataset.
Peishan Huang, Simon K. S. Chu, Henrique N. Frizzo, Morgan P. Connolly, Ryan W. Caster, and Justin B. Siegel.
ACS Omega, March 2020.
[10.1021/acsomega.9b04105]

Predicting changes in protein thermostability upon point mutation with deep 3D convolutional neural networks.
Bian Li, Yucheng T. Yang, John A. Capra, Mark B. Gerstein.
Preprint, February 2020.
[10.1101/2020.02.28.959874]

Machine Learning for Prioritization of Thermostabilizing Mutations for G-protein Coupled Receptors.
S. Muk, S. Ghosh, S. Achuthan, X. Chen, X. Yao, M. Sandhu, M. C. Griffor, K. F. Fennell, Y. Che, V. Shanmugasundaram, X. Qiu, C. G. Tate, N. Vaidehi.
Preprint, July 2019.
[10.1101/715375]

Machine Learning Applied to Predicting Microorganism Growth Temperatures and Enzyme Catalytic Optima.
Gang Li, Kersten S. Rabe, Jens Nielsen, Martin K. M. Engqvist.
ACS Synthetic Biology, May 2019.
[10.1021/acssynbio.9b00099]

mGPfusion: predicting protein stability changes with Gaussian process kernel learning and data fusion.
Emmi Jokinen, Markus Heinonen, Harri Lähdesmäki.
Bioinformatics, July 2018.
[10.1093/bioinformatics/bty238]

Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools.
Lei Jia, Ramya Yarlagadda, Charles C. Reed.
PLOS One, September 2015.
[10.1371/journal.pone.0138022]

NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation.
Manuel Giollo, Alberto J. M. Martin, Ian Walsh, Carlo Ferrari, Silvio C. E. Tosatto.
BMC Genomics, May 2014.
[10.1186/1471-2164-15-S4-S7]

mCSM: predicting the effects of mutations in proteins using graph-based signatures.
Douglas E. V. Pires, David B. Ascher, Tom L. Blundell.
Bioinformatics, February 2014.
[10.1093/bioinformatics/btt691]

PROTS-RF: A Robust Model for Predicting Mutation-Induced Protein Stability Changes.
Yunqi Li, Jianwen Fang.
PLOS One, October 2012.
[10.1371/journal.pone.0047247]

Predicting changes in protein thermostability brought about by single- or multi-site mutations.
Jian Tian, Ningfeng Wu, Xiaoyu Chu, Yunliu Fan.
BMC Bioinformatics, July 2010.
[10.1186/1471-2105-11-370]

Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0.
Yves Dehouck, Aline Grosfils, Benjamin Folch, Dimitri Gilis, Philippe Bogaerts, Marianne Rooman.
Bioinformatics, October 2009.
[10.1093/bioinformatics/btp445]

Prediction of protein stability changes for single‐site mutations using support vector machines.
Jianlin Cheng, Arlo Randall, Pierre Baldi.
Proteins, December 2005.
[10.1002/prot.20810]

Predicting protein stability changes from sequences using support vector machines.
Emidio Capriotti, Piero Fariselli, Remo Calabrese, Rita Casadio.
Bioinformatics, September 2005.
[10.1093/bioinformatics/bti1109]

I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure.
Emidio Capriotti, Piero Fariselli, Rita Casadio.
Nucleic Acids Research, July 2005.
[10.1093/nar/gki375]

A neural-network-based method for predicting protein stability changes upon single point mutations.
Emidio Capriotti, Piero Fariselli, Rita Casadio.
Bioinformatics, August 2004.
[10.1093/bioinformatics/bth928]

Mismatch string kernels for discriminative protein classification.
Christina S. Leslie, Eleazar Eskin, Adiel Cohen, Jason Weston, William Stafford Noble.
Bioinformatics, March 2004.
[10.1093/bioinformatics/btg431]

Predicting structure from sequence

Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom.
Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet, Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados, Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker.
Preprint, October 2023.
[10.1101/2023.10.09.561603]

Accurate Prediction of Antibody Function and Structure Using Bio-Inspired Antibody Language Model.
Hongtai Jing, Zhengtao Gao, Sheng Xu, Tao Shen, Zhangzhi Peng, Shwai He, Tao You, Shuang Ye, Wei Lin, Siqi Sun.
Preprint, August 2023.
[arxiv]

Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2.
T. Reid Alderson, Iva Pritišanac, Đesika Kolarić, and Julie D. Forman-Kay.
PNAS, August 2023.
[10.1073/pnas.2304302120]

Highfold: accurately predicting cyclic peptide monomers and complexes with AlphaFold.
Chenhao Zhang, Chengyun Zhang, Tianfeng Shang, Xinyi Wu, Hongliang Duan.
Preprint, August 2023.
[10.1101/2023.08.27.554979]

Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence Alignment Generation.
Le Zhang, Jiayang Chen, Tao Shen, Yu Li, Siqi Sun.
Preprint, June 2023.
[arxiv]

Efficient and accurate prediction of protein structure using RoseTTAFold2.
Minkyung Baek, Ivan Anishchenko, Ian R. Humphreys, Qian Cong, David Baker, Frank DiMaio.
Preprint, May 2023.
[10.1101/2023.05.24.542179]

EigenFold: Generative Protein Structure Prediction with Diffusion Models.
Bowen Jing, Ezra Erives, Peter Pao-Huang, Gabriele Corso, Bonnie Berger, Tommi Jaakkola.
Preprint, April 2023.
[arxiv]

Evolutionary-scale prediction of atomic-level protein structure with a language model.
Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives.
Science, March 2023.
[10.1126/science.ade2574]

DR-BERT: A Protein Language Model to Annotate Disordered Regions.
Ananthan Nambiar, John Malcolm Forsyth, Simon Liu, Sergei Maslov.
Preprint, February 2023.
[10.1101/2023.02.22.529574]

AlphaFold Prediction of Structural Ensembles of Disordered Proteins.
Z. Faidon Brotzakis, Shengyu Zhang, Michele Vendruscolo.
Preprint, January 2023.
[10.1101/2023.01.19.524720]

AFsample: Improving Multimer Prediction with AlphaFold using Aggressive Sampling.
Björn Wallner.
Preprint, December 2022.
[10.1101/2022.12.20.521205]

OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization.
Gustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Qinghui Xia, William Gerecke, Timothy J O’Donnell, Daniel Berenberg, Ian Fisk, Niccolò Zanichelli, Bo Zhang, Arkadiusz Nowaczynski, Bei Wang, Marta M Stepniewska-Dziubinska, Shang Zhang, Adegoke Ojewole, Murat Efe Guney, Stella Biderman, Andrew M Watkins, Stephen Ra, Pablo Ribalta Lorenzo, Lucas Nivon, Brian Weitzner, Yih-En Andrew Ban, Peter K Sorger, Emad Mostaque, Zhao Zhang, Richard Bonneau, Mohammed AlQuraishi.
Preprint, November 2022.
[10.1101/2022.11.20.517210]

Improved the Protein Complex Prediction with Protein Language Models.
Bo Chen, Ziwei Xie, Jiezhong Qiu, Zhaofeng Ye, Jinbo Xu, Jie Tang.
Preprint, November 2022.
[10.1101/2022.09.15.508065]

tFold-Ab: Fast and Accurate Antibody Structure Prediction without Sequence Homologs.
Jiaxiang Wu, Fandi Wu, Biaobin Jiang, Wei Liu, Peilin Zhao.
Preprint, November 2022.
[10.1101/2022.11.10.515918]

Ultra-fast protein structure prediction to capture effects of sequence variation in mutation movies.
Konstantin Weissenow, Michael Heinzinger, Martin Steinegger, Burkhard Rost.
Preprint, November 2022.
[10.1101/2022.11.14.516473]

Improving protein secondary structure prediction by deep language models and transformer networks.
Tianqi Wu, Weihang Cheng, Jianlin Cheng.
Preprint, November 2022.
[10.1101/2022.11.21.517442]

Fast and accurate Ab Initio Protein structure prediction using deep learning potentials.
Robin Pearce, Yang Li, Gilbert S. Omenn, Yang Zhang.
PLoS Computational Biology, September 2022.
[10.1371/journal.pcbi.1010539]

Accurate prediction of nucleic acid and protein-nucleic acid complexes using RoseTTAFoldNA.
Minkyung Baek, Ryan McHugh, Ivan Anishchenko, David Baker, Frank DiMaio.
Preprint, September 2022.
[10.1101/2022.09.09.507333]

Hallucination of closed repeat proteins containing central pockets.
Linna An, Derrick R Hicks, Dmitri Zorine, Justas Dauparas, Basile I. M. Wicky, Lukas F. Milles, Alexis Courbet, Asim K. Bera, Hannah Nguyen, Alex Kang, Lauren Carter, David Baker.
Preprint, September 2022.
[10.1101/2022.09.01.506251]

SE(3) Equivalent Graph Attention Network as an Energy-Based Model for Protein Side Chain Conformation.
Deqin Liu, Sheng Chen, Shuangjia Zheng, Sen Zhang, Yuedong Yang.
Preprint, September 2022.
[10.1101/2022.09.05.506704]

SPEACH_AF: Sampling protein ensembles and conformational heterogeneity with Alphafold2.
Richard A. Stein, Hassane S. Mchaourab.
PLoS Computational Biology, August 2022.
[10.1371/journal.pcbi.1010483]

Few-Shot Learning of Accurate Folding Landscape for Protein Structure Prediction.
Jun Zhang, Sirui Liu, Mengyun Chen, Haotian Chu, Min Wang, Zidong Wang, Jialiang Yu, Ningxi Ni, Fan Yu, Diqing Chen, Yi Isaac Yang, Boxin Xue, Lijiang Yang, Yuan Liu, Yi Qin Gao.
Preprint, August 2022.
[arxiv]

Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph neural networks.
Tianqi Wu, Jianlin Cheng.
Preprint, August 2022.
[10.1101/2022.05.06.490934]

HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative.
Xiaomin Fang, Fan Wang, Lihang Liu, Jingzhou He, Dayong Lin, Yingfei Xiang, Xiaonan Zhang, Hua Wu, Hui Li, Le Song.
Preprint, August 2022.
[arxiv](https://arxiv.org/abs/2207.13921)

NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning.
Magnus Haraldson Høie, Erik Nicolas Kiehl, Bent Petersen, Morten Nielsen, Ole Winther, Henrik Nielsen, Jeppe Hallgren, Paolo Marcatili.
Nucleic Acids Research, July 2022.
[10.1093/nar/gkac439]

High-resolution de novo structure prediction from primary sequence.
Ruidong Wu, Fan Ding, Rui Wang, Rui Shen, Xiwen Zhang, Shitong Luo, Chenpeng Su, Zuofan Wu, Qi Xie, Bonnie Berger, Jianzhu Ma, Jian Peng.
Preprint, July 2022.
[10.1101/2022.07.21.500999]

Protein Structure Prediction with Expectation Reflection.
Evan Cresswell-Clay, Danh-Tai Hoang, Joe McKenna, Chris Yang, Eric Zhang, Vipul Periwal.
Preprint, July 2022.
[10.1101/2022.07.12.499755]

PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction.
Sirui Liu, Jun Zhang, Haotian Chu, Min Wang, Boxin Xue, Ningxi Ni, Jialiang Yu, Yuhao Xie, Zhenyu Chen, Mengyun Chen, Yuan Liu, Piya Patra, Fan Xu, Jie Chen, Zidong Wang, Lijiang Yang, Fan Yu, Lei Chen, Yi Qin Gao.
Preprint, June 2022.
[arxiv]

Accurate prediction of inter-protein residue–residue contacts for homo-oligomeric protein complexes.
Yumeng Yan, Sheng-You Huang.
Briefings in Bioinformatics, September 2021.
[10.1093/bib/bbab038]

Improved prediction of protein-protein interactions using AlphaFold2 and extended multiple-sequence alignments.
P. Bryant, G. Pozzati, A. Elofsson.
Preprint, September 2021.
[10.1101/2021.09.15.460468]

Accurate prediction of protein structures and interactions using a three-track neural network.
Minkyung Baek ... David Baker.
Science, August 2021.
[10.1126/science.abj8754]

Distillation of MSA Embeddings to Folded Protein Structures with Graph Transformers.
Allan Costa, Manvitha Ponnapati, Joseph M. Jacobson, Pranam Chatterjee.
Preprint, June 2021.
[10.1101/2021.06.02.446809]

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks.
Yang Li, Chengxin Zhang, Eric W. Bell, Wei Zheng, Xiaogen Zhou, Dong-Jun Yu, Yang Zhang.
PLOS Computational Biology, March 2021.
[10.1371/journal.pcbi.1008865]

Multi-task deep learning for concurrent prediction of protein structural properties.
Buzhong Zhang, Jinyan Li, Lijun Quan, Qiang Lyu.
Preprint, February 2021.
[10.1101/2021.02.04.429840]

A multi-task deep-learning system for predicting membrane associations and secondary structures of proteins.
Bian Li, Jeffrey Mendenhall, John Anthony Capra, Jens Meiler.
Preprint, December 2020.
[10.1101/2020.12.02.409045]

Single Layers of Attention Suffice to Predict Protein Contacts.
Nicholas Bhattacharya, Neil Thomas, Roshan Rao, Justas Dauparas, Peter K. Koo, David Baker, Yun S. Song, Sergey Ovchinnikov.
Preprint, December 2020.
[10.1101/2020.12.21.423882]

Fast and effective protein model refinement by deep graph neural networks.
Xiaoyang Jing, Jinbo Xu.
Preprint, December 2020.
[10.1101/2020.12.10.419994]

Protein Structural Alignments From Sequence.
James T. Morton, Charlie E. M. Strauss, Robert Blackwell, Daniel Berenberg, Vladimir Gligorijevic, Richard Bonneau.
Preprint, November 2020.
[10.1101/2020.11.03.365932]

Deep learning-based prediction of protein structure using learned representations of multiple sequence alignments.
Shaun M Kandathil, Joe G Greener, Andy M Lau, David T Jones.
Preprint, November 2020.
[10.1101/2020.11.27.401232]

Study of Real-Valued Distance Prediction For Protein Structure Prediction with Deep Learning.
Jin Li, Jinbo Xu.
Preprint, November 2020.
[10.1101/2020.11.26.400523]

REALDIST: Real-valued protein distance prediction.
Badri Adhikari.
Preprint, November 2020.
[10.1101/2020.11.28.402214]

Combination of deep neural network with attention mechanism enhances the explainability of protein contact prediction.
Chen Chen, Tianqi Wu, Zhiye Guo, Jianlin Cheng.
Preprint, September 2020.
[10.1101/2020.09.04.283937]

Phylogenetic correlations have limited effect on coevolution-based contact prediction in proteins.
Edwin Rodriguez Horta, Martin Weigt.
Preprint, August 2020.
[10.1101/2020.08.12.247577]

Near-complete protein structural modelling of the minimal genome.
Joe G Greener, Nikita Desai, Shaun M Kandathil, David T Jones.
Preprint, July 2020.
[arxiv]

Template-based prediction of protein structure with deep learning.
Haicang Zhang, Yufeng Shen.
Preprint, June 2020.
[10.1101/2020.06.02.129270]

Energy-based models for atomic-resolution protein conformations.
Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives.
ICLR, April 2020.
[arXiv]

A fully open-source framework for deep learning protein real-valued distances.
Badri Adhikari.
Preprint, April 2020.
[10.1101/2020.04.26.061820]

PhANNs, a fast and accurate tool and web server to classify phage structural proteins.
Victor Seguritan, Jackson Redfield, David Salamon, Robert A. Edwards, Anca M. Segall.
Preprint, April 2020.
[10.1101/2020.04.03.023523]

DeepDist: real-value inter-residue distance prediction with deep residual convolutional network.
Tianqi Wu, Zhiye Guo, Jie Hou, Jianlin Cheng.
Preprint, March 2020.
[10.1101/2020.03.17.995910]

Improved protein structure prediction using predicted inter-residue orientations.
Jianyi Yang, Ivan Anishchenko, Hahnbeom Park, Zhenling Peng, Sergey Ovchinnikov, David Baker.
PNAS, January 2020.
[10.1073/pnas.1914677117]

Deep learning methods in protein structure prediction.
Mirko Torrisi, Gianluca Pollastri, Quan Le.
Computational and Structural Biotechnology Journal, January 2020.
[10.1016/j.csbj.2019.12.011]

Improved protein structure prediction using potentials from deep learning.
Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu, Demis Hassabis.
Nature, January 2020.
[10.1038/s41586-019-1923-7]

Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints.
Joe G. Greener, Shaun M. Kandathil, David T. Jones.
Nature Communications, September 2019.
[10.1038/s41467-019-11994-0]

DeepPrime2Sec: Deep Learning for Protein Secondary Structure Prediction from the Primary Sequences.
Ehsaneddin Asgari, Nina Poerner, Alice C. McHardy, Mohammad R.K. Mofrad.
Preprint, July 2019.
[10.1101/705426]

End-to-End Differentiable Learning of Protein Structure.
Mohammed AlQuraishi.
Cell Systems, April 2019.
[10.1016/j.cels.2019.03.006]

DESTINI: A deep-learning approach to contact-driven protein structure prediction.
Mu Gao, Hongyi Zhou, Jeffrey Skolnick.
Scientific Reports, March 2019.
[10.1038/s41598-019-40314-1]

Learning protein sequence embeddings using information from structure.
Tristan Bepler, Bonnie Berger.
International Conference on Learning Representations, February 2019.
[ICLR]

Generative modeling for protein structures.
Namrata Anand, Po-Ssu Huang.
NeurIPS, December 2018.
[NeurIPS]

Distance-based Protein Folding Powered by Deep Learning.
Jinbo Xu.
Preprint, November 2018.
[arxiv]

Porter 5: fast, state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes.
Mirko Torrisi, Manaz Kaleel, Gianluca Pollastri.
Preprint, October 2018.
[10.1101/289033] [bioRxiv]

Protein Secondary Structure Prediction Based on Data Partition and Semi-Random Subspace Method.
Yuming Ma, Yihui Liu, Jinyong Cheng.
Scientific Reports, June 2018.
[10.1038/s41598-018-28084-8]

Protein Secondary Structure Prediction with Long Short Term Memory Networks.
Søren Kaae Sønderby, Ole Winther.
Preprint, December 2014.
[arxiv]

Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction.
Jian Zhou, Olga G. Troyanskaya.
Preprint, March 2014.
[arxiv]

Predicting sequence from structure

Context-aware geometric deep learning for protein sequence design.
Lucien F. Krapp, Fernando A. Meireles, Luciano A. Abriata, Matteo Dal Peraro.
Preprint, June 2023.
[10.1101/2023.06.19.545381]

Inverse Protein Folding Using Deep Bayesian Optimization.
Natalie Maus, Yimeng Zeng, Daniel Allen Anderson, Phillip Maffettone, Aaron Solomon, Peyton Greenside, Osbert Bastani, Jacob R. Gardner.
Preprint, March 2023.
[arxiv]

Protein Design Using Physics Informed Neural Networks.
SI Omar, C Keasar, AJ Ben-Sasson, E Haber.
Biomolecules, February 2023.
[10.3390/biom13030457]

Efficient and scalable de novo protein design using a relaxed sequence space.
Christopher Frank, Ali Khoshouei, Yosta de Stigter, Dominik Schiewitz, Shihao Feng, Sergey Ovchinnikov, Hendrik Dietz.
Preprint, February 2023.
[10.1101/2023.02.24.529906]

De novo protein design by inversion of the AlphaFold structure prediction network.
Casper Goverde, Benedict Wolf, Hamed Khakzad, Stéphane Rosset, Bruno E. Correia.
Preprint, December 2022.
[10.1101/2022.12.13.520346]

Robust deep learning–based protein sequence design using ProteinMPNN.
J. Dauparas, I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, A. Courbet, R. J. de Haas, N. Bethel, P. J. Y. Leung, T. F. Huddy, S. Pellock, D. Tischer, F. Chan, B. Koepnick, H. Nguyen, A. Kang, B. Sankaran, A. K. Bera, N. P. King, D. Baker.
Science, September 2022.
[10.1126/science.add2187]

PiFold: Toward effective and efficient protein inverse folding.
Zhangyang Gao, Cheng Tan, Stan Z. Li.
Preprint, September 2022.
[arxiv]

PeTriBERT: Augmenting BERT with tridimensional encoding for inverse protein folding and design.
Baldwin Dumortier, Antoine Liutkus, Clément Carré, Gabriel Krouk.
Preprint, August 2022.
[10.1101/2022.08.10.503344]

SIPF: Sampling Method for Inverse Protein Folding.
Tianfan Fu, Jimeng Sun.
KDD, August 2022.
[10.1145/3534678.3539284]

Rotamer-free protein sequence design based on deep learning and self-consistency.
Yufeng Liu, Lu Zhang, Weilun Wang, Min Zhu, Chenchen Wang, Fudong Li, Jiahai Zhang, Houqiang Li, Quan Chen & Haiyan Liu.
Nature Computational Science, July 2022.
[10.1038/s43588-022-00273-6]

Accurate and efficient protein sequence design through learning concise local environment of residues.
Bin Huang, Tingwen Fan, Kaiyue Wang, Haicang Zhang, Chungong Yu, Shuyu Nie, Yangshuo Qi, Wei-Mou Zheng, Jian Han, Zheng Fan, Shiwei Sun, Sheng Ye, Huaiyi Yang, Dongbo Bu.
Preprint, July 2022.
[10.1101/2022.06.25.497605]

Protein sequence sampling and prediction from structural data.
Gabriel Andres Orellana, Javier Caceres-Delpiano, Roberto Ibañez, Michael P Dunne, Leonardo Álvarez.
Preprint, November 2021.
[10.1101/2021.09.06.459171]

Design of proteins presenting discontinuous functional sites using deep learning.
Doug Tischer, Sidney Lisanza, Jue Wang, Runze Dong, Ivan Anishchenko, Lukas F. Milles, Sergey Ovchinnikov, David Baker.
Preprint, November 2020.
[10.1101/2020.11.29.402743]

Learning from Protein Structure with Geometric Vector Perceptrons.
Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael J.L. Townshend, Ron Dror.
Preprint, September 2020.
[arxiv]

Protein Sequence Design with a Learned Potential.
Namrata Anand, Raphael R. Eguchi, Alexander Derry, Russ B. Altman, Po-Ssu Huang.
Preprint, January 2020.
[10.1101/2020.01.06.895466]

Designing real novel proteins using deep graph neural networks.
Alexey Strokach, David Becerra, Carles Corbi, Albert Perez-Riba, Philip M. Kim.
Preprint, December 2019.
[10.1101/868935] [bioRxiv]

ProDCoNN: Protein design using a convolutional neural network.
Yuan Zhang, Yang Chen, Chenran Wang, Chun‐Chao Lo, Xiuwen Liu, Wei Wu, Jinfeng Zhang.
Proteins: Structure, Function, Bioinformatics, December 2019.
[10.1002/prot.25868]

RamaNet: Computational De Novo Protein Design using a Long Short-Term Memory Generative Adversarial Neural Network.
Sari Sabban, Mikhail Markovsky.
Preprint, June 2019.
[10.1101/671552] [bioRxiv]

Generative Models for Graph-Based Protein Design.
John Ingraham, Vikas K. Garg, Regina Barzilay, Tommi Jaakkola.
ICLR workshop on Deep Generative Models for Highly Structured Data, May 2019.
[OpenReview]

SPIN2: Predicting sequence profiles from protein structures using deep neural networks.
James O'Connell, Zhixiu Li, Jack Hanson, Rhys Heffernan, James Lyons, Kuldip Paliwal, Abdollah Dehzangi, Yuedong Yang, Yaoqi Zhou.
Proteins, March 2018.
[10.1002/prot.25489]

Classification, annotation, search, and alignments

Predicting enzymatic function of protein sequences with attention.
Nicolas Buton, François Coste, Yann Le Cunff.
Bioinformatics, October 2023.
[10.1093/bioinformatics/btad620]

Clustering predicted structures at the scale of the known protein universe.
Inigo Barrio-Hernandez, Jingi Yeo, Jürgen Jänes, Milot Mirdita, Cameron L. M. Gilchrist, Tanita Wein, Mihaly Varadi, Sameer Velankar, Pedro Beltrao & Martin Steinegger.
Nature, September 2023.
[10.1038/s41586-023-06510-w]

TEMPROT: protein function annotation using transformers embeddings and homology search.
Gabriel B. Oliveira, Helio Pedrini & Zanoni Dias.
BMC Bioinformatics, June 2023.
[10.1186/s12859-023-05375-0]

Enzyme function prediction using contrastive learning.
Tianhao Yu, Haiyang Cui, Jianan Canal Li, Yunan Luo, Guangde Jiang, Huimin Zhao.
Science, March 2023.
[10.1126/science.adf2465]

ProteInfer, deep neural networks for protein functional inference.
Theo Sanderson, Maxwell L Bileschi, David Belanger, Lucy J Colwell.
eLife, February 2023.
[10.7554/eLife.80942]

Machine learning models for the prediction of enzyme properties should be tested on proteins not used for model training.
Alexander Kroll, Martin J. Lercher.
Preprint, February 2023.
[10.1101/2023.02.06.526991]

Language models can identify enzymatic active sites in protein sequences.
Yves Gaetan Nana Teukam, Loïc Kwate Dassi, Matteo Manica, Daniel Probst, Philippe Schwaller, Teodoro Laino.
Preprint, February 2023.
[10.26434/chemrxiv-2021-m20gg-v3]

Alignment-free estimation of sequence conservation for identifying functional sites using protein sequence embeddings.
Wayland Yeung, Zhongliang Zhou, Sheng Li, Natarajan Kannan.
Briefings in Bioinformatics, January 2023.
[10.1093/bib/bbac599]

Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion.
Qianmu Yuan, Junjie Xie, Jiancong Xie, Huiying Zhao, Yuedong Yang.
Preprint, December 2022.
[10.1101/2022.12.05.519119]

Vector-clustering Multiple Sequence Alignment: Aligning into the twilight zone of protein sequence similarity with protein language models.
Claire D. McWhite, Mona Singh.
Preprint, October 2022.
[10.1101/2022.10.21.513099]

Highly significant improvement of protein sequence alignments with AlphaFold2.
Athanasios Baltzis, Leila Mansouri, Suzanne Jin, Björn E Langer, Ionas Erb, Cedric Notredame.
Bioinformatics, September 2022.
[10.1093/bioinformatics/btac625]

GO Bench: Shared-hub for Universal Benchmarking of Machine Learning-Based Protein Functional Annotations.
Andrew Dickson, Ehsaneddin Asgari, Alice C. McHardy, Mohammad R.K. Mofrad.
Preprint, July 2022.
[10.1101/2022.07.19.500685]

SETH predicts nuances of residue disorder from protein embeddings.
Dagmar Ilzhoefer, Michael Heinzinger, Burkhard Rost.
Preprint, June 2022.
[10.1101/2022.06.23.497276]

TSignal: A transformer model for signal peptide prediction.
Alexandru Dumitrescu, Emmi Jokinen, Juho Kellosalo, Ville Paavilainen, Harri Lähdesmäki.
Preprint, June 2022.
[10.1101/2022.06.02.493958]

TMbed – Transmembrane proteins predicted through Language Model embeddings.
Michael Bernhofer, Burkhard Rost.
Preprint, June 2022.
[10.1101/2022.06.12.495804]

Contrastive learning on protein embeddings enlightens midnight zone at lightning speed.
Michael Heinzinger, Maria Littmann, Ian Sillitoe, Nicola Bordin, Christine Orengo, Burkhard Rost.
Preprint, November 2021.
[10.1101/2021.11.14.468528]

SignalP 6.0 achieves signal peptide prediction across all types using protein language models.
Felix Teufel, José Juan Almagro Armenteros, Alexander Rosenberg Johansen, Magnús Halldór Gíslason, Silas Irby Pihl, Konstantinos D. Tsirigos, Ole Winther, Søren Brunak, Gunnar von Heijne, Henrik Nielsen.
Preprint, July 2021.
[10.1101/2021.06.09.447770]

Convolutional neural networks with image representation of amino acid sequences for protein function prediction.
Samia Tasnim Sara, Md Mehedi Hasan, Ahsan Ahmad, Swakkhar Shatabda.
Computational Biology and Chemistry, June 2021.
[10.1016/j.compbiolchem.2021.107494]

Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures.
Pedro Hermosilla, Marco Schäfer, Matěj Lang, Gloria Fackelmann, Pere Pau Vázquez, Barbora Kozlíková, Michael Krone, Tobias Ritschel, Timo Ropinski.
Preprint, April 2021.
[arxiv]

Embeddings from deep learning transfer GO annotations beyond homology.
Maria Littmann, Michael Heinzinger, Christian Dallago, Tobias Olenyi, Burkhard Rost.
Preprint, September 2020.
[10.1101/2020.09.04.282814]

Structure-Based Protein Function Prediction using Graph Convolutional Networks.
Vladimir Gligorijevic, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C. Taylor, Ian M. Fisk, Hera Vlamakis, Ramnik J. Xavier, Rob Knight, Kyunghyun Cho, Richard Bonneau.
Preprint, June 2020.
[10.1101/786236]

Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function.
Amelia Villegas-Morcillo, Stavros Makrodimitris, Roeland van Ham, Angel M. Gomez, Victoria Sanchez, Marcel Reinders.
Preprint, April 2020.
[10.1101/2020.04.07.028373]

Machine Learning Predicts New Anti-CRISPR Proteins.
Gavin J. Knott, Jennifer A. Doudna, Fayyaz ul Amir Afsar Minhas.
Preprint, November 2019.
[10.1101/854950]

Improving protein function prediction with synthetic feature samples created by generative adversarial networks.
Cen Wan, David T. Jones.
Preprint, August 2019.
[10.1101/730143]

Universal Deep Sequence Models for Protein Classification.
Nils Strodthoff, Patrick Wagner, Markus Wenzel, Wojciech Samek.
Preprint, July 2019.
[10.1101/704874]

Critiquing Protein Family Classification Models Using Sufficient Input Subsets.
Brandon Carter, Maxwell L. Bileschi, Jamie Smith, Theo Sanderson, Drew Bryant, David Belanger, Lucy J. Colwell.
Preprint, June 2019.
[10.1101/674119] [bioRxiv]

A Brief History of Protein Sorting Prediction.
Henrik Nielsen, Konstantinos D. Tsirigos, Søren Brunak, Gunnar von Heijne.
The Protein Journal, May 2019.
[10.1007/s10930-019-09838-3]

DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks.
Ahmet Sureyya Rifaioglu, Tunca Dogan, Maria Jesus Martin, Rengul Cetin-Atalay, Volkan Atalay.
Scientific Reports, May 2019.
[10.1038/s41598-019-43708-3]

Using Deep Learning to Annotate the Protein Universe.
Maxwell L. Bileschi, David Belanger, Drew Bryant, Theo Sanderson, Brandon Carter, D. Sculley, Mark A. DePristo, Lucy J. Colwell.
Preprint, May 2019.
[10.1101/626507] [bioRxiv]

ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature.
Alperen Dalkiran, Ahmet Sureyya Rifaioglu, Maria Jesus Martin, Rengul Cetin-Atalay, Volkan Atalay, Tunca Dogan.
BMC Bioinformatics, September 2018.
[10.1186/s12859-018-2368-y]

DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.
Maxat Kulmanov, Mohammed Asif Khan, Robert Hoehndorf.
Bioinformatics, February 2018.
[10.1093/bioinformatics/btx624]

Near perfect protein multi-label classification with deep neural networks.
Balázs Szalkai, Vince Grolmusz.
Methods, January 2018.
[10.1016/j.ymeth.2017.06.034]

Large‐scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants.
Ahmet Sureyya Rifaioglu, Tunca Dogan, Omer Sinan Sarac, Tulin Ersahin, Rabie Saidi, Mehmet Volkan Atalay, Maria Jesus Martin, Rengul Cetin‐Atalay.
Proteins, November 2017.
[10.1002/prot.25416]

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network.
Renzhi Cao, Colton Freitas, Leong Chan, Miao Sun, Haiqing Jiang, Zhangxin Chen.
Molecules, October 2017.
[10.3390/molecules22101732]

Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics.
Ehsaneddin Asgari, Mohammad R. K. Mofrad.
PLOS One, November 2015.
[10.1371/journal.pone.0141287]

A structural alignment kernel for protein structures.
Jian Qiu, Martial Hue, Asa Ben-Hur, Jean-Philippe Vert, William Stafford Noble.
Bioinformatics, January 2007.
[10.1093/bioinformatics/btl642]

The spectrum kernel: A string kernel for SVM protein classification.
Christina S Leslie, Eleazar Eskin, William Stafford Noble.
Pacific Symposium on Biocomputing, January 2002.
[pdf]

Predicting interactions with other molecules

Pairing interacting protein sequences using masked language modeling.
Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol.
Preprint, August 2023.
[10.1101/2023.08.14.553209]

From Proteins to Ligands: Decoding Deep Learning Methods for Binding Affinity Prediction.
Rohan Gorantla, Alžbeta Kubincová, Andrea Y. Weiße, Antonia S. J. S. Mey.
Preprint, August 2023.
[10.1101/2023.08.01.551483]

De Novo Design of κ-Opioid Receptor Antagonists Using a Generative Deep-Learning Framework.
Leslie Salas-Estrada, Davide Provasi, Xing Qiu, Husnu Ümit Kaniskan, Xi-Ping Huang, Jeffrey F. DiBerto, João Marcelo Lamim Ribeiro, Jian Jin, Bryan L. Roth, and Marta Filizola.
Journal of Chemical Information and Modeling, August 2023.
[10.1021/acs.jcim.3c00651]

Sequence-Based Nanobody-Antigen Binding Prediction.
Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib, Khurram Bashir, Imdad Ullah Khan, Murray Patterson.
Preprint, July 2023.
[arxiv]

Machine learning optimization of candidate antibody yields highly diverse sub-nanomolar affinity antibody libraries.
Lin Li, Esther Gupta, John Spaeth, Leslie Shing, Rafael Jaimes, Emily Engelhart, Randolph Lopez, Rajmonda S. Caceres, Tristan Bepler & Matthew E. Walsh.
Nature Communications, June 2023.
[10.1038/s41467-023-39022-2]

Matching receptor to odorant with protein language and graph neural networks.
Matej Hladiš, Maxence Lalis, Sebastien Fiorucci, Jérémie Topin.
ICLR, May 2023.
[ICLR]

A general model to predict small molecule substrates of enzymes based on machine and deep learning.
Alexander Kroll, Sahasra Ranjan, Martin K. M. Engqvist & Martin J. Lercher.
Nature Communications, May 2023.
[10.1038/s41467-023-38347-2]

De novo design of protein interactions with learned surface fingerprints.
Pablo Gainza, Sarah Wehrle, Alexandra Van Hall-Beauvais, Anthony Marchand, Andreas Scheck, Zander Harteveld, Stephen Buckley, Dongchun Ni, Shuguang Tan, Freyr Sverrisson, Casper Goverde, Priscilla Turelli, Charlène Raclot, Alexandra Teslenko, Martin Pacesa, Stéphane Rosset, Sandrine Georgeon, Jane Marsden, Aaron Petruzzella, Kefang Liu, Zepeng Xu, Yan Chai, Pu Han, George F. Gao, …Bruno E. Correia.
Nature, April 2023.
[10.1038/s41586-023-05993-x]

PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces.
Lucien F. Krapp, Luciano A. Abriata, Fabio Cortés Rodriguez & Matteo Dal Peraro.
Nature Communications, April 2023.
[10.1038/s41467-023-37701-8]

DRPBind: prediction of DNA, RNA and protein binding residues in intrinsically disordered protein sequences.
Ronesh Sharma, Tatsuhiko Tsunoda, Alok Sharma.
Preprint, March 2023.
[10.1101/2023.03.20.533427]

FlexVDW: A machine learning approach to account for protein flexibility in ligand docking.
Patricia Suriana, Joseph M. Paggi, Ron O. Dror.
Preprint, March 2023.
[arxiv]

Application of artificial intelligence to decode the relationships between smell, olfactory receptors and small molecules.
Rayane Achebouche, Anne Tromelin, Karine Audouze & Olivier Taboureau.
Scientific Reports, November 2022.
[10.1038/s41598-022-23176-y]

DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding.
Haitao Lin, Yufei Huang, Meng Liu, Xuanjing Li, Shuiwang Ji, Stan Z. Li.
Preprint, November 2022.
[arxiv]

DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking.
Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola.
Preprint, October 2022.
[arxiv]

Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery.
Felix Wong, Aarti Krishnan, Erica J Zheng, Hannes Stärk, Abigail L Manson, Ashlee M Earl, Tommi Jaakkola, James J Collins.
Molecular Systems Biology, September 2022.
[10.15252/msb.202211081]

Dynamic-Backbone Protein-Ligand Structure Prediction with Multiscale Generative Diffusion Models.
Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas F. Miller III, Anima Anandkumar.
Preprint, September 2022.
[arxiv]

Widely Used and Fast De Novo Drug Design by a Protein Sequence-Based Reinforcement Learning Model.
Yaqin Li, Lingli Li, Yongjin Xu, Yi Yu.
Preprint, August 2022.
[10.1101/2022.08.18.504370]

Antibody-Antigen Docking and Design via Hierarchical Equivariant Refinement.
Wengong Jin, Regina Barzilay, Tommi Jaakkola.
Preprint, July 2022.
[arxiv]

Cross-Modality and Self-Supervised Protein Embedding for Compound–Protein Affinity and Contact Prediction.
Yuning You, Yang Shen.
Preprint, July 2022.
[10.1101/2022.07.18.500559]

EvoBind: in silico directed evolution of peptide binders with AlphaFold.
Patrick Bryant, Arne Elofsson.
Preprint, July 2022.
[10.1101/2022.07.23.501214]

BepiPred-3.0: Improved B-cell epitope prediction using protein language models.
Joakim Clifford, Magnus Haraldson Høie, Morten Nielsen, Sebastian Deleuran, Bjoern Peters, Paolo Marcatili.
Preprint, July 2022.
[10.1101/2022.07.11.499418]

Predicting the specific substrate for transmembrane transport proteins using BERT language model.
Sima Ataei, Gregory Butler.
Preprint, July 2022.
[10.1101/2022.07.23.501263]

Peptide binding specificity prediction using fine-tuned protein structure prediction networks.
Amir Motmaen, Justas Dauparas, Minkyung Baek, Mohamad H. Abedi, David Baker, Philip Bradley.
Preprint, July 2022.
[10.1101/2022.07.12.499365]

Predicting the locations of cryptic pockets from single protein structures using the PocketMiner graph neural network.
Artur Meller, Michael Ward, Jonathan Borowsky, Jeffrey M. Lotthammer, Meghana Kshirsagar, Felipe Oviedo, Juan Lavista Ferres, Gregory R. Bowman.
Preprint, June 2022.
[10.1101/2022.06.28.497399]

Topsy-Turvy: integrating a global view into sequence-based PPI prediction.
Rohit Singh, Kapil Devkota, Samuel Sledzieski, Bonnie Berger, Lenore Cowen.
Bioinformatics, July 2022.
[10.1093/bioinformatics/btac258]

Scaffolding protein functional sites using deep learning.
Jue Wang, Sidney Lisanza, David Juergens, Doug Tischer, Joseph L. Watson, Karla M. Castro, Robert Ragotte, Amijai Saragovi, Lukas F. Milles, Minkyung Baek, Ivan Anishchenko, Wei Yang, Derrick R. Hicks, Marc Expòsit, Thomas Schlichthaerle, Jung-Ho Chun, Justas Dauparas, Nathaniel Bennett, Basile I. M. Wicky, Andrew Muenks, Frank DiMaio, Bruno Correia, Sergey Ovchinnikov and David Baker.
Science, July 2022.
[10.1126/science.abn2100]

The substrate scopes of enzymes: a general prediction model based on machine and deep learning.
Alexander Kroll, Sahasra Ranjan, Martin K. M. Engqvist, Martin J. Lercher.
Preprint, May 2022.
[10.1101/2022.05.24.493213]

Machine learning modeling of family wide enzyme-substrate specificity screens.
Samuel Goldman, Ria Das, Kevin K. Yang, Connor W. Coley.
PLoS Computational Biology, February 2022.
[10.1371/journal.pcbi.1009853]

AlphaFold encodes the principles to identify high affinity peptide binders.
Liwei Chang, Alberto Perez.
Preprint, March 2022.
[10.1101/2022.03.18.484931]

Leveraging nonstructural data to predict structures and affinities of protein–ligand complexes.
Joseph M. Paggi, Julia A. Belk, Scott A. Hollingsworth, Nicolas Villanueva, Alexander S. Powers, Mary J. Clark, Augustine G. Chemparathy, Jonathan E. Tynan, Thomas K. Lau, Roger K. Sunahara, and Ron O. Dror.
PNAS, December 2021.
[10.1073/pnas.2112621118]

AlphaFill: enriching the AlphaFold models with ligands and co-factors.
Maarten L Hekkelman, Ida de Vries, Robbie P Joosten, Anastassis Perrakis.
Preprint, November 2021.
[10.1101/2021.11.26.470110]

Deep learning allows genome-scale prediction of Michaelis constants from structural features.
Alexander Kroll, Martin K. M. Engqvist, David Heckmann, Martin J. Lercher.
PLoS Biology, October 2021.
[10.1371/journal.pbio.3001402]

Probing T-cell response by sequence-based probabilistic modeling.
Barbara Bravi, Vinod P. Balachandran, Benjamin D. Greenbaum, Aleksandra M. Walczak, Thierry Mora, Rémi Monasson, Simona Cocco.
PLOS Computational Biology, September 2021.
[10.1371/journal.pcbi.1009297]

Biologically relevant transfer learning improves transcription factor binding prediction.
Gherman Novakovsky, Manu Saraswat, Oriol Fornes, Sara Mostafavi & Wyeth W. Wasserman.
Genome Biology, September 2021.
[10.1186/s13059-021-02499-5]

Deep learning based kcat prediction enables improved enzyme constrained model reconstruction.
Feiran Li, Le Yuan, Hongzhong Lu, Gang Li, Yu Chen, Martin K. M. Engqvist, Eduard J Kerkhoven, Jens Nielsen.
Preprint, August 2021.
[10.1101/2021.08.06.455417]

A billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction.
Philippe A. Robert, Rahmad Akbar, Robert Frank, Milena Pavlović, Michael Widrich, Igor Snapkov, Maria Chernigovskaya, Lonneke Scheffer, Andrei Slabodkin, Brij Bhushan Mehta, Mai Ha Vu, Aurél Prósz, Krzysztof Abram, Alex Olar, Enkelejda Miho, Dag Trygve Tryslew Haug, Fridtjof Lund-Johansen, Sepp Hochreiter, Ingrid Hobæk Haff, Günter Klambauer, Geir K. Sandve, Victor Greiff.
Preprint, July 2021.
[10.1101/2021.07.06.451258]

Leveraging Sequential and Spatial Neighbors Information by Using CNNs Linked With GCNs for Paratope Prediction.
Shuai Lu, Yuguang Li, Fei Wang, Xiaofei Nan, Shoutao Zhang.

Neural message passing for joint paratope-epitope prediction.
Alice Del Vecchio, Andreea Deac, Pietro Liò, Petar Veličković.
Preprint, May 2021.
[arxiv]

Interpreting Neural Networks for Biological Sequences by Learning Stochastic Masks.
Johannes Linder, Alyssa La Fleur, Zibo Chen, Ajasja Ljubetič, David Baker, Sreeram Kannan, Georg Seelig.
Preprint, April 2021.
[10.1101/2021.04.29.441979]

GraphProt2: A graph neural network-based method for predicting binding sites of RNA-binding proteins.
Michael Uhl, Van Dinh Tran, Florian Heyl, Rolf Backofen.
Preprint, March 2021.
[10.1101/850024]

Using the antibody-antigen binding interface to train image-based deep neural networks for antibody-epitope classification.
Daniel R. Ripoll, Sidhartha Chaudhury, Anders Wallqvist.
PLOS Computational Biology, March 2021.
[10.1371/journal.pcbi.1008864]

A multitask transfer learning framework for novel virus-human protein interactions.
Ngan Thi Dong, Megha Khosla.
Preprint, March 2021.
[10.1101/2021.03.25.437037]

EGRET: Edge Aggregated Graph Attention Networks and Transfer Learning Improve Protein-Protein Interaction Site Prediction.
Sazan Mahbub, Md Shamsuzzoha Bayzid.
Preprint, February 2021.
[10.1101/2020.11.07.372466]

Towards a systematic characterization of protein complex function: a natural language processing and machine-learning framework.
Varun S. Sharma, Andrea Fossati, Rodolfo Ciuffa, Marija Buljan, Evan G. Williams, Zhen Chen, Wenguang Shao, Patrick G. A. Pedrioli, Anthony W. Purcell, María Rodríguez Martínez, … Chen Li.
Preprint, February 2021.
[10.1101/2021.02.24.432789]

Capsule network for protein ubiquitination site prediction.
Qiyi Huang, Jiulei Jiang, Yin Luo, Weimin Li, Ying Wang.
Preprint, January 2021.
[10.1101/2021.01.07.425697]

Accurate neoantigen prediction depends on mutation position relative to patient allele-specific MHC anchor location.
Huiming Xia, Joshua F. McMichael, Suangson Supabphol, Megan M. Richters, Anamika Basu, Cody A. Ramirez, Cristina Puig-Saus, Kelsy C. Cotto, Jasreet Hundal, Susanna Kiwala, … Malachi Griffith.
Preprint, December 2020.
[10.1101/2020.12.08.416271]

DeepPurpose: a deep learning library for drug–target interaction prediction.
Kexin Huang, Tianfan Fu, Lucas M. Glass, Marinka Zitnik, Cao Xiao, Jimeng Sun.
Bioinformatics, December 2020.
[10.1093/bioinformatics/btaa1005]

Substrate specificity of 2-deoxy-D-ribose 5-phosphate aldolase (DERA) assessed by different protein engineering and machine learning methods.
Sanni Voutilainen, Markus Heinonen, Martina Andberg, Emmi Jokinen, Hannu Maaheimo, Johan Pääkkönen, Nina Hakulinen, Juha Rouvinen, Harri Lähdesmäki, Samuel Kaski, Juho Rousu, Merja Penttilä & Anu Koivula.
Applied Microbiology and Biotechnology, November 2020.
[10.1007/s00253-020-10960-x]

BERTMHC: Improves MHC-peptide class II interaction prediction with transformer and multiple instance learning.
Jun Cheng, Kaïdre Bendjama, Karola Rittner, Brandon Malone.
Preprint, November 2020.
[10.1101/2020.11.24.396101]

Predicting Cell-Penetrating Peptides: Building and Interpreting Random Forest based prediction Models.
Shilpa Yadahalli, Chandra S. Verma.
Preprint, October 2020.
[10.1101/2020.10.15.341149]

Struct2Graph: A graph attention network for structure based predictions of protein-protein interactions.
Mayank Baranwal, Abram Magner, Jacob Saldinger, Emine S. Turali-Emre, Shivani Kozarekar, Paolo Elvati, J. Scott VanEpps, Nicholas A. Kotov, Angela Violi, Alfred O. Hero.
Preprint, September 2020.
[10.1101/2020.09.17.301200]

Predicting antigen specificity of single T cells based on TCR CDR3 regions.
David S Fischer, Yihan Wu, Benjamin Schubert, Fabian J Theis.
Molecular Systems Biology, August 2020.
[10.15252/msb.20199416]

DeepKinZero: zero-shot learning for predicting kinase–phosphosite associations involving understudied kinases.
Iman Deznabi, Busra Arabaci, Mehmet Koyutürk, Oznur Tastan.
Bioinformatics, June 2020.
[10.1093/bioinformatics/btaa013]

EpiDope: A Deep neural network for linear B-cell epitope prediction.
Maximilian Collatz, Florian Mock, Martin Hölzer, Emanuel Barth, Konrad Sachse, Manja Marz.
Preprint, May 2020.
[10.1101/2020.05.12.090019]

Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites.
Arnab Bhadra, Kalidas Y.
Preprint, March 2020.
[arxiv]

Energy-based graph convolutional networks for scoring protein docking models.
Yue Cao, Yang Shen.
Proteins: Structure, Function, and Bioinformatics, 2020.
[10.1002/prot.25888]

Mutation effect estimation on protein-protein interactions using deep contextualized representation learning.
Guangyu Zhou, Muhao Chen, Chelsea J.-T. Ju, Zheng Wang, Jyun-Yu Jiang, Wei Wang.
NAR Genomics and Bioinformatics, March 2020.
[10.1093/nargab/lqaa015]

Biophysical prediction of protein–peptide interactions and signaling networks using machine learning.
Joseph M. Cunningham, Grigoriy Koytiger, Peter K. Sorger & Mohammed AlQuraishi.
Nature Methods, January 2020.
[10.1038/s41592-019-0687-1]

Functions of olfactory receptors are decoded from their sequence.
Xiaojing Cong, Wenwen Ren, Jody Pacalon, Claire A. de March, Lun Xu, Hiroaki Matsunami, Yiqun Yu, Jérôme Golebiowski.
Preprint, January 2020.
[10.1101/2020.01.06.895540]

Sequence-to-function deep learning frameworks for synthetic biology.
Jacqueline Valeri, Katherine M. Collins, Bianca A. Lepe, Timothy K. Lu, Diogo M. Camacho.
Preprint, December 2019.
[10.1101/870055]

Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts.
Mostafa Karimi, Di Wu, Zhangyang Wang, Yang Shen.
Preprint, December 2019.
[arxiv]

Using Single Protein/Ligand Binding Models to Predict Active Ligands for Previously Unseen Proteins.
Vikram Sundar, Lucy Colwell.
NeurIPS Workshop on Machine Learning and the Physical Sciences, December 2019.
[ML4PS]

Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning.
P. Gainza, F. Sverrisson, F. Monti, E. Rodolà, D. Boscaini, M. M. Bronstein, B. E. Correia.
Nature Methods, December 2019.
[10.1038/s41592-019-0666-6]

End-to-End Learning on 3D Protein Structure for Interface Prediction.
Raphael J. L. Townshend, Rishi Bedi, Patricia A. Suriana, Ron O. Dror.
NeurIPS, December 2019.
[arxiv]

USMPep: Universal Sequence Models for Major Histocompatibility Complex Binding Affinity Prediction.
Johanna Vielhaben, Markus Wenzel, Wojciech Samek, Nils Strodthoff.
Preprint, October 2019.
[10.1101/816546]

DeepCLIP: Predicting the effect of mutations on protein-RNA binding with Deep Learning.
Alexander Gulliver Bjørnholt Grønning, Thomas Koed Doktor, Simon Jonas Larsen, Ulrika Simone Spangsberg Petersen, Lise Lolle Holm, Gitte Hoffmann Bruun, Michael Birkerod Hansen, Anne-Mette Hartung, Jan Baumbach, Brage Storstein Andresen.
Preprint, September 2019.
[10.1101/757062]

Multifaceted protein–protein interaction prediction based on Siamese residual RCNN.
Muhao Chen, Chelsea J.-T. Ju, Guangyu Zhou, Xuelu Chen, Tianran Zhang, Kai-Wei Chang, Carlo Zaniolo, Wei Wang.
Bioinformatics, July 2019. (Procs. ISMB/ECCB-2019)
[10.1093/bioinformatics/btz328]

DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences.
Ingoo Lee, Jongsoo Keum, Hojung Nam.
PLOS Computational Biology, June 2019.
[10.1371/journal.pcbi.1007129]

Leveraging binding-site structure for drug discovery with point-cloud methods.
Vincent Mallet, Carlos G. Oliver, Nicolas Moitessier, Jerome Waldispuhl.
Preprint, May 2019.
[arxiv]

Repertoires of G protein-coupled receptors for Ciona-specific neuropeptides.
Akira Shiraishi, Toshimi Okuda, Natsuko Miyasaka, Tomohiro Osugi, Yasushi Okuno, Jun Inoue, and Honoo Satake.
PNAS, March 2019.
[10.1073/pnas.1816640116]

Simple tricks of convolutional neural network architectures improve DNA–protein binding prediction.
Zhen Cao, Shihua Zhang.
Bioinformatics, October 2018.
[10.1093/bioinformatics/bty893]

MHCflurry: Open-Source Class I MHC Binding Affinity Prediction.
Timothy J. O'Donnell, Alex Rubinsteyn, Maria Bonsack, Angelika B. Riemer, Uri Laserson, Jeff Hammerbacher.
Cell Systems, June 2018.
[10.1016/j.cels.2018.05.014]

P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure.
Radoslav Krivak, David Hoksza.
Journal of Cheminformatics, August 2018.
[10.1186/s13321-018-0285-8]

DeepMHC: Deep Convolutional Neural Networks for High-performance peptide-MHC Binding Affinity Prediction.
Jianjun Hu, Zhonghao Liu.
Preprint, December 2017.
[10.1101/239236] [bioRxiv]

DeepSite: protein-binding site predictor using 3D-convolutional neural networks.
J. Jiménez, S. Doerr, G. Martínez-Rosell, A. S. Rose, G. De Fabritiis.
Bioinformatics, October 2017.
[10.1093/bioinformatics/btx350]

Predicting Protein Binding Affinity With Word Embeddings and Recurrent Neural Networks.
Carlo Mazzaferro.
Preprint, April 2017.
[10.1101/128223] [bioRxiv]

Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity.
Joseph Gomes, Bharath Ramsundar, Evan N. Feinberg, Vijay S. Pande.
Preprint, March 2017.
[arxiv]

Convolutional neural network architectures for predicting DNA–protein binding.
Haoyang Zeng, Matthew D. Edwards, Ge Liu, David K. Gifford.
Bioinformatics, June 2016.
[10.1093/bioinformatics/btw255]

A deep learning framework for modeling structural features of RNA-binding protein targets.
Sai Zhang, Jingtian Zhou, Hailin Hu, Haipeng Gong, Ligong Chen, Chao Cheng, Jianyang Zeng.
Nucleic Acids Research, October 2015.
[10.1093/nar/gkv1025]

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning.
Babak Alipanahi, Andrew Delong, Matthew T. Weirauch, Brendan J. Frey.
Nature Biotechnology, July 2015.
[10.1038/nbt.3300]

Protein-protein docking using learned three-dimensional representations.
Georgy Derevyanko, Guillaume Lamoureux.
Preprint, March 2017.
[10.1101/738690] [bioRxiv]

Other supervised learning

The simplicity of protein sequence-function relationships.
Yeonwoo Park, Brian P.H. Metzger, Joseph W. Thornton.
Preprint, September 2023.
[10.1101/2023.09.02.556057]

Deep learning-based codon optimization with large-scale synonymous variant datasets enables generalized tunable protein expression.
David A. Constant, Jahir M. Gutierrez, Anand V. Sastry, Rebecca Viazzo, Nicholas R. Smith, Jubair Hossain, David A. Spencer, Hayley Carter, Abigail B. Ventura, Michael T. M. Louie, Christa Kohnert, Rebecca Consbruck, Joshua Bennett, Kenneth A. Crawford, John M. Sutton, Anneliese Morrison, Andrea K. Steiger, Kerianne A. Jackson, Jennifer T. Stanton, Shaheed Abdulhaqq, Gregory Hannum, Joshua Meier, Matthew Weinstock, Miles Gander.
Preprint, February 2023.
[10.1101/2023.02.11.528149]

Coherent Blending of Biophysics-Based Knowledge with Bayesian Neural Networks for Robust Protein Property Prediction.
Hunter Nisonoff, Yixin Wang, and Jennifer Listgarten.
ACS Synthetic Biology, October 2023.
[10.1021/acssynbio.3c00217]

FiTMuSiC: Leveraging structural and (co)evolutionary data for protein fitness prediction.
Matsvei Tsishyn, Gabriel Cia, Pauline Hermans, Jean Kwasigroch, Marianne Rooman, Fabrizio Pucci.
Preprint, August 2023.
[10.1101/2023.08.01.551497]

Assessing the performance of protein regression models.
Richard Michael, Jacob Kæstel-Hansen, Peter Mørch Groth, Simon Bartels, Jesper Salomon, Pengfei Tian, Nikos S. Hatzakis, Wouter K. Boomsma.
Preprint, June 2023.
[10.1101/2023.06.18.545472]

Interpretable neural architecture search and transfer learning for understanding sequence dependent enzymatic reactions.
Zijun Zhang, Adam R. Lamson, Michael Shelley, Olga Troyanskaya.
Preprint, May 2023.
[arxiv]

Protein language model-based end-to-end type II polyketide prediction without sequence alignment.
Jiaquan Huang, Qiandi Gao, Ying Tang, Yaxin Wu, Heqian Zhang, Zhiwei Qin.
Preprint, April 2023.
[10.1101/2023.04.18.537339]

Flattening the curve - How to get better results with small deep-mutational-scanning datasets.
Gregor Wirnsberger, Iva Pritišanac, Gustav Oberdorfer, Karl Gruber.
Preprint, March 2023.
[10.1101/2023.03.27.534314]

Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network.
Changpeng Lu, Joseph H. Lubin, Vidur V. Sarma, Samuel Z. Stentz, Guanyang Wang, Sijian Wang, Sagar D. Khare.
Preprint, February 2023.
[10.1101/2023.02.16.528728]

Linear-scaling kernels for protein sequences and small molecules outperform deep learning while providing uncertainty quantitation and improved interpretability.
Jonathan Parkinson, Wei Wang.
Preprint, February 2023.
[arxiv]

Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning.
Alexander Kroll, Xiao-Pan Hu, Nina A. Liebrand, Martin J. Lercher.
Preprint, November 2022.
[10.1101/2022.11.10.516024]

PrMFTP: Multi-functional therapeutic peptides prediction based on multi-head self-attention mechanism and class weight optimization.
Wenhui Yan, Wending Tang, Lihua Wang, Yannan Bin, Junfeng Xia.
PLOS Computational Biology, September 2022.
[10.1371/journal.pcbi.1010511]

Learning with phenotypic similarity improves the prediction of functional effects of missense variants in voltage-gated sodium channels.
Christian Malte Boßelmann, Ulrike B.S. Hedrich, Holger Lerche, Nico Pfeifer.
Preprint, September 2022.
[10.1101/2022.09.29.510111]

A synthetic protein-level neural network in mammalian cells.
Zibo Chen, James M. Linton, Ronghui Zhu, Michael B. Elowitz.
Preprint, July 2022.
[10.1101/2022.07.10.499405]

Protein structure prediction in the era of AI: challenges and limitations when applying to in-silico force spectroscopy.
Priscila S. F. C. Gomes, Diego E. B. Gomes, Rafael C. Bernardi.
Preprint, July 2022.
[10.1101/2022.06.30.498329]

PRESTO: Rapid protein mechanical strength prediction with an end-to-end deep learning model.
Frank Y. C. Liu, Bo Ni, Markus J. Buehler.
Extreme Mechanics Letters, August 2022.
[10.1016/j.eml.2022.101803]

Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction.
Feiran Li, Le Yuan, Hongzhong Lu, Gang Li, Yu Chen, Martin K. M. Engqvist, Eduard J. Kerkhoven, Jens Nielsen.
Nature Catalysis, June 2022.
[10.1038/s41929-022-00798-z]

A Topological Data Analytic Approach for Discovering Biophysical Signatures in Protein Dynamics.
Wai Shing Tang, Gabriel Monteiro da Silva, Henry Kirveslahti, Erin Skeens, Bibo Feng, Timothy Sudijono, Kevin K Yang, Sayan Mukherjee, Brenda Rubenstein, Lorin Crawford.
PLoS Computational Biology, May 2022.
[10.1371/journal.pcbi.1010045]

Neural networks to learn protein sequence–function relationships from deep mutational scanning data.
Sam Gelman, Sarah A. Fahlberg, Pete Heinzelman, Philip A. Romero, and Anthony Gitter.
PNAS, November 2021.
[10.1073/pnas.2104878118]

Multiscale profiling of enzyme activity in cancer.
Ava P. Soleimany, Jesse D. Kirkpatrick, Cathy S. Wang, Alex M. Jaeger, Susan Su, Santiago Naranjo, Qian Zhong, Christina M. Cabana, Tyler Jacks, Sangeeta N. Bhatia.
Preprint, November 2021.
[10.1101/2021.11.11.468288]

Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions.
Amirali Aghazadeh, Hunter Nisonoff, Orhan Ocal, David H. Brookes, Yijie Huang, O. Ozan Koyluoglu, Jennifer Listgarten & Kannan Ramchandran.
Nature Communications, September 2021.
[10.1038/s41467-021-25371-3]

AllerStat: Finding Statistically Significant Allergen-Specific Patterns in Protein Sequences by Machine Learning.
Kento Goto, Norimasa Tamehiro, Takumi Yoshida, Hiroyuki Hanada, Takuto Sakuma, Reiko Adachi, Kazunari Kondo, Ichiro Takeuchi.
Preprint, August 2021.
[10.1101/2021.08.17.456743]

A Topological Data Analytic Approach for Discovering Biophysical Signatures in Protein Dynamics.
Wai Shing Tang, Gabriel Monteiro da Silva, Henry Kirveslahti, Erin Skeens, Bibo Feng, Timothy Sudijono, Kevin K Yang, Sayan Mukherjee, Brenda Rubenstein, Lorin Crawford.
Preprint, July 2021.
[10.1101/2021.07.28.454240]

Performance of Regression Models as a Function of Experiment Noise.
Gang Li, Jan Zrimec, Boyang Ji, Jun Geng, Johan Larsbrink, Aleksej Zelezniak, Jens Nielsen, Martin K. M. Engqvist.
Bioinformatics and Biology Insights, June 2021.
[10.1177/11779322211020315]

In-Pero: Exploiting deep learning embeddings of protein sequences to predict the localisation of peroxisomal proteins.
Marco Anteghini, Vitor AP Martins dos Santos, Edoardo Saccenti.
International Journal of Molecular Sciences, June 2021.
[10.3390/ijms22126409]

Predicting and interpreting large scale mutagenesis data using analyses of protein stability and conservation.
Magnus H. Høie, Matteo Cagiada, Anders Haagen Beck Frederiksen, Amelie Stein, Kresten Lindorff-Larsen.
Preprint, June 2021.
[10.1101/2021.06.26.450037]

Machine learning differentiates enzymatic and non-enzymatic metals in proteins.
Ryan Feehan, Meghan W. Franklin, Joanna S. G. Slusky.
Nature Communications, June 2021.
[10.1038/s41467-021-24070-3]

Assessing the performance of computational predictors for estimating protein stability changes upon missense mutations.
Shahid Iqbal, Fuyi Li, Tatsuya Akutsu, David B Ascher, Geoffrey I Webb, Jiangning Song.
Briefings in Bioinformatics, May 2021.
[10.1093/bib/bbab184]

Predicting enzymatic reactions with a molecular transformer.
David Kreutter, Philippe Schwaller, Jean-Louis Reymond.
Chemical Science, May 2021.
[10.1039/D1SC02362D]

On the sparsity of fitness functions and implications for learning.
David H. Brookes, Amirali Aghazadeh, Jennifer Listgarten.
Preprint, May 2021.
[10.1101/2021.05.24.445506]

Deep protein representations enable recombinant protein expression prediction.
Hannah-Marie Martiny, Jose Juan Almagro Armenteros, Alexander Rosenberg Johansen, Jesper Salomon, Henrik Nielsen.
Preprint, May 2021.
[10.1101/2021.05.13.443426]

Deep learning tools and modeling to estimate the temporal expression of cell cycle proteins from 2D still images.
Thierry Pécot, Maria C. Cuitiño, Roger H. Johnson, Cynthia Timmers, Gustavo Leone.
Preprint, April 2021.
[10.1101/2021.03.01.433386]

Light Attention Predicts Protein Location from the Language of Life.
Hannes Stärk, Christian Dallago, Michael Heinzinger, Burkhard Rost.
Preprint, April 2021.
[10.1101/2021.04.25.441334]

Positional SHAP (PoSHAP) for Interpretation of Machine Learning Models Trained from Biological Sequences.
Quinn Dickinson, Jesse G. Meyer.
Preprint, March 2021.
[10.1101/2021.03.04.433939]

Modeling mutational effects on biochemical phenotypes using convolutional neural networks: application to SARS-CoV-2.
Bo Wang, Eric R. Gamazon.
Preprint, February 2021.
[10.1101/2021.01.28.428521]

Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network.
Rakesh David, Rhys-Joshua D. Menezes, Jan De Klerk, Ian R. Castleden, Cornelia M. Hooper, Gustavo Carneiro, Matthew Gilliham.
Scientific Reports, January 2021.
[10.1038/s41598-020-80441-8]

DeepPSC (protein structure camera): computer vision-based reconstruction of proteins backbone structure from alpha carbon trace as a case study.
Xing Zhang, Junwen Luo, Yi Cai, Wei Zhu, Xiaofeng Yang, Hongmin Cai, Zhanglin Lin.
Preprint, August 2020.
[10.1101/2020.08.12.247312]

TransINT: an interface-based prediction of membrane protein-protein interactions.
G. Khazen, A. Gyulkhandanian, T. Issa, R.C. Maroun.
Preprint, July 2020.
[10.1101/871590]

DeepEMhancer: a deep learning solution for cryo-EM volume post-processing.
R Sanchez-Garcia, J Gomez-Blanco, A Cuervo, JM Carazo, COS Sorzano, J Vargas.
Preprint, June 2020.
[10.1101/2020.06.12.148296]

ProtTox: Toxin identification from Protein Sequences.
Sathappan Muthiah, Debanjan Datta, Mohammad Raihanul Islam, Patrick Butler, Andrew Warren, Naren Ramakrishnan.
Preprint, April 2020.
[10.1101/2020.04.18.048439]

Predicting the Viability of Beta-Lactamase: How Folding and Binding Free Energies Correlate with Beta-Lactamase Fitness.
Jordan Yang, Nandita Naik, Jagdish Suresh Patel, Christopher S. Wylie, Wenze Gu, Jessie Huang, Marty Ytreberg, Mandar T. Naik, Daniel M. Weinreich, Brenda M. Rubenstein.
Preprint, April 2020.
[10.1101/2020.04.15.043661]

Classifying protein structures into folds by convolutional neural networks, distance maps, and persistent homology.
Yechan Hong, Yongyu Deng, Haofan Cui, Jan Segert, Jianlin Cheng.
Preprint, April 2020.
[10.1101/2020.04.15.042739]

Minimum epistasis interpolation for sequence-function relationships.
Juannan Zhou, David M. McCandlish.
Nature Communications, April 2020.
[10.7554/eLife.16965.024]

Machine Learning to Identify Flexibility Signatures of Class A GPCR Inhibition.
Joseph Bemister-Buffington, Alex J. Wolf, Sebastian Raschka, Leslie A. Kuhn.
Biomolecules, March 2020.
[10.3390/biom10030454]

Extraction of Protein Dynamics Information Hidden in Cryo-EM Map Using Deep Learning.
Shigeyuki Matsumoto, Shoichi Ishida, Mitsugu Araki, Takayuki Kato, Kei Terayama, Yasushi Okuno.
Preprint, February 2020.
[10.1101/2020.02.17.951863]

Transformer neural network for protein specific de novo drug generation as machine translation problem.
Daria Grechishnikova.
Preprint, December 2019.
[10.1101/863415]

Iterative Peptide Modeling With Active Learning And Meta-Learning.
Rainier Barrett, Andrew D. White.
Preprint, November 2019.
[arxiv]

Deep convolutional neural network and attention mechanism based pan-specific model for interpretable MHC-I peptide binding prediction.
Jing Jin, Zhonghao Liu, Alireza Nasiri, Yuxin Cui, Stephen Louis, Ansi Zhang, Yong Zhao, Jianjun Hu.
Preprint, November 2019.
[10.1101/830737]

BCrystal: an interpretable sequence-based protein crystallization predictor.
Abdurrahman Elbasir, Raghvendra Mall, Khalid Kunji, Reda Rawi, Zeyaul Islam, Gwo-Yu Chuang, Prasanna R Kolatkar, Halima Bensmail.
Bioinformatics, October 2019.
[10.1093/bioinformatics/btz762]

Deep learning regression model for antimicrobial peptide design.
Jacob Witten, Zack Witten.
Preprint, July 2019.
[10.1101/692681] [bioRxiv]

Using machine learning to predict organismal growth temperatures from protein primary sequences.
David B. Sauer, Da-Neng Wang.
Preprint, June 2019.
[10.1101/677328] [bioRxiv]

SolXplain: An Explainable Sequence-Based Protein Solubility Predictor.
Raghvendra Mall.
Preprint, May 2019.
[10.1101/651067] [bioRxiv]

High precision protein functional site detection using 3D convolutional neural networks.
Wen Torng, Russ B Altman.
Bioinformatics, May 2019.
[10.1093/bioinformatics/bty813]

Develop machine learning-based regression predictive models for engineering protein solubility.
Xi Han, Xiaonan Wang, Kang Zhou.
Bioinformatics, April 2019.
[10.1093/bioinformatics/btz294]

DeepCrystal: a deep learning framework for sequence-based protein crystallization prediction.
Abdurrahman Elbasir, Balasubramanian Moovarkumudalvan, Khalid Kunji, Prasanna R Kolatkar, Raghvendra Mall, Halima Bensmail.
Bioinformatics, November 2018.
[10.1093/bioinformatics/bty953]

DeepSol: a deep learning framework for sequence-based protein solubility prediction.
Sameer Khurana, Reda Rawi, Khalid Kunji, Gwo-Yu Chuang, Halima Bensmail, Raghvendra Mall.
Bioinformatics, March 2018.
[10.1093/bioinformatics/bty166]

A statistical model for improved membrane protein expression using sequence-derived features.
Shyam M. Saladi, Nauman Javed, Axel Müller, William M. Clemons, Jr.
Journal of Biological Chemistry, March 2018.
[10.1074/jbc.RA117.001052]

Learning epistatic interactions from sequence-activity data to predict enantioselectivity.
Julian Zaugg, Yosephine Gumulya, Alpeshkumar K. Malde, Mikael Bodén.
Journal of Computer Aided Molecular Design, December 2017.
[10.1007/s10822-017-0090-x]

Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data.
Vanessa E. Gray, Ronald J. Hause, Jens Luebeck, Jay Shendure, Douglas M. Fowler.
Cell Systems, December 2017.
[10.1016/j.cels.2017.11.003]

DeepLoc: prediction of protein subcellular localization using deep learning.
Jose Juan Almagro Armenteros, Casper Kaae Sønderby, Søren Kaae Sønderby, Henrik Nielsen, Ole Winther.
Bioinformatics, September 2017.
[10.1093/bioinformatics/btx548]

Semisupervised Gaussian Process for Automated Enzyme Search.
Joseph Mellor, Ioana Grigoras, Pablo Carbonell, and Jean-Loup Faulon.
ACS Synthetic Biology, March 2016.
[10.1021/acssynbio.5b00294]

High Precision Prediction of Functional Sites in Protein Structures.
Ljubomir Buturovic, Mike Wong, Grace W. Tang, Russ B. Altman, Dragutin Petkovic.
PLOS One, March 2014.
[10.1371/journal.pone.0091240]

Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction.
Aalt D. J. van Dijk, Giuseppa Morabito, Martijn Fiers, Roeland C. H. J. van Ham, Gerco C. Angenent, Richard G. H. Immink.
PLOS Computational Biology, November 2010.
[10.1371/journal.pcbi.1001017]

Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control.
A.D.J. van Dijk, C.J.F. ter Braak, R.G. Immink, G.C. Angenent, R.C.H.J. van Ham.
Bioinformatics, January 2008.
[10.1093/bioinformatics/btm539]

Deep convolutional networks for quality assessment of protein folds.
Georgy Derevyanko, Sergei Grudinin, Yoshua Bengio, Guillaume Lamoureux.
Bioinformatics, December 2018.
[10.1093/bioinformatics/bty494] [ArXiv]
