31 Jan 15:20

05d08a1

Models 01-2024

Models for the paper "TIMED-Design: Flexible and Accessible Protein Sequence Design with Convolutional Neural Networks"

Performance Comparison

For detailed performance comparisons, please see the paper.

Macro Recall

Macro-recall is recall computed separately for each residue type and then averaged across types, which makes it resistant to class imbalance.
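To make the contrast with plain accuracy concrete, here is a minimal sketch of macro-recall on a pair of sequences. The function names and the toy sequences are illustrative and are not taken from the TIMED-design code base; the average here runs over the residue types present in the reference sequence.

```python
from collections import defaultdict


def accuracy(true_seq, pred_seq):
    """Fraction of positions where the predicted residue matches the reference."""
    return sum(t == p for t, p in zip(true_seq, pred_seq)) / len(true_seq)


def macro_recall(true_seq, pred_seq):
    """Recall per residue type, averaged over the types present in true_seq."""
    if len(true_seq) != len(pred_seq) or not true_seq:
        raise ValueError("sequences must be non-empty and of equal length")
    total = defaultdict(int)    # occurrences of each residue type in the reference
    correct = defaultdict(int)  # correctly predicted occurrences of each type
    for t, p in zip(true_seq, pred_seq):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[aa] / total[aa] for aa in total) / len(total)


# A model that always predicts alanine looks good on accuracy for an
# alanine-rich sequence, but macro-recall penalises the missed K and D.
true_seq = "AAAAAAAAKD"
pred_seq = "AAAAAAAAAA"
print("accuracy    :", accuracy(true_seq, pred_seq))
print("macro-recall:", macro_recall(true_seq, pred_seq))
```

Each residue type contributes equally to the average, so a rare residue that is never predicted lowers the score as much as a common one would.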

RMSD_100

We sampled 10% of the PDBench dataset and ran it through AlphaFold2 with Amber relaxation. RMSD_100 is a version of RMSD normalised for protein length, so that structures of different sizes can be compared.
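The release notes do not state the normalisation formula. A minimal sketch, assuming the widely used definition RMSD_100 = RMSD / (1 + ln(sqrt(N / 100))), where N is the number of residues, could look like this (the function name is illustrative):

```python
import math


def rmsd_100(rmsd, n_residues):
    """Length-normalised RMSD, assumed form: RMSD / (1 + ln(sqrt(N / 100)))."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    scale = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if scale <= 0:
        # The denominator reaches zero near N = 100 / e^2 (about 13.5 residues).
        raise ValueError("chain too short for this normalisation")
    return rmsd / scale


# The same raw RMSD is scaled up for short chains and down for long ones.
for n in (50, 100, 400):
    print(n, round(rmsd_100(2.0, n), 3))
```

At N = 100 the value equals the raw RMSD, which is why the metric is described as RMSD referred to a 100-residue protein.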

Isoelectric Point Mean Absolute Error (MAE)

Mean absolute difference between the isoelectric point of the original sequence and that of the predicted sequence.
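As a rough illustration of this metric, the sketch below estimates the isoelectric point by bisection on a Henderson-Hasselbalch net-charge curve and then averages the absolute error over sequence pairs. The pKa table is an illustrative assumption (other published pKa sets shift the result slightly), and all names are hypothetical; the release notes do not say which tool was used to compute pI.

```python
# Illustrative pKa values (assumption, not taken from the release notes).
PKA_POS = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # C-terminal carboxyl
    for aa in seq:
        if aa in PKA_POS:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """pH at which the net charge crosses zero, found by bisection."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def pi_mae(original_seqs, predicted_seqs):
    """Mean absolute pI error over paired original/predicted sequences."""
    errors = [
        abs(isoelectric_point(o) - isoelectric_point(p))
        for o, p in zip(original_seqs, predicted_seqs)
    ]
    return sum(errors) / len(errors)


print(pi_mae(["MKKLDE", "ACDEFG"], ["MKRLDE", "ACKEFG"]))
```

Bisection works here because the net charge decreases monotonically as pH increases.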

Charge Mean Absolute Error (MAE)

Mean absolute difference between the overall charge of the original sequence and that of the predicted sequence.
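A minimal sketch of this metric, assuming a simple integer approximation of the charge at neutral pH (K and R count as +1, D and E as -1, histidine as neutral). The approximation and the function names are assumptions for illustration; the release notes do not specify how charge was computed.

```python
def approx_charge(seq):
    """Integer net-charge estimate at neutral pH: +1 for K/R, -1 for D/E."""
    return sum((aa in "KR") - (aa in "DE") for aa in seq)


def charge_mae(original_seqs, predicted_seqs):
    """Mean absolute charge error over paired original/predicted sequences."""
    if len(original_seqs) != len(predicted_seqs) or not original_seqs:
        raise ValueError("need two non-empty lists of equal length")
    errors = [
        abs(approx_charge(o) - approx_charge(p))
        for o, p in zip(original_seqs, predicted_seqs)
    ]
    return sum(errors) / len(errors)


originals = ["KKAA", "DDAA"]
predictions = ["KAAA", "DDAA"]
print(charge_mae(originals, predictions))
```

In the toy data the first prediction loses one lysine (error 1) and the second matches exactly (error 0), so the mean over the two pairs is 0.5.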

Training

All models were trained using the culled PDB set from PISCES (cullpdb_pc90_res3.0_R1.0_d200702_chains40583), containing over 35K non-redundant protein structures (40K+ chains) with resolutions up to 3.0 Å.
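The PISCES file name itself encodes the culling thresholds. The sketch below parses it, assuming the usual reading of the naming convention (pc = maximum percent sequence identity, res = resolution cutoff in Å, R = R-factor cutoff, d = date as YYMMDD, chains = number of chains). The function and key names are hypothetical.

```python
import re

# Pattern for names such as cullpdb_pc90_res3.0_R1.0_d200702_chains40583.
PISCES_NAME = re.compile(
    r"cullpdb_pc(?P<identity>[\d.]+)_res(?P<resolution>[\d.]+)"
    r"_R(?P<r_factor>[\d.]+)_d(?P<date>\d{6})_chains(?P<chains>\d+)"
)


def parse_pisces_name(name):
    """Split a PISCES culled-list file name into its cutoff fields."""
    match = PISCES_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised PISCES list name: {name!r}")
    return {
        "max_identity_percent": float(match["identity"]),
        "max_resolution_angstrom": float(match["resolution"]),
        "max_r_factor": float(match["r_factor"]),
        "date_yymmdd": match["date"],
        "n_chains": int(match["chains"]),
    }


info = parse_pisces_name("cullpdb_pc90_res3.0_R1.0_d200702_chains40583")
for key, value in info.items():
    print(f"{key}: {value}")
```

Under that reading, the name agrees with the description above: a 3.0 Å resolution cutoff and 40583 chains.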

CNN Models

We reimplemented all of the CNN models in the literature, as they were all closed-source. The dataset for the CNN models was created with aposteriori using the following command:

make-frame-dataset /scratch/datasets/biounit/ -d benchmarking_set.csv -e .pdb1.gz --voxels-per-side 21 --frame-edge-length 21 -g True -p 35 -n benchmark_set -v -r -z -cb True -ae CNOCBCA --compression_gzip True -o /scratch/timed_dataset/

For Charge and Polar models we used the codecs (-ae) equivalent to CNOCBCAQ and CNOCBCAP, respectively.
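The three dataset variants differ only in the value passed to -ae. A small sketch that assembles the command for each codec from the flags quoted above is shown here; the dictionary keys and the function name are hypothetical, and the script only prints the commands rather than running them. The release notes do not say whether the dataset name (-n) or output directory (-o) also changed between variants, so they are left as quoted.

```python
import shlex

# Codec strings quoted in the release notes; the keys are illustrative labels.
CODECS = {"default": "CNOCBCA", "charge": "CNOCBCAQ", "polar": "CNOCBCAP"}


def build_command(codec_name):
    """Return the make-frame-dataset argument list for one codec variant."""
    return [
        "make-frame-dataset", "/scratch/datasets/biounit/",
        "-d", "benchmarking_set.csv",
        "-e", ".pdb1.gz",
        "--voxels-per-side", "21",
        "--frame-edge-length", "21",
        "-g", "True",
        "-p", "35",
        "-n", "benchmark_set",
        "-v", "-r", "-z",
        "-cb", "True",
        "-ae", CODECS[codec_name],
        "--compression_gzip", "True",
        "-o", "/scratch/timed_dataset/",
    ]


for name in CODECS:
    # shlex.join quotes arguments safely for copy-pasting into a shell.
    print(f"# {name}")
    print(shlex.join(build_command(name)))
```

Keeping the arguments as a list means the same value can be passed directly to subprocess.run if the commands are to be executed from Python.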

GNN Models

The code for training ProteinMPNN with custom training sets is not available. We recreated the steps given to us by the authors and published them here: https://github.com/wells-wood-research/ProteinMPNN_custom_training/tree/main

What's Changed

Add output_dir as functionality by @universvm in #66
Fix .fasta files output by @LunaPrau in #68
Simplify Install by @universvm in #62
Fix security vulnerabilities by @universvm in #69
Hide streamlit warnings by @universvm in #71
Fix docker by @universvm in #70
Hide charge and polar until #64 is merged by @universvm in #73
Add page title. by @ChrisWellsWood in #75

New Contributors

@LunaPrau made their first contribution in #68
@ChrisWellsWood made their first contribution in #75

Full Changelog: modelspublication...publication_01_2024

Contributors

ChrisWellsWood, universvm, and LunaPrau

14 Dec 13:54

universvm

modelspublication

abc6afa

Models 12-2023

Models for the paper "TIMED-Design: Flexible and Accessible Protein Sequence Design with Convolutional Neural Networks"

Performance Comparison

For detailed performance comparisons, please see the paper.

Macro Recall

Macro-recall is recall computed separately for each residue type and then averaged across types, which makes it resistant to class imbalance.

RMSD_100

We sampled 10% of the PDBench dataset and ran it through AlphaFold2 with Amber relaxation. RMSD_100 is a version of RMSD normalised for protein length, so that structures of different sizes can be compared.

Isoelectric Point Mean Absolute Error (MAE)

Mean absolute difference between the isoelectric point of the original sequence and that of the predicted sequence.

Charge Mean Absolute Error (MAE)

Mean absolute difference between the overall charge of the original sequence and that of the predicted sequence.

Training

CNN Models

We reimplemented all of the CNN models in the literature, as they were all closed-source. The dataset for the CNN models was created with aposteriori using the following command:

poetry run make-frame-dataset /scratch/datasets/biounit/ -d benchmarking_set.csv -e .pdb1.gz --voxels-per-side 21 --frame-edge-length 21 -g True -p 35 -n benchmark_set -v -r -z -cb True -ae CNOCBCA --compression_gzip True -o /scratch/timed_dataset/

For Charge and Polar models we used the codecs (-ae) equivalent to CNOCBCAQ and CNOCBCAP, respectively.

GNN Models

23 Mar 17:24

universvm

model0323

574d2a3

Models 03-2023

All models were trained using the following aposteriori dataset settings:

poetry run make-frame-dataset /scratch/datasets/biounit/ -d benchmarking_set.csv -e .pdb1.gz --voxels-per-side 21 --frame-edge-length 21 -g True -p 35 -n benchmark_set -v -r -z -cb True -ae CNOCBCA --compression_gzip True -o /scratch/timed_dataset/

We retrained all models with the same dataset and tested on the PDBench benchmark.

Sequence Metrics

Accuracy

Macro-Recall

Macro-recall is recall computed separately for each residue type and then averaged across types, which makes it resistant to class imbalance.

Charge Mean Absolute Error (MAE)

Mean absolute difference between the charge of the original sequence and that of the predicted sequence.

Isoelectric Point Mean Absolute Error (MAE)

Mean absolute difference between the isoelectric point of the original sequence and that of the predicted sequence.

3D Structure Metrics

RMSD

We sampled 10% of the dataset and ran it through AlphaFold2 with Amber relaxation.

16 Aug 08:49

universvm

v0.1-alpha

a94468e

TIMED-design pre-release

All models except TIMED use the following settings:

poetry run make-frame-dataset /scratch/datasets/biounit/ -d benchmarking_set.csv -e .pdb1.gz --voxels-per-side 21 --frame-edge-length 21 -g True -p 35 -n benchmark_set -v -r -z -cb True -ae CNOCBCA --compression_gzip True -o /scratch/timed_dataset/

07 Apr 07:38

universvm

model

8727f2a

Models 2022-04

All models except TIMED use the following settings:

poetry run make-frame-dataset /scratch/datasets/biounit/ -d benchmarking_set.csv -e .pdb1.gz --voxels-per-side 21 --frame-edge-length 21 -g True -p 35 -n benchmark_set -v -r -z -cb True -ae CNOCBCA --compression_gzip True -o /scratch/timed_dataset/

TIMED uses the following settings:

poetry run make-frame-dataset ../../shared/datasets/biounit/ -e .pdb1.gz --pieces-filter-file /home/shared/datasets/pisces/cullpdb_pc90_res3.0_R1.0_d200702_chains40583 --voxels-per-side 21 --frame-edge-length 13 -g False -p 35 -n pisces_expanded -v -r -z -cb True -ae CNOCBCA -b blacklist.csv
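One visible difference between the two commands is the frame edge length (21 versus 13) at the same number of voxels per side. A quick sketch of what that implies for voxel size follows, assuming --frame-edge-length is given in Å and the frame is divided evenly into --voxels-per-side voxels along each axis. This reading of the flags is an assumption based on their names, and the helper is hypothetical.

```python
def voxel_edge(frame_edge_length, voxels_per_side):
    """Edge length of one voxel, assuming the frame is split evenly."""
    return frame_edge_length / voxels_per_side


# (frame edge length, voxels per side) as quoted in the two commands above.
settings = {
    "all other models": (21, 21),
    "TIMED": (13, 21),
}

for label, (edge, voxels) in settings.items():
    size = voxel_edge(edge, voxels)
    print(f"{label}: {voxels}^3 voxels, {size:.3f} per voxel edge")
```

Under this assumption both settings produce frames with the same number of voxels, but the TIMED frames cover a smaller region at finer resolution.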

Releases: wells-wood-research/timed-design