Models 03-2023
All models were trained using the following dataset settings from aposteriori:
poetry run make-frame-dataset /scratch/datasets/biounit/ -d benchmarking_set.csv -e .pdb1.gz --voxels-per-side 21 --frame-edge-length 21 -g True -p 35 -n benchmark_set -v -r -z -cb True -ae CNOCBCA --compression_gzip True -o /scratch/timed_dataset/
We retrained all models with the same dataset and tested on the PDBench benchmark.
Sequence Metrics
Accuracy
Macro-Recall
Macro-Recall is recall computed separately for each residue type and then averaged across types, which makes it resistant to class imbalance.
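As a minimal sketch of how the two sequence metrics differ, the snippet below computes accuracy and macro-recall for a pair of equal-length sequences. The function names `accuracy` and `macro_recall` are illustrative and are not taken from the benchmark code; the averaging here runs over the residue types present in the original sequence, which is an assumption.

```python
from collections import defaultdict


def accuracy(true_seq: str, pred_seq: str) -> float:
    """Fraction of positions where the predicted residue matches the original."""
    if len(true_seq) != len(pred_seq):
        raise ValueError("sequences must have the same length")
    matches = sum(t == p for t, p in zip(true_seq, pred_seq))
    return matches / len(true_seq)


def macro_recall(true_seq: str, pred_seq: str) -> float:
    """Recall per residue type, averaged with equal weight per type.

    Assumption: only residue types that occur in the original sequence
    contribute to the average.
    """
    if len(true_seq) != len(pred_seq):
        raise ValueError("sequences must have the same length")
    total = defaultdict(int)    # occurrences of each residue type in the original
    correct = defaultdict(int)  # correctly predicted occurrences per type
    for t, p in zip(true_seq, pred_seq):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[aa] / total[aa] for aa in total]
    return sum(recalls) / len(recalls)


# A model that always predicts alanine looks good on accuracy
# but poor on macro-recall when the sequence is alanine-rich.
original = "AAAAAAAAGW"
predicted = "AAAAAAAAAA"
print(accuracy(original, predicted))      # 0.8
print(macro_recall(original, predicted))  # (1 + 0 + 0) / 3
```

The example shows why macro-recall is reported alongside accuracy: the frequent residue type dominates accuracy, while each type counts equally in macro-recall.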
Charge Mean Absolute Error (MAE)
Mean absolute difference between the net charge of the original sequence and that of the predicted sequence.
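The charge error can be sketched as follows. This is a deliberately crude illustration: it assumes net charge is approximated by counting basic (K, R) and acidic (D, E) residues near neutral pH, which may differ from the charge model the benchmark actually uses. The names `net_charge` and `charge_mae` are hypothetical.

```python
def net_charge(seq: str) -> int:
    """Crude net charge near pH 7: +1 per K/R, -1 per D/E.

    Assumption: histidine and the termini are ignored for simplicity.
    """
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def charge_mae(pairs: list[tuple[str, str]]) -> float:
    """Mean absolute charge difference over (original, predicted) sequence pairs."""
    errors = [abs(net_charge(orig) - net_charge(pred)) for orig, pred in pairs]
    return sum(errors) / len(errors)


pairs = [
    ("KKRDE", "KKRKK"),  # charge +1 vs +5, error 4
    ("AAAA", "AADA"),    # charge 0 vs -1, error 1
]
print(charge_mae(pairs))  # 2.5
```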
Isoelectric Point Mean Absolute Error (MAE)
Mean absolute difference between the isoelectric point of the original sequence and that of the predicted sequence.
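One way to sketch the isoelectric point error is to estimate the pI as the pH at which the Henderson-Hasselbalch net charge is zero, found by bisection. The pKa values below are illustrative textbook-style numbers chosen for this sketch, not values confirmed by the benchmark, and the function names are hypothetical.

```python
# Illustrative pKa values (assumption: the benchmark may use a different set).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def charge_at_ph(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; the net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if charge_at_ph(seq, mid) > 0:
            low = mid   # still positively charged, so the pI is higher
        else:
            high = mid
    return (low + high) / 2.0


def pi_mae(pairs: list[tuple[str, str]]) -> float:
    """Mean absolute pI difference over (original, predicted) sequence pairs."""
    errors = [abs(isoelectric_point(o) - isoelectric_point(p)) for o, p in pairs]
    return sum(errors) / len(errors)


print(isoelectric_point("KKKK") > isoelectric_point("DDDD"))  # True
```

Bisection is used here because the net charge is a monotonic function of pH, so a single zero crossing is guaranteed inside the search interval.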
3D Structure Metrics
RMSD
We sampled 10% of the dataset and ran it through AlphaFold2 + Amber relaxation.
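The sampling and scoring steps could look roughly like the sketch below. It assumes the two coordinate sets have already been superimposed and contain the same atoms in the same order; in practice a superposition step (for example the Kabsch algorithm) would normally come first. The helper names `sample_ids` and `rmsd`, the fixed seed, and the example identifiers are all hypothetical, and the AlphaFold2 and Amber steps themselves are not shown.

```python
import math
import random

Coordinate = tuple[float, float, float]


def sample_ids(ids: list[str], fraction: float = 0.1, seed: int = 0) -> list[str]:
    """Draw a reproducible random subset of structure identifiers."""
    rng = random.Random(seed)  # assumption: a fixed seed for reproducibility
    k = max(1, round(fraction * len(ids)))
    return rng.sample(ids, k)


def rmsd(reference: list[Coordinate], predicted: list[Coordinate]) -> float:
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(reference) != len(predicted):
        raise ValueError("coordinate lists must have the same length")
    squared = [
        (rx - px) ** 2 + (ry - py) ** 2 + (rz - pz) ** 2
        for (rx, ry, rz), (px, py, pz) in zip(reference, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical identifiers standing in for dataset entries.
structure_ids = [f"structure_{i}" for i in range(20)]
subset = sample_ids(structure_ids)
print(len(subset))  # 2

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(reference, predicted))  # sqrt(2)
```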