Hello!
Below you can find an outline of how to reproduce the 3rd place solution for the Stanford Ribonanza RNA Folding competition. If you run into any trouble with the setup/code or have any questions, please contact me at [email protected]
Below are the architectures of the two models used for the final submission.
Data should be downloaded from the Kaggle competition website and placed under /datamount/. train_data.csv is then preprocessed with preprocess_dataset_to_parquet.py to create a folded parquet file (i.e., one with cross-validation fold assignments).
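The preprocessing boils down to reading the CSV and writing it back out as a parquet file with fold assignments. A minimal sketch of that step, where the fold count, splitting strategy, and paths are assumptions rather than the actual contents of preprocess_dataset_to_parquet.py:

```python
# Sketch of the CSV -> folded parquet step; fold count and paths are assumptions.
import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv("datamount/train_data.csv")

# Assign a cross-validation fold to every row.
df["fold"] = -1
for fold, (_, val_idx) in enumerate(
    KFold(n_splits=4, shuffle=True, random_state=42).split(df)
):
    df.loc[val_idx, "fold"] = fold

df.to_parquet("datamount/train_data.parquet", index=False)
```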
Optionally: the preprocessed training_data.parquet and the synthetic data used for the final submission can be downloaded from Google Drive for a quick start in training.
Since the BPPs take up a lot of disk space, they can only be downloaded from the competition website. Place them under datamount/supp_data and preprocess them with the preprocess_bpps.py script to create a bpps_index.csv and the .npz files used during model training.
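The gist of the BPP preprocessing is converting each raw base-pair-probability file into a compressed .npz and recording its location in an index CSV. A rough sketch, where the raw file layout ((i, j, prob) triplets) and the index columns are assumptions:

```python
# Sketch of the BPP preprocessing; raw file layout and index columns
# are assumptions, not the actual preprocess_bpps.py.
from pathlib import Path

import numpy as np
import pandas as pd

raw_dir = Path("datamount/supp_data")
out_dir = raw_dir / "bpps_npz"
out_dir.mkdir(parents=True, exist_ok=True)

records = []
for txt in sorted(raw_dir.glob("*.txt")):
    pairs = np.loadtxt(txt).reshape(-1, 3)  # assumed (i, j, prob) rows
    out_path = out_dir / f"{txt.stem}.npz"
    np.savez_compressed(
        out_path,
        i=pairs[:, 0].astype(np.int32),
        j=pairs[:, 1].astype(np.int32),
        p=pairs[:, 2].astype(np.float32),
    )
    records.append({"sequence_id": txt.stem, "path": str(out_path)})

pd.DataFrame(records).to_csv(raw_dir / "bpps_index.csv", index=False)
```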
Train the Squeezeformer model (model 1) with its default configuration:

python train.py -C cfg_1

Configuration values can also be overridden from the command line, for example:

python train.py -C cfg_1 -G 0 -batch_size 256 -lr 7e-4 -logging False
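The extra flags work because the training script layers command-line overrides on top of the config module selected by -C. A common way to wire this up is sketched below; train.py's actual argument handling may differ, and the configs.cfg_1 module path is an assumption:

```python
# Sketch of config loading with CLI overrides; not train.py's actual code.
import argparse
import importlib

parser = argparse.ArgumentParser()
parser.add_argument("-C", "--config", required=True)
args, overrides = parser.parse_known_args()

# Load e.g. configs/cfg_1.py, assumed to expose a `cfg` object of defaults.
cfg = importlib.import_module(f"configs.{args.config}").cfg

# Apply the remaining "-name value" pairs on top of the defaults.
for name, value in zip(overrides[::2], overrides[1::2]):
    key = name.lstrip("-")
    default = getattr(cfg, key)
    if isinstance(default, bool):
        # bool("False") is True, so booleans need explicit parsing.
        setattr(cfg, key, value.lower() == "true")
    else:
        setattr(cfg, key, type(default)(value))
```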
Train the twin-tower model (model 2) with distributed data parallel training:

python train_ddp.py -C cfg_2

For example, with overridden settings:

python train_ddp.py -C cfg_2 -lr 1.5e-3 -epochs 50 -comment "Neptune comment for experiment"
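train_ddp.py presumably trains across both GPUs with PyTorch DistributedDataParallel. A minimal sketch of that setup, assuming a torchrun-style launch that provides LOCAL_RANK in the environment:

```python
# Minimal DDP setup sketch; train_ddp.py's actual launch logic may differ.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_ddp(model: torch.nn.Module) -> DDP:
    dist.init_process_group(backend="nccl")  # reads env vars set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```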
Run inference with the trained Squeezeformer (model 1):

python inference_mdl_1_squeezeformer.py
First, the twin-tower model is trained with its configuration (cfg_2). Synthetic data is then created with the twin-tower model's weights by running python generate_synthetic.py. Finally, the Squeezeformer is trained on both the clean dataset and the synthetic dataset with its configuration (cfg_1).
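At its core, generate_synthetic.py amounts to pseudo-labeling: the trained twin-tower model predicts reactivities for unlabeled sequences, and those predictions are saved as synthetic training targets. A minimal sketch of the idea, assuming a loader that yields (sequence_id, batch) pairs and a parquet output, neither of which is taken from the actual script:

```python
# Sketch of pseudo-label (synthetic data) generation; the loader contract,
# column names, and output format are assumptions, not the real script.
import pandas as pd
import torch


@torch.no_grad()
def pseudo_label(model: torch.nn.Module, loader, out_path: str) -> None:
    model.eval()
    rows = []
    for seq_ids, batch in loader:  # assumed to yield (ids, input tensor)
        preds = model(batch.cuda()).cpu().numpy()
        for sid, p in zip(seq_ids, preds):
            rows.append({"sequence_id": sid, "reactivity": p.tolist()})
    pd.DataFrame(rows).to_parquet(out_path, index=False)
```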
A simple blend of the two models' predictions (0.5 weight each) is used for the final submission. Pretrained weights of the final submission models are inside datamount/weights.
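Since the blend is a plain 0.5/0.5 average, it can be expressed in a few lines; the submission file names below are placeholders, and the reactivity column layout is assumed to follow the competition's submission format:

```python
# Equal-weight blend of the two models' submissions (0.5 each).
import pandas as pd

sub1 = pd.read_csv("submission_squeezeformer.csv")  # placeholder name
sub2 = pd.read_csv("submission_twin_tower.csv")     # placeholder name

blend = sub1.copy()
pred_cols = [c for c in sub1.columns if c.startswith("reactivity")]
blend[pred_cols] = 0.5 * sub1[pred_cols] + 0.5 * sub2[pred_cols]
blend.to_csv("submission.csv", index=False)
```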
- Ubuntu 22.04.3 LTS
- CPU: i7-13700K (24 vCPUs)
- 2 x NVIDIA RTX 4090 (24GB each)
- 96GB RAM (2x32 GB + 2x64 GB)
- 1TB SSD
- python 3.11.5
- CUDA 12.1
- PyTorch 2.1.0