This repository contains a cleaned version of our team's code for the Python entry to the George B. Moody PhysioNet Challenge 2024. You can use it to digitize ECG images. It contains two main modules:
- Training the segmentation model.
- Running the digitization pipeline.
Repository structure:
- `ecg-image-generator`: Code to generate synthetic ECG images from the PTB-XL dataset.
- `nnUNet`: Code of the segmentation model nnU-Net.
- `src/run/run.py`: Code to run the digitization pipeline.
- `models`: Trained models.
- `config.py`: Configuration file.
- `src/ptb_xl`: Code to prepare the PTB-XL dataset.
- `src/utils`: Utility functions.
- Clone this repository:
  ```
  git clone https://github.com/felixkrones/physionet24.git
  ```
  and use the branch `cleanup`:
  ```
  git checkout cleanup
  ```
  If you want to get the pre-trained weights, you need to activate Git LFS before cloning:
  ```
  git lfs install
  ```
  If the weights are not downloaded automatically, you can then use
  ```
  git lfs pull
  ```
- Before you stage files via git, make sure that you define which files should be handled via LFS:
  ```
  git lfs track "*.pth"
  ```
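  For reference, `git lfs track` records this pattern in the repository's `.gitattributes` file; after the command above, `.gitattributes` should contain a line like:
  ```
  *.pth filter=lfs diff=lfs merge=lfs -text
  ```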
- Move into the repo:
  ```
  cd physionet24
  ```
- Create a new environment:

  Using pip:
  ```
  python3.11 -m venv .venv
  source .venv/bin/activate
  pip install --upgrade pip
  pip install -r requirements.txt
  ```
  Using conda:
  ```
  conda create --name env-name python=3.11
  conda activate env-name
  pip install --upgrade pip
  pip install -r requirements.txt
  ```
  At the moment, the official nnU-Net repository contains a bug and does not work with RGB png images. Please install the nnU-Net copy included in this repository for now:
  ```
  cd nnUNet
  pip install .
  cd ..
  ```
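  As a quick sanity check that the nnU-Net installation is picked up (a minimal sketch; `nnunetv2` is the import name of nnU-Net v2), you can run:
  ```
  python -c "import nnunetv2; print(nnunetv2.__file__)"
  ```
  The printed path shows which nnU-Net installation Python is using.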
- Prepare the data as described below under Data.
- Run the code as described below under Run.
To run the code, you need different kinds of data. If you are using the PTB-XL dataset, see below under Using the PTB-XL dataset on how to prepare the data.
- Training data for the segmentation model: In order to train the segmentation model, you need the data in the format described in `nnUNet/documentation/dataset_format.md` (see the sketch after this list). For the following description, we assume the folder containing the data is called `Dataset500_Signals`.
- Data to run the digitization: A folder containing the ECG images to be digitized. These images should match the ones used for training the segmentation model. If you are using our pre-trained weights, they should match the images from the 2024 Challenge.
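As a rough orientation (the authoritative description is `nnUNet/documentation/dataset_format.md`, and the exact file names depend on your conversion), an nnU-Net v2 dataset folder typically looks like this:
```
Dataset500_Signals/
├── dataset.json    # channel names, labels, number of training cases
├── imagesTr/       # training images, e.g. case_0001_0000.png
├── labelsTr/       # training segmentation masks, e.g. case_0001.png
└── imagesTs/       # optional test images
```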
- Download (and unzip) the PTB-XL dataset and the PTB-XL+ dataset. Rename the folder (probably `ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3`) that contains a file named `ptbxl_database.csv` to `ptb-xl/`. Rename the folder (probably `ptb-xl-a-comprehensive-electrocardiographic-feature-dataset-1.0.1`) that contains a folder called `labels` to `ptb-xl-p/`. Replace the paths below with the actual paths to the folders; example rename commands are shown below.
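  For example, assuming the default download folder names above (adjust the paths to wherever you unzipped the archives):
  ```
  mv ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.3 ptb-xl
  mv ptb-xl-a-comprehensive-electrocardiographic-feature-dataset-1.0.1 ptb-xl-p
  ```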
- Add information from various spreadsheets from the PTB-XL dataset to the WFDB header files:
  ```
  python -m src.ptb_xl.prepare_ptbxl_data \
      -i ptb-xl/records500 \
      -pd ptb-xl/ptbxl_database.csv \
      -pm ptb-xl/scp_statements.csv \
      -sd ptb-xl-p/labels/12sl_statements.csv \
      -sm ptb-xl-p/labels/mapping/12slv23ToSNOMED.csv \
      -o ptb-xl/records500_prepared
  ```
- Generate synthetic ECG images on the dataset:
  - Move into `ecg-image-generator`:
    ```
    cd ecg-image-generator
    ```
  - Deactivate the current env (`conda deactivate`) and create a new one by following the instructions in the README of the generator repo (`ecg-image-generator/README.md`); this means running `conda env create -f environment_droplet.yml` and then `conda activate ecg_gen`.
  - Now you can run the following code. Careful though, this can take very long, around 10 min per subfolder (approx. 1000 files) or 4 h in total, and will increase the necessary disk space by approx. 15x, adding another 8 GB for the 500 Hz data. To test, it is better to run it on one single subfolder (e.g., add `/00000`):
    ```
    python gen_ecg_images_from_data_batch.py \
        -i ptb-xl/records500_prepared \
        -o ptb-xl/records500_prepared_w_images \
        --print_header \
        --store_config 2 \
        --mask_unplotted_samples
    ```
    For example:
    ```
    python gen_ecg_images_from_data_batch.py \
        -i ptb-xl/records500_prepared/00000 \
        -o ptb-xl/records500_prepared_w_images/00000 \
        -se 10 \
        --mask_unplotted_samples \
        --print_header \
        --store_config 2 \
        --lead_name_bbox \
        --lead_bbox \
        --random_print_header 0.7 \
        --calibration_pulse 0.6 \
        --fully_random \
        -rot 5 \
        --num_images_per_ecg 4 \
        --random_bw 0.1 \
        --run_in_parallel \
        --num_workers 8
    ```
  - Deactivate the environment again (`conda deactivate`), move back to the original repo (`cd ..`), and activate the environment from above again (`conda activate env-name`).
- Add the file locations and other information for the synthetic ECG images to the WFDB header files:
  ```
  python -m src.ptb_xl.prepare_image_data \
      -i ptb-xl/records500_prepared_w_images \
      -o ptb-xl/records500_prepared_w_images
  ```
- Create denser pixels for the masks:
  ```
  python -m src.ptb_xl.replot_pixels \
      --resample_factor 3 \
      --dir ptb-xl/records500_prepared_w_images \
      --run_on_subdirs \
      --num_workers 12
  ```
- Convert it into the nnU-Net format. We use the suggested splits from `ptbxl_database.csv`:
  ```
  python -m src.ptb_xl.create_train_test \
      -i ptb-xl/records500_prepared_w_images \
      -d ptb-xl/ptbxl_database.csv \
      -o ptb-xl/Dataset500_Signals
  ```
  For example:
  ```
  python -m src.ptb_xl.create_train_test \
      -i ptb-xl/records500_prepared_w_images \
      -d ptb-xl/ptbxl_database.csv \
      -o ptb-xl/Dataset500_Signals \
      --rgba_to_rgb \
      --gray_to_rgb \
      --mask \
      --mask_multilabel \
      --rotate_image \
      --plotted_pixels_key dense_plotted_pixels \
      --num_workers 8
  ```
You can either use our model weights for the segmentation model or train your own model following the steps under 1. below.
First, one needs to train the segmentation model. We use nnU-Net; this involves multiple steps:
- Set the environment variables for nnU-Net (just paste the following into the terminal from which you want to run the code):
  ```
  # Set environment variables
  export nnUNet_raw='path to the folder that contains the Dataset500_Signals folder'
  export nnUNet_preprocessed='any folder path to save the preprocessed data'
  export nnUNet_results='any folder path to save the results'
  # Check
  echo ${nnUNet_raw}
  ```
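  Following the folder layout from the data preparation above, a concrete example could look like this (the preprocessed and results folder names are placeholders; choose any paths you like):
  ```
  export nnUNet_raw="$(pwd)/ptb-xl"
  export nnUNet_preprocessed="$(pwd)/nnunet_preprocessed"
  export nnUNet_results="$(pwd)/nnunet_results"
  ```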
- Experiment planning and preprocessing:
  ```
  nnUNetv2_plan_and_preprocess -d DATASET_ID --verify_dataset_integrity
  ```
  For example:
  ```
  nnUNetv2_plan_and_preprocess -d 500 --clean -c 2d --verify_dataset_integrity
  ```
- Model training:
  ```
  nnUNetv2_train DATASET_NAME_OR_ID UNET_CONFIGURATION FOLD
  ```
  For example:
  ```
  nnUNetv2_train 500 2d 0 -device cuda --c
  ```
  Or select the device (one per fold):
  ```
  CUDA_VISIBLE_DEVICES=1 nnUNetv2_train 500 2d 1 --c
  ```
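  nnU-Net uses 5-fold cross-validation by default, so each fold is trained separately. A sketch for running all folds one after another on a single GPU:
  ```
  for FOLD in 0 1 2 3 4; do
      nnUNetv2_train 500 2d ${FOLD} -device cuda
  done
  ```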
- Determine the best configuration:
  ```
  nnUNetv2_find_best_configuration DATASET_NAME_OR_ID -c CONFIGURATIONS
  ```
  For example:
  ```
  nnUNetv2_find_best_configuration 500 -c 2d --disable_ensembling
  ```
- Set the parameters in `config.py`.
- Run the digitization pipeline by running:
  ```
  python -m src.run.digitize -d data_path -m model_path -o output_path -v
  ```
  where `data_path` (input; required) is the folder containing the images to be digitized; `model_path` (input; required) is the folder containing the segmentation model folder `nnUNet_results`, e.g., `models/M3`; and `output_path` is the folder where the outputs will be saved. A concrete invocation is sketched below.
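  For example, using the pre-trained weights shipped in this repository (the `data/images` and `output` folder names are placeholders; use your own paths):
  ```
  python -m src.run.digitize -d data/images -m models/M3 -o output -v
  ```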
Check out the evaluation-2024 repository for code on how to evaluate your model.
- Please see the Challenge website for more details.
- This code builds on the PhysioNet 2024 repo and the ecg-image-kit. Please cite both if using this code.
- We used nnU-Net; please also cite it:
  Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211.
- Last but not least, please cite us as well:
  ```
  @article{krones2024combining,
    title={Combining Hough Transform and Deep Learning Approaches to Reconstruct ECG Signals From Printouts},
    author={Krones, Felix and Walker, Ben and Lyons, Terry and Mahdi, Adam},
    journal={arXiv:2410.14185},
    year={2024}
  }
  ```
  Krones F, Walker B, Lyons T, Mahdi A. Combining Hough Transform and Deep Learning Approaches to Reconstruct ECG Signals From Printouts. arXiv:2410.14185. 2024 Oct 18.