HCSBC: Hierarchical Classification System for Breast Cancer Specimen Report

thiagosantos1/HCSBC

Publication

This is the code repository for the paper: In Progress

Web System Application

We developed a web application that lets users test our proposed pipeline for predicting labels from histopathology reports. Users can interact with the platform in two ways: 1) upload an Excel/CSV spreadsheet with a column containing the biopsy diagnosis (Part A, B, or C), or 2) enter a single biopsy diagnosis. An example of the web system is illustrated below:

Model and Project Design

Installation

We recommend using a virtual environment.

If you do not already have conda installed, you can install Miniconda from this link (~450 MB). Then, check that conda is up to date:

conda update -n base -c defaults conda

And create a conda environment from the yml file:

conda env create -f environment.yml

If not already activated, activate the new conda environment using:

conda activate pathology

Mac Users - Install libomp 11.0

wget https://raw.githubusercontent.com/chenrui333/homebrew-core/0094d1513ce9e2e85e07443b8b5930ad298aad91/Formula/libomp.rb
brew unlink libomp
brew install --build-from-source ./libomp.rb

Check that version 11.1 is installed

brew list --versions libomp

Download Models

Script Example to Download Models

python3 app/src/download_models.py --all_labels "single_tfidf" --higher_order "PathologyEmoryPubMedBERT"

There are several models available for download:

Higher Order Options        All Labels Options
PathologyEmoryPubMedBERT    single_tfidf
PathologyEmoryBERT          branch_tfidf
ClinicalBERT
BlueBERT
BioBERT
BERT
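
The options listed above can be combined freely, one higher order model with one all labels model. As a minimal sketch, assuming the download script accepts exactly the two flags shown in the example command, the following Python helper builds the command line for any valid pair. The helper name `download_command` and the two constant lists are illustrative and not part of the repository.

```python
import shlex
from itertools import product

# Option names copied from the list of available models.
HIGHER_ORDER_MODELS = [
    "PathologyEmoryPubMedBERT",
    "PathologyEmoryBERT",
    "ClinicalBERT",
    "BlueBERT",
    "BioBERT",
    "BERT",
]
ALL_LABELS_MODELS = ["single_tfidf", "branch_tfidf"]


def download_command(higher_order, all_labels):
    """Return the argument list for one download (hypothetical helper)."""
    if higher_order not in HIGHER_ORDER_MODELS:
        raise ValueError(f"unknown higher order model: {higher_order}")
    if all_labels not in ALL_LABELS_MODELS:
        raise ValueError(f"unknown all labels model: {all_labels}")
    return [
        "python3",
        "app/src/download_models.py",
        "--all_labels",
        all_labels,
        "--higher_order",
        higher_order,
    ]


if __name__ == "__main__":
    # Print every valid combination as a shell-ready command; nothing is run.
    for higher_order, all_labels in product(HIGHER_ORDER_MODELS, ALL_LABELS_MODELS):
        print(shlex.join(download_command(higher_order, all_labels)))
```

The sketch only prints commands. Pass one of the returned lists to `subprocess.run` from the repository root if you want to execute a download.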

Demo app

A minimal demo app is provided for you to play with the classification model!

You can easily run your app in your default browser by running:

python3.8 -m streamlit.cli run app/src/app.py

OR

streamlit run app/src/app.py

Extract Using Terminal

You can also run our API from the terminal.

The program takes an Excel/CSV sheet and extracts the higher order label and cancer characteristics from pathology reports.

  • Input Options:
    • path_to_file - Path to an Excel/CSV file with pathology diagnoses: String (Required).
    • column_name - Name of the column that holds the pathology diagnosis: String (Required).
    • higher_model - Which version of the higher order model to use: String (Required).
    • all_label_model - Which version of the all labels model to use: String (Required).
    • save_predictions - Path to save the output: String (Optional).
    • output_model_data - Whether to output model data to CSV: True/False (Optional).
    • save_input - Whether to output the input fields: True/False (Optional).
    • save_json - Path to save the JSON analysis: String (Optional).
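
To illustrate the expected input, here is a minimal sketch that writes a one-column CSV file suitable for `path_to_file`. It assumes the column is called `report`, matching the `column_name` used in the example command, and the report strings are placeholders rather than real diagnoses. The function name `write_input_csv` is hypothetical.

```python
import csv
import os
import tempfile


def write_input_csv(path, reports, column_name="report"):
    """Write one diagnosis per row under a single header column."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([column_name])  # header row read via column_name
        for text in reports:
            writer.writerow([text])
    return path


if __name__ == "__main__":
    # Placeholder strings only; replace with your own diagnosis text.
    placeholder_reports = [
        "PLACEHOLDER DIAGNOSIS TEXT FOR PART A",
        "PLACEHOLDER DIAGNOSIS TEXT FOR PART B",
    ]
    with tempfile.TemporaryDirectory() as folder:
        csv_path = write_input_csv(
            os.path.join(folder, "data.csv"), placeholder_reports
        )
        with open(csv_path, newline="", encoding="utf-8") as handle:
            print(handle.read())
```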

Example run:

python3 app/src/label_extraction.py --path_to_file data.xlsx --column_name report --higher_model "PathologyEmoryPubMedBERT" --all_label_model "single_tfidf" --save_predictions predictions.xlsx --save_json output.json
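
If you prefer to drive the script from Python, the same call can be assembled programmatically. This is a minimal sketch under stated assumptions: it only uses the flags that appear in the example command above, it leaves out `output_model_data` and `save_input` because their command-line syntax is not shown here, and the helper names `build_extraction_command` and `run_extraction` are hypothetical.

```python
import shlex
import subprocess


def build_extraction_command(
    path_to_file,
    column_name,
    higher_model,
    all_label_model,
    save_predictions=None,
    save_json=None,
):
    """Assemble the label_extraction.py argument list (hypothetical helper)."""
    cmd = [
        "python3",
        "app/src/label_extraction.py",
        "--path_to_file", path_to_file,
        "--column_name", column_name,
        "--higher_model", higher_model,
        "--all_label_model", all_label_model,
    ]
    # Optional outputs are appended only when a path is given.
    if save_predictions:
        cmd += ["--save_predictions", save_predictions]
    if save_json:
        cmd += ["--save_json", save_json]
    return cmd


def run_extraction(**options):
    """Run the script from the repository root; raises if it exits non-zero."""
    return subprocess.run(build_extraction_command(**options), check=True)


if __name__ == "__main__":
    command = build_extraction_command(
        path_to_file="data.xlsx",
        column_name="report",
        higher_model="PathologyEmoryPubMedBERT",
        all_label_model="single_tfidf",
        save_predictions="predictions.xlsx",
        save_json="output.json",
    )
    # Only print the command here; call run_extraction(...) to execute it.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.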

Using Docker

Coming soon

Annotation Tool

A minimal annotation tool is provided

Annotation Tool Repository: GitHub Link

Contributors

Ms. Thiago Santos

Dr. Imon Banerjee

Dr. Hari Trivedi

Dr. Judy Wawira
