This is the code repository for the paper: In Progress
We developed a web application that lets users test our proposed pipeline for predicting histopathology reports. Users can interact with the platform in two ways: 1) Upload an Excel/CSV spreadsheet with a column containing the biopsy diagnosis (Part A, B, or C). 2) Enter a single biopsy diagnosis. An example of our web system is illustrated below:
We recommend using a virtual environment.
If you do not already have conda installed, you can install Miniconda from this link (~450 MB). Then, check that conda is up to date:
conda update -n base -c defaults conda
Then create a conda environment from the yml file:
conda env create -f environment.yml
If not already activated, activate the new conda environment using:
conda activate pathology
The following Homebrew commands build and install a pinned version of libomp from source:
wget https://raw.githubusercontent.com/chenrui333/homebrew-core/0094d1513ce9e2e85e07443b8b5930ad298aad91/Formula/libomp.rb
brew unlink libomp
brew install --build-from-source ./libomp.rb
brew list --version libomp
Script Example to Download Models
python3 app/src/download_models.py --all_labels "single_tfidf" --higher_order "PathologyEmoryPubMedBERT"
There are several models available for download:
Higher Order Options | All Labels Options |
---|---|
PathologyEmoryPubMedBERT | single_tfidf |
PathologyEmoryBERT | branch_tfidf |
ClinicalBERT | |
BlueBERT | |
BioBERT | |
BERT | |
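As a rough sketch of how the table above maps onto the download script, the snippet below checks a pair of model names against the table and assembles the matching command line. The helper `build_download_command` is hypothetical and not part of this repository; only the script path and the two flags are taken from the example command.

```python
import shlex

# Model names copied from the table above.
HIGHER_ORDER_MODELS = [
    "PathologyEmoryPubMedBERT",
    "PathologyEmoryBERT",
    "ClinicalBERT",
    "BlueBERT",
    "BioBERT",
    "BERT",
]
ALL_LABELS_MODELS = ["single_tfidf", "branch_tfidf"]


def build_download_command(higher_order, all_labels):
    """Return the argument list for download_models.py (hypothetical helper)."""
    if higher_order not in HIGHER_ORDER_MODELS:
        raise ValueError(f"Unknown higher order model: {higher_order!r}")
    if all_labels not in ALL_LABELS_MODELS:
        raise ValueError(f"Unknown all labels model: {all_labels!r}")
    return [
        "python3", "app/src/download_models.py",
        "--all_labels", all_labels,
        "--higher_order", higher_order,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for one valid combination.
    print(shlex.join(build_download_command("BioBERT", "branch_tfidf")))
```

Rejecting unknown names before calling the script is a design choice of this sketch; the script itself may handle invalid names differently.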
A minimal demo app is provided for you to play with the classification model!
You can easily run the app in your default browser by running:
python3.8 -m streamlit.cli run app/src/app.py
OR
streamlit run app/src/app.py
You can also use our API from the terminal.
The program takes an Excel/CSV sheet and extracts the higher order and cancer characteristics from pathology reports.
- Input Options:
  - path_to_file - Path to an Excel/CSV file with pathology diagnoses: String (Required).
  - column_name - Which column has the pathology diagnosis: String (Required).
  - higher_model - Which version of the higher order model to use: String (Required).
  - all_label_model - Which version of the all labels model to use: String (Required).
  - save_predictions - Path to save the output: String (Optional).
  - output_model_data - Whether to output model data to CSV: True/False (Optional).
  - save_input - Whether to output the input fields: True/False (Optional).
  - save_json - Path to save the JSON analysis: String (Optional).
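To make the required and optional options above concrete, here is a minimal `argparse` sketch that mirrors the list. It is an illustration only and not the parser used by `label_extraction.py`: the default values, the `str_to_bool` helper, and the accepted spellings of True/False are all assumptions.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False' style strings (assumed format for the boolean options)."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"Expected True or False, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Extract higher order and cancer characteristics from pathology reports."
    )
    # Required options.
    parser.add_argument("--path_to_file", required=True, help="Excel/CSV file with diagnoses")
    parser.add_argument("--column_name", required=True, help="Column holding the diagnosis text")
    parser.add_argument("--higher_model", required=True, help="Higher order model version")
    parser.add_argument("--all_label_model", required=True, help="All labels model version")
    # Optional options; the defaults below are assumptions, not documented values.
    parser.add_argument("--save_predictions", default=None, help="Path to save output")
    parser.add_argument("--output_model_data", type=str_to_bool, default=False)
    parser.add_argument("--save_input", type=str_to_bool, default=False)
    parser.add_argument("--save_json", default=None, help="Path to save JSON analysis")
    return parser


if __name__ == "__main__":
    # Parse a demo argument list instead of sys.argv so the sketch runs as-is.
    demo_args = [
        "--path_to_file", "data.xlsx",
        "--column_name", "report",
        "--higher_model", "PathologyEmoryPubMedBERT",
        "--all_label_model", "single_tfidf",
        "--save_input", "True",
    ]
    print(vars(build_parser().parse_args(demo_args)))
```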
Example of Running:
python3 app/src/label_extraction.py --path_to_file data.xlsx --column_name report --higher_model "PathologyEmoryPubMedBERT" --all_label_model "single_tfidf" --save_predictions predictions.xlsx --save_json output.json
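The command above expects a spreadsheet with one diagnosis column. The sketch below, using only the Python standard library, writes such a CSV file and assembles the corresponding command. The helpers `write_input_csv` and `build_extraction_command` are hypothetical, the diagnosis strings are placeholders rather than real reports, and the column name `report` is taken from the example command.

```python
import csv
import os
import shlex
import tempfile


def write_input_csv(path, diagnoses, column_name="report"):
    """Write one diagnosis per row under a single header column."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([column_name])
        for text in diagnoses:
            writer.writerow([text])


def build_extraction_command(path, column_name, higher_model, all_label_model,
                             save_predictions=None, save_json=None):
    """Assemble the label_extraction.py argument list (hypothetical helper)."""
    cmd = [
        "python3", "app/src/label_extraction.py",
        "--path_to_file", path,
        "--column_name", column_name,
        "--higher_model", higher_model,
        "--all_label_model", all_label_model,
    ]
    # Optional outputs are appended only when a path is given.
    if save_predictions:
        cmd += ["--save_predictions", save_predictions]
    if save_json:
        cmd += ["--save_json", save_json]
    return cmd


if __name__ == "__main__":
    csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
    # Placeholder text only; replace with real biopsy diagnoses.
    write_input_csv(csv_path, ["placeholder diagnosis 1", "placeholder diagnosis 2"])
    cmd = build_extraction_command(
        csv_path, "report", "PathologyEmoryPubMedBERT", "single_tfidf",
        save_predictions="predictions.xlsx", save_json="output.json",
    )
    print(shlex.join(cmd))
```

The printed command can be pasted into a terminal, or the list can be passed to `subprocess.run` from the repository root.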
Coming soon
A minimal annotation tool is provided.
Annotation Tool Repository: GitHub Link
Ms. Thiago Santos
Dr. Imon Banerjee
Dr. Hari Trivedi
Dr. Judy Wawira