Preprint | Download Model | Blog | Cite
Abstract: The field of computational pathology has been transformed by recent advances in foundation models that encode histopathology regions of interest (ROIs) into versatile and transferable feature representations via self-supervised learning (SSL). However, translating these advances to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data in disease-specific cohorts, especially for rare clinical conditions. We propose TITAN, a multimodal whole-slide foundation model pretrained on 335,645 WSIs via visual self-supervised learning and vision-language alignment with corresponding pathology reports and 423,122 synthetic captions generated by a multimodal generative AI copilot for pathology. Without any finetuning or clinical labels, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis. We evaluate TITAN on diverse clinical tasks and find that it outperforms both ROI and slide foundation models across machine learning settings such as linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation.
TITAN (Transformer-based pathology Image and Text Alignment Network) is a multimodal whole-slide foundation model pre-trained using visual self-supervised learning and vision-language alignment. It leverages 335,645 whole-slide images (WSIs) from a diverse set of internally collected neoplastic, infectious, and inflammatory cases at Mass General Brigham. Additionally, TITAN utilizes over 182,000 pathology reports and more than 423,000 synthetic captions generated by PathChat, our pathology co-pilot. TITAN's slide embeddings achieve state-of-the-art performance on diverse downstream tasks, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation.
- Why use TITAN?: Compared to other slide foundation models that rely on either vision-only pretraining or vision-language alignment, TITAN combines both strategies to ensure that its slide representations contain rich and comprehensive morphological semantics. TITAN also did not use large public histology slide collections such as TCGA, PAIP, CPTAC, and PANDA for pretraining, which are routinely used for benchmark development in computational pathology. We therefore make TITAN available to the research community for building and evaluating pathology AI models with minimal risk of data contamination on public benchmarks or private histopathology slide collections.
- 12/04/2024: CONCHv1.5 feature extraction is integrated into CLAM.
- 12/02/2024: TITAN preprint and model weights (TITAN-preview and CONCHv1.5) are now live. TCGA-OT splits are available in ./datasets.
First clone the repo and cd into the directory:
git clone https://github.com/mahmoodlab/TITAN.git
cd TITAN
Then create a conda env and install the dependencies:
conda create -n titan python=3.9 -y
conda activate titan
pip install --upgrade pip
pip install -e .
Request access to the model weights (CONCHv1.5 and TITAN-preview for patch and slide feature extraction, respectively) from the Huggingface model page here.
Following authentication (using huggingface_hub), both TITAN-preview (slide and language encoders) and CONCH v1.5 (patch encoder) can be automatically downloaded from huggingface model hub as follows. It includes the functionalities to extract slide embeddings from patch embeddings and to perform zero-shot classification. More details can be found in our demo notebooks.
from huggingface_hub import login
from transformers import AutoModel
login() # login with your User Access Token, found at https://huggingface.co/settings/tokens
titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)
conch, eval_transform = titan.return_conch()
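If you also want patch embeddings directly in Python (outside of CLAM), a minimal sketch is shown below. It assumes the returned CONCH v1.5 encoder is a standard PyTorch module that can be called on a batch of transformed image tensors, and the patch path is a placeholder; see the demo notebooks for the exact usage.
import torch
from PIL import Image

conch = conch.eval()
# placeholder path to a 512 x 512 patch extracted at 20x magnification
patch = Image.open("path/to/patch.png").convert("RGB")
patch_tensor = eval_transform(patch).unsqueeze(0)  # (1, 3, H, W)
with torch.inference_mode():
    patch_feature = conch(patch_tensor)  # assumed: encoder is directly callable on image tensors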
You can directly use TITAN-preview for slide-level feature extraction. TITAN builds a feature grid from CONCH v1.5 patch features using the patch coordinates and the distance between adjacent patches. As patch coordinates are always saved at the slide's level 0 magnification, TITAN takes patch_size_lv0, the distance in pixels between two adjacent patches at level 0: 1024 if the slide was scanned at 40x, or 512 if it was scanned at 20x. This information is saved in our demo TCGA features.
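If patch_size_lv0 is not already stored with your features, a minimal sketch of deriving it with OpenSlide is shown below. It assumes patches were extracted as 512 x 512 px tiles at 20x magnification and uses a placeholder slide path.
import openslide

slide = openslide.OpenSlide("path/to/slide.svs")  # placeholder path
objective_power = float(slide.properties[openslide.PROPERTY_NAME_OBJECTIVE_POWER])
# 512 px at 20x spans 1024 px at level 0 on a 40x slide, and 512 px on a 20x slide
patch_size_lv0 = int(512 * objective_power / 20)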
Patch feature extraction: CLAM can also be used for patch feature extraction with CONCHv1.5. When using extract_features_fp.py, set --model_name to 'conch_v1_5'.
Slide feature extraction: Slide-level feature extraction can be done as follows:
import h5py
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = titan.to(device)  # TITAN model loaded above with AutoModel.from_pretrained
# load TCGA sample data
from huggingface_hub import hf_hub_download
demo_h5_path = hf_hub_download(
    "MahmoodLab/TITAN",
    filename="TCGA_demo_features/TCGA-PC-A5DK-01Z-00-DX1.C2D3BC09-411F-46CF-811B-FDBA7C2A295B.h5",
)
file = h5py.File(demo_h5_path, 'r')
features = torch.from_numpy(file['features'][:])
coords = torch.from_numpy(file['coords'][:])
patch_size_lv0 = file['coords'].attrs['patch_size_level0']
# extract slide embedding
with torch.autocast('cuda', torch.float16), torch.inference_mode():
    features = features.to(device)
    coords = coords.to(device)
    slide_embedding = model.encode_slide_from_patch_features(features, coords, patch_size_lv0)
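To embed a whole cohort, the same call can be looped over a directory of CLAM-style .h5 files. The sketch below assumes each file stores 'features', 'coords', and the 'patch_size_level0' attribute as in the demo file above; the directory path and output filename are placeholders.
import glob
import os

slide_embeddings = {}
for h5_path in glob.glob("path/to/h5_files/*.h5"):  # placeholder directory
    with h5py.File(h5_path, 'r') as f:
        features = torch.from_numpy(f['features'][:]).to(device)
        coords = torch.from_numpy(f['coords'][:]).to(device)
        patch_size_lv0 = f['coords'].attrs['patch_size_level0']
    with torch.autocast('cuda', torch.float16), torch.inference_mode():
        embedding = model.encode_slide_from_patch_features(features, coords, patch_size_lv0)
    slide_embeddings[os.path.splitext(os.path.basename(h5_path))[0]] = embedding.float().cpu()

torch.save(slide_embeddings, 'titan_slide_embeddings.pt')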
Note: while the original TITAN model architecture also includes a multimodal decoder trained with the captioning loss of CoCa, as an additional precaution to ensure that no proprietary data or Protected Health Information (PHI) is leaked unintentionally, we have removed the weights of the decoder from the publicly released TITAN weights. The weights of the text encoder and the vision encoder are intact, so the results on all key tasks presented in the paper are not affected. TITAN's ability to serve as a general-purpose encoder for both histopathology images and pathology-related text also remains unaffected.
We provide a set of demo notebooks to showcase the capabilities of TITAN. The notebooks include:
- Slide embedding extraction from patch embeddings in notebooks/inference_demo.ipynb.
- Zero-shot classification on a single slide and on the TCGA-OT dataset in notebooks/zeroshot_demo.ipynb.
- Linear probing evaluation of the slide embeddings on the TCGA-OT dataset in notebooks/linear_probe_demo.ipynb.
We provide benchmark numbers on a set of representative tasks; a comprehensive set of benchmarks is in the paper. The results are obtained with the TITAN-preview model and will be updated with newer iterations of TITAN. For morphological classification, results are reported using linear probing. For slide retrieval, results are reported using Accuracy @K (at least one of the top-K retrieved slides shares the same diagnostic label as the query) and MVAccuracy @K (the majority vote of the top-K retrieved slides matches the diagnostic label of the query).
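For reference, a minimal, self-contained sketch of these two retrieval metrics is shown below. It is illustrative only: it uses cosine similarity over L2-normalized slide embeddings and may not match the exact evaluation protocol in the paper.
import torch
from collections import Counter

def retrieval_metrics(embeddings, labels, k=3):
    # embeddings: (N, D) slide embeddings; labels: (N,) integer diagnostic labels
    x = torch.nn.functional.normalize(embeddings, dim=-1)
    sim = x @ x.T
    sim.fill_diagonal_(float('-inf'))  # never retrieve the query itself
    topk_labels = labels[sim.topk(k, dim=-1).indices]  # (N, k)
    # Accuracy @K: at least one of the top-K labels matches the query label
    acc_at_k = (topk_labels == labels[:, None]).any(dim=-1).float().mean().item()
    # MVAccuracy @K: the majority label among the top-K matches the query label (ties broken arbitrarily)
    majority = torch.tensor([Counter(row.tolist()).most_common(1)[0][0] for row in topk_labels])
    mv_acc_at_k = (majority == labels).float().mean().item()
    return acc_at_k, mv_acc_at_k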
We have released all TCGA TITAN-preview features, which can be loaded as follows:
import pickle
from huggingface_hub import hf_hub_download
slide_feature_path = hf_hub_download(
    "MahmoodLab/TITAN",
    filename="TCGA_TITAN_features.pkl",
)
with open(slide_feature_path, 'rb') as file:
    data = pickle.load(file)
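As a hedged example of how these features might be used, below is a minimal linear-probing sketch. The structure of the loaded object (a slide-id-to-embedding mapping), the split CSV filenames, and the column names ('slide_id', 'label') are assumptions; see notebooks/linear_probe_demo.ipynb and ./datasets for the exact protocol.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

def build_xy(split_csv):
    # assumed CSV schema: one row per slide with 'slide_id' and 'label' columns
    df = pd.read_csv(split_csv)
    X = np.stack([np.asarray(data[sid]).squeeze() for sid in df['slide_id']])
    return X, df['label'].values

X_train, y_train = build_xy('datasets/tcga-ot_train.csv')  # placeholder filenames
X_test, y_test = build_xy('datasets/tcga-ot_test.csv')
clf = LogisticRegression(max_iter=10000).fit(X_train, y_train)
print('balanced accuracy:', balanced_accuracy_score(y_test, clf.predict(X_test)))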
- TCGA-UT-8K is an ROI dataset (8,192 x 8,192 pixels) curated in consultation with the original TCGA-UT authors; it will be released in the coming weeks.
- TCGA-OT is a slide-level 46-class classification task based on the OncoTree classification system, with every class represented by at least 50 samples. It consists of 11,186 formalin-fixed paraffin-embedded (FFPE) WSIs from TCGA and is the largest pan-cancer slide-level classification task publicly available. The splits are released in ./datasets.
Morphological classification (linear probing):

| Task | Metric | TITAN [1] | PRISM [2] | Prov-GigaPath [3] | CHIEF [4] |
|---|---|---|---|---|---|
| Patch encoder | | CONCHv1.5 | Virchow | Prov-GigaPath | CTransPath |
| TCGA-UT-8K (32 classes, Public) | Bal. acc. | 0.832 | 0.774 | 0.700 | 0.625 |
| TCGA-OT (46 classes, Public) | Bal. acc. | 0.704 | 0.643 | 0.543 | 0.528 |
| OT-108 (108 classes, Internal) | Bal. acc. | 0.587 | 0.508 | 0.437 | 0.413 |
| EBRAINS (30 classes, Public) | Bal. acc. | 0.735 | 0.674 | 0.680 | 0.598 |
| Renal allograft AMR (2 classes, Internal) | AUROC | 0.915 | 0.820 | 0.836 | 0.813 |
Slide retrieval:

| Task | Metric | TITAN [1] | PRISM [2] | Prov-GigaPath [3] | CHIEF [4] |
|---|---|---|---|---|---|
| Patch encoder | | CONCHv1.5 | Virchow | Prov-GigaPath | CTransPath |
| TCGA-UT-8K (32 classes, Public) | Acc. @3 | 0.912 | 0.854 | 0.728 | 0.690 |
| | MVacc. @3 | 0.875 | 0.788 | 0.645 | 0.609 |
| TCGA-OT (46 classes, Public) | Acc. @3 | 0.880 | 0.836 | 0.666 | 0.669 |
| | MVacc. @3 | 0.807 | 0.755 | 0.572 | 0.602 |
| OT-108 (108 classes, Internal) | Acc. @3 | 0.707 | 0.636 | 0.450 | 0.442 |
| | MVacc. @3 | 0.621 | 0.547 | 0.414 | 0.400 |
| EBRAINS (30 classes, Public) | Acc. @3 | 0.865 | 0.811 | 0.806 | 0.713 |
| | MVacc. @3 | 0.809 | 0.751 | 0.733 | 0.631 |
| Renal allograft AMR (2 classes, Internal) | Acc. @3 | 0.919 | 0.887 | 0.857 | 0.848 |
| | MVacc. @3 | 0.785 | 0.666 | 0.630 | 0.646 |
TITAN-preview offers just a glimpse of the envisioned final TITAN model, as the model can be easily scaled: more WSIs are being digitized, and synthetic caption generation with the multimodal copilot is effectively unlimited, all of which can be incorporated into TITAN's pretraining pipeline. Stay tuned for more updates!
ⓒ Mahmood Lab. This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the TITAN model and its derivatives, which include models trained on outputs from the TITAN model or datasets created from the TITAN model, is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author or Mass General Brigham Innovation Office.
The project was built on top of amazing repositories such as ViT, iBOT, OpenClip, LGSSL, and Timm (ViT model implementation). We thank the authors and developers for their contribution.
If you find our work useful in your research or if you use parts of this code please consider citing our paper:
Ding, T.*, Wagner, S.J.*, Song, A.H.*, Chen, R.J.*, et al. Multimodal Whole Slide Foundation Model for Pathology, arXiv, 2024
@misc{ding2024titan,
title={Multimodal Whole Slide Foundation Model for Pathology},
author={Tong Ding and Sophia J. Wagner and Andrew H. Song and Richard J. Chen and Ming Y. Lu and Andrew Zhang and Anurag J. Vaidya and Guillaume Jaume and Muhammad Shaban and Ahrong Kim and Drew F. K. Williamson and Bowen Chen and Cristina Almagro-Perez and Paul Doucet and Sharifa Sahai and Chengkuan Chen and Daisuke Komura and Akihiro Kawabe and Shumpei Ishikawa and Georg Gerber and Tingying Peng and Long Phi Le and Faisal Mahmood},
year={2024},
eprint={2411.19666},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2411.19666},
}