[EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

We tackle the challenge of improving factual accuracy in Medical Large Vision Language Models (Med-LVLMs) with our approach, RULE. Despite their promise, Med-LVLMs often generate responses that are misaligned with established medical facts. RULE addresses this with two key strategies: (1) calibrated selection of retrieved contexts to control factuality risk (a conceptual sketch is given below), and (2) fine-tuning the model on a preference dataset to balance its reliance on inherent knowledge against retrieved contexts. Our method achieves a 20.8% improvement in factual accuracy across three medical VQA datasets.

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models [Paper]
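
The calibrated selection step can be pictured as choosing the number of retrieved contexts k so that an estimated factuality risk on a held-out calibration set stays below a target level. The snippet below is a minimal sketch of that idea; the function name, data layout, and threshold are illustrative assumptions, not the repository's actual calibration code (see the paper for the exact procedure).

```python
import numpy as np

def select_k(errors_by_k: dict[int, np.ndarray], alpha: float = 0.2) -> int:
    """Pick the largest k whose empirical factuality risk on a
    calibration set stays at or below the target level alpha.

    errors_by_k[k] is a 0/1 array: entry i is 1 if the model answered
    calibration question i incorrectly when given k retrieved contexts.
    Illustrative sketch only; RULE's calibration differs in detail.
    """
    admissible = [k for k, errs in errors_by_k.items() if errs.mean() <= alpha]
    # Fall back to using no retrieved context if no k meets the risk target.
    return max(admissible) if admissible else 0
```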

🌟 Requirements

  1. Clone this repository and navigate to the RULE folder:
git clone https://github.com/richard-peng-xia/RULE.git
cd RULE
  2. Install the package: create and activate a conda environment, then install the dependencies:
conda create -n RULE python=3.10 -y
conda activate RULE
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install trl
  3. Download the required model checkpoint LLaVA-Med-1.5 from Hugging Face (a scripted download is sketched after this list).

  4. For each medical dataset, you first need to apply for access and then download the data.
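
If you prefer to script the checkpoint download, the snippet below is a minimal sketch using huggingface_hub; the repo_id and target directory are assumptions, so substitute the official LLaVA-Med-1.5 repository and your own paths.

```python
from huggingface_hub import snapshot_download

# Download the LLaVA-Med-1.5 weights to a local folder.
# The repo_id is an assumption -- replace it with the official
# LLaVA-Med-1.5 repository on Hugging Face if it differs.
snapshot_download(
    repo_id="microsoft/llava-med-v1.5-mistral-7b",
    local_dir="checkpoints/llava-med-1.5",
)
```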

📖 Data Description

We provide a corresponding JSON or JSONL file for each dataset, containing the image path, question, answer, and original report; a small loading sketch follows the list below.

  • Training: The data used to train the retriever and to fine-tune the Med-LVLM are located in data/training/retriever and data/training/alignment, respectively.

  • Test: All the test data for Med-LVLMs is placed under data/test.
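
As a concrete illustration of the data layout, the snippet below reads one of the provided JSONL files and prints the main fields. The file path and the exact field names (image, question, answer, report) are assumptions based on the description above; check the files themselves for the authoritative keys.

```python
import json

# Path and field names are illustrative; adjust them to the actual
# files under data/test or data/training.
with open("data/test/example.jsonl") as f:
    for line in f:
        record = json.loads(line)
        # Each record is described as containing the image path,
        # question, answer, and original report.
        print(record.get("image"), record.get("question"),
              record.get("answer"), record.get("report"))
        break  # show only the first record
```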

🚀 Training

  • The training code for Direct Preference Optimization (DPO) is at llava/train/train_dpo.py.
  • The corresponding script can be found at scripts/run_dpo.sh; a generic sketch of the DPO objective follows this list.
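
For orientation, DPO trains the policy model to prefer a chosen response over a rejected one relative to a frozen reference model. The snippet below shows the standard DPO loss on per-response log-probabilities; it is a generic illustration with assumed variable names, not the code in llava/train/train_dpo.py. The trl package installed above provides a full DPOTrainer built around this objective.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: push the policy to widen the
    log-probability margin of the chosen response over the rejected
    one, measured relative to a frozen reference model."""
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```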

🥖 Retrieve

Use retrieve_clip.sh to retrieve reports for the VQA task. The script uses the Harvard-FairVLMed dataset as an example; make sure to specify the necessary data paths and the save location. A conceptual sketch of CLIP-based retrieval is shown below.
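
Conceptually, the retrieval step encodes the query image and the candidate reports with a CLIP-style model and keeps the top-k reports by cosine similarity. The snippet below sketches that idea with the Hugging Face transformers CLIP classes; the checkpoint name, file paths, and report list are placeholders (RULE trains its own retriever, see data/training/retriever), so treat this as an illustration rather than the repository's retrieval code.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder checkpoint; swap in the fine-tuned retriever weights.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("path/to/query_image.jpg")
reports = ["candidate report 1 ...", "candidate report 2 ..."]

with torch.no_grad():
    img_emb = model.get_image_features(**processor(images=image, return_tensors="pt"))
    txt_emb = model.get_text_features(**processor(text=reports, return_tensors="pt",
                                                  padding=True, truncation=True))

# Rank the reports by cosine similarity to the image and keep the top k.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
top = (txt_emb @ img_emb.T).squeeze(-1).topk(k=1)
print([reports[i] for i in top.indices.tolist()])
```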

🛠️ Inference

  • For test-set inference, you need to specify the following arguments:
python llava/eval/model_vqa_{dataset}.py \
    --model-base 'path/to/llava-med-1.5' \
    --model-path 'path/to/lora_weights' \
    --question-file 'path/to/question_file.json' \
    --image-folder 'path/to/test_images' \
    --answers-file 'path/to/output_file.json'
  • A ready-made script is provided at scripts/inference.sh; before running it, set the correct data and checkpoint paths in the script. A sketch of how retrieved reports can be folded into the question is shown after this list.
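
When retrieval is enabled, the retrieved reports are supplied to the Med-LVLM together with the question. The helper below is a minimal sketch of one way to fold the top-k reports into the question text before building the question file; the prompt template and function name are assumptions, not the repository's exact format.

```python
def build_rag_question(question: str, retrieved_reports: list[str]) -> str:
    """Prepend retrieved reports to a VQA question (illustrative
    template; the prompt format actually used by RULE may differ)."""
    context = "\n".join(f"Reference report {i + 1}: {r}"
                        for i, r in enumerate(retrieved_reports))
    return f"{context}\nQuestion: {question}"
```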

📚 Citation

@article{xia2024rule,
  title={RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models},
  author={Xia, Peng and Zhu, Kangyu and Li, Haoran and Zhu, Hongtu and Li, Yun and Li, Gang and Zhang, Linjun and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2407.05131},
  year={2024}
}

🙏 Acknowledgement

We use code from LLaVA-Med, POVID, and CARES. We thank the authors for releasing their code.