📃 Paper • 🖼 Dataset • 🤗 HF Repo
Compress & Align: Curating Image-Text Data with Human Knowledge
Compress-Align is the first general-purpose image-to-text human preference reward model. It is trained on a total of 10k pairs of expert comparisons and outperforms prevailing image-text scoring methods, such as CLIP-Score (by 30.3%) and BLIP-Score (by 33.5%), at capturing the nuances of human preference on image-text alignment.
If you find Compress-Align's open-source effort useful, please 🌟 us to encourage our further development!
[2024.3.22] The code and data are coming soon.
We are very grateful that this work is supported by a gift from the TPU Research Cloud (TRC) program and the Google Cloud Research Credits program.
@article{zhang2023compress,
title={Compress & Align: Curating Image-Text Data with Human Knowledge},
author={Lei Zhang and Fangxun Shu and Sucheng Ren and Bingchen Zhao and Hao Jiang and Cihang Xie},
journal={arXiv preprint arXiv:2312.06726},
year={2023}
}