Improving CLIP Training with Language Rewrites
Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models
The CLIP Model is Secretly an Image-to-Prompt Converter
Optimizing Prompts for Text-to-Image Generation
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing
Visual Instruction Inversion: Image Editing via Image Prompting
Tuning Multi-mode Token-level Prompt Alignment across Modalities
SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models
LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Subject-driven Text-to-Image Generation via Apprenticeship Learning
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
Norm-guided latent space exploration for text-to-image generation
Cocktail: Mixing Multi-Modality Control for Text-Conditional Image Generation
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
StyleDrop: Text-to-Image Synthesis of Any Style
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis
Conditional Score Guidance for Text-Driven Image-to-Image Translation
TextDiffuser: Diffusion Models as Text Painters
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP
A Closer Look at the Robustness of Contrastive Language-Image Pre-Training (CLIP)
Cross-modal Active Complementary Learning with Self-refining Correspondence
Test-Time Distribution Normalization for Contrastively Learned Visual-language Models
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
An Inverse Scaling Law for CLIP Training
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
ChatGPT-Powered Hierarchical Comparisons for Image Classification
Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
What Makes Good Examples for Visual In-Context Learning?
Towards Consistent Video Editing with Text-to-Image Diffusion Models
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks