Show Lab

76 repositories

Awesome-Robotics-Diffusion
Public
A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
Forks 1 · Stars 12 · Issues 0 · Pull requests 0 · Updated Jan 7, 2025
Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Topics: awesome graphical-user-interface ai-assistant llm-agent gui-agents
Forks 25 · Stars 397 · Issues 0 · Pull requests 0 · Updated Jan 6, 2025
FQGAN
Public
FQGAN: Factorized Visual Tokenization and Generation
Python · License: Other · Forks 0 · Stars 38 · Issues 0 · Pull requests 0 · Updated Jan 5, 2025
Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Topics: awesome video-editing video-understanding video-generation diffusion-models text-to-video video-restoration text-to-motion
Forks 212 · Stars 3.7k · Issues 2 · Pull requests 1 · Updated Jan 5, 2025
Tune-An-Ellipse
Public
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Python · Forks 1 · Stars 9 · Issues 2 · Pull requests 0 · Updated Jan 5, 2025
ShowUI
Public
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Topics: agent vision-language-model vision-language-action computer-use gui-agent
Jupyter Notebook · Apache License 2.0 · Forks 42 · Stars 794 · Issues 0 · Pull requests 0 · Updated Jan 4, 2025
computer_use_ootb
Public
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Python · Apache License 2.0 · Forks 98 · Stars 1.1k · Issues 27 · Pull requests 4 · Updated Jan 1, 2025
VideoLISA
Public
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Python · Apache License 2.0 · Forks 2 · Stars 94 · Issues 8 · Pull requests 0 · Updated Dec 26, 2024
Show-o
Public
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Topics: multimodal diffusion-models large-language-models
Python · Apache License 2.0 · Forks 47 · Stars 1.1k · Issues 34 · Pull requests 1 · Updated Dec 26, 2024
MovieBench
Public
Python · Forks 1 · Stars 37 · Issues 0 · Pull requests 0 · Updated Dec 24, 2024
Awesome-Unified-Multimodal-Models
Public
📖 A repository organizing papers, code, and other resources related to unified multimodal models.
Forks 13 · Stars 309 · Issues 0 · Pull requests 1 · Updated Dec 23, 2024
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
Forks 20 · Stars 542 · Issues 1 · Pull requests 0 · Updated Dec 23, 2024
DiffSim
Public
Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
Python · Forks 0 · Stars 8 · Issues 0 · Pull requests 0 · Updated Dec 20, 2024
IDProtector
Public
Code implementation of IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation.
Forks 0 · Stars 4 · Issues 0 · Pull requests 0 · Updated Dec 16, 2024
ROICtrl
Public
Code for ROICtrl: Boosting Instance Control for Visual Generation
Python · Forks 0 · Stars 99 · Issues 0 · Pull requests 0 · Updated Dec 10, 2024
videogui
Public
[NeurIPS 2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Topics: gui video-language llm-agent
JavaScript · Forks 1 · Stars 24 · Issues 0 · Pull requests 0 · Updated Dec 10, 2024
VideoSwap
Public
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Python · Forks 14 · Stars 366 · Issues 2 · Pull requests 0 · Updated Dec 6, 2024
Show-1
Public
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Python · License: Other · Forks 61 · Stars 1.1k · Issues 8 · Pull requests 7 · Updated Nov 15, 2024
BoxDiff
Public
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Topics: text-to-image-synthesis diffusion-models
Python · Forks 18 · Stars 254 · Issues 7 · Pull requests 0 · Updated Nov 12, 2024
sparseformer
Public
(ICLR 2024, CVPR 2024) SparseFormer
Topics: computer-vision transformer efficient-neural-networks vision-transformer sparseformer
Python · MIT License · Forks 2 · Stars 66 · Issues 1 · Pull requests 0 · Updated Nov 10, 2024
LOVA3
Public
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
Topics: benchmark visual-question-answering multimodal-deep-learning visual-question-generation multimodal-large-language-models data-asse
Python · Forks 2 · Stars 79 · Issues 0 · Pull requests 0 · Updated Nov 7, 2024
VisInContext
Public
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Topics: efficient in-context-learning llm mllm
Python · Forks 2 · Stars 14 · Issues 1 · Pull requests 0 · Updated Oct 30, 2024
Exo2Ego-V
Public
Forks 0 · Stars 10 · Issues 1 · Pull requests 0 · Updated Oct 29, 2024
watermark-steganalysis
Public
Python · Forks 0 · Stars 3 · Issues 0 · Pull requests 0 · Updated Oct 24, 2024
EvolveDirector
Public
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
Python · Forks 0 · Stars 44 · Issues 0 · Pull requests 0 · Updated Oct 14, 2024
MovieSeq
Public
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook · Forks 1 · Stars 32 · Issues 1 · Pull requests 0 · Updated Oct 1, 2024
GUI-Narrator
Public
Repository for GUI Action Narrator
JavaScript · Forks 0 · Stars 5 · Issues 0 · Pull requests 0 · Updated Sep 22, 2024
RingID
Public
Python · Forks 0 · Stars 22 · Issues 1 · Pull requests 0 · Updated Aug 30, 2024
MotionDirector
Public
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Topics: video-generation diffusion-models text-to-video text-to-motion text-to-video-generation motion-customization
Python · Apache License 2.0 · Forks 54 · Stars 861 · Issues 23 · Pull requests 0 · Updated Aug 21, 2024
videollm-online
Public
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python · Apache License 2.0 · Forks 32 · Stars 278 · Issues 21 · Pull requests 0 · Updated Aug 15, 2024