A curated list of recent robot learning papers incorporating diffusion models for manipulation, navigation, planning, etc.
- Benchmarks
- Diffusion Policy
- Diffusion Generation Models in Robot Learning
- Robot Learning Utilizing Diffusion Model Properties
- Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations (RSS 2018)
- Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning (CoRL 2020)
- Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets (RSS 2022)
- DexMV: Imitation Learning for Dexterous Manipulation from Human Videos (ECCV 2022)
- Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning (NeurIPS 2022 Datasets and Benchmarks Track)
- DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects (CVPR 2023)
- BridgeData V2: A Dataset for Robot Learning at Scale (CoRL 2023)
- CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks (RAL 2022)
- RLBench: The Robot Learning Benchmark & Learning Environment (RAL 2020)
- LIBERO: Benchmarking Knowledge Transfer in Lifelong Robot Learning (NeurIPS 2023 Datasets and Benchmarks Track)
Visual Pusher
Panda Arm
DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics (To be checked)
- Imitating Human Behaviour with Diffusion Models (ICLR 2023)
- SE(3)-DiffusionFields: Learning Smooth Cost Functions for Joint Grasp and Motion Optimization through Diffusion (ICRA 2023)
- Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (RSS 2023)
- Goal-Conditioned Imitation Learning using Score-based Diffusion Policies (RSS 2023)
- Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition (CoRL 2023)
- ChainedDiffuser: Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation (CoRL 2023)
- Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models (CoRL 2023)
- PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play (CoRL 2023)
- Memory-Consistent Neural Networks for Imitation Learning (ICLR 2024)
- EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning (ICRA 2024)
- Diffskill: Improving Reinforcement Learning Through Diffusion-Based Skill Denoiser for Robotic Manipulation (Knowledge-Based Systems 2024)
- Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation (RSS 2024)
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation (ECCV 2024)
- Differentiable Robot Rendering (CoRL 2024)
- Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning (CoRL 2024)
- Equivariant Diffusion Policy (CoRL 2024)
- GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy (CoRL 2024)
- EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning (CoRL 2024)
- 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations (CoRL 2024)
- 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations (RSS 2024)
- Vision-Language-Affordance-based Robot Manipulation with Flow Matching (Sep 2024)
- Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation (Sep 2024)
- Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression (Dec 2024)
- XSkill: Cross Embodiment Skill Discovery (CoRL 2023)
- SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution (CVPR 2024)
- Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation (CVPR 2024)
- Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning (Dec 2024)
Is Conditional Generative Modeling all you need for Decision-Making? (Decision Diffuser)
- Waypoint-Based Imitation Learning for Robotic Manipulation (CoRL 2023)
- RoLD: Robot Latent Diffusion for Multi-task Policy Modeling
- UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers (CoRL 2024)
- Adaptive Online Replanning with Diffusion Models (NeurIPS 2023)
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior (Highly Theoretical)
Flow Matching Imitation Learning for Multi-Support Manipulation (Flow matching)
Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States
DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability (With constraint)
Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation
ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing (Include tactile)
SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting (Sim2Real)
Movement Primitive Diffusion: Learning Gentle Robotic Manipulation of Deformable Objects (Deformable)
SculptDiff: Learning Robotic Clay Sculpting from Humans with Goal Conditioned Diffusion Policy (3D deformable objects)
Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation (Both manipulation and navigation)
Diffusion Co-Policy for Synergistic Human-Robot Collaborative Tasks
PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations
GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy (single task generalizability)
Adaptive Compliance Policy: Learning Approximate Compliance for Diffusion Guided Control (considering compliance / forces during manipulation)
ForceMimic: Force-Centric Imitation Learning with Force-Motion Capture System for Contact-Rich Manipulation (forces centric)
Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation (considering compliance / forces)
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation (single task generalization)
Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation (3D, object centric, single task generalization)
Affordance-Centric Policy Learning: Sample Efficient and Generalisable Robot Policy Learning using Affordance-Centric Task Frames (object centric)
MaIL: Improving Imitation Learning with Selective State Space Models (using Mamba)
Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning
The Ingredients for Robotic Diffusion Transformers
Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation
C3DM: Constrained-Context Conditional Diffusion Models for Imitation Learning (tackle spurious correlation)
DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Method for Multi-Dexterous Robotic Hands
Sampling Constrained Trajectories Using Composable Diffusion Models (Trajectory optimization with constraints present)
SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation (Tracking object pose)
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress (out-of-distribution scenarios, detect failures)
- Composable Part-Based Manipulation (CoRL 2023)
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning
One-Shot Imitation under Mismatched Execution
Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks
Diffusion-based learning of contact plans for agile locomotion
DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation (To be checked)
Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control (out-of-distribution issue)
- Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement (CoRL 2023)
- Learning Score-based Grasping Primitive for Human-assisting Dexterous Grasping (NeurIPS 2023)
- ReorientDiff: Diffusion Model based Reorientation for Object Manipulation (ICRA 2024)
- DexDiffuser: Generating Dexterous Grasps with Diffusion Models (Feb 2024)
- Dexterous Functional Pre-Grasp Manipulation with Diffusion Policy (Mar 2024)
Learning Visuotactile Skills with Two Multifingered Hands
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration
DiPPeR: Diffusion-based 2D Path Planner applied on Legged Robots
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning
LDP: A Local Diffusion Planner for Efficient Robot Navigation and Collision Avoidance
DTG: Diffusion-based Trajectory Generation for Mapless Global Navigation
DiffusionSeeder: Seeding Motion Optimization with Diffusion for Rapid Motion Planning (motion planning)
Potential Based Diffusion Motion Planning
DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models (Drones)
- DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics (RA-Letters 2023)
- UniPi: Learning Universal Policies via Text-Guided Video Generation (NeurIPS 2023)
- AVDC: Learning to Act from Actionless Videos through Dense Correspondences (ICLR 2024)
- UniSim: Learning Interactive Real-World Simulators (ICLR 2024)
- HiP: Compositional Foundation Models for Hierarchical Planning (NeurIPS 2023)
- DMD: Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning (Feb 2024)
- VLP: Video Language Planning (ICLR 2024)
- Dreamitate: Real-World Visuomotor Policy Learning via Video Generation (CoRL 2024)
- ARDuP: Active Region Video Diffusion for Universal Policies (Jun 2024)
- This&That: Language-Gesture Controlled Video Generation for Robot Planning (July 2024)
- RoboDreamer: Learning Compositional World Models for Robot Imagination (ICML 2024)
- CLOVER: Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation (NeurIPS 2024)
- SOAR: Autonomous Improvement of Instruction Following Skills via Foundation Models (CoRL 2024)
- Cacti: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning (CoRL 2022 Workshop PRL)
- GenAug: Retargeting Behaviors to Unseen Situations via Generative Augmentation
GR-MG: Leveraging Partially-Annotated Data via Multi-Modal Goal-Conditioned Policy (generate goal image)
Generative Image as Action Models
Scaling Robot Learning with Semantically Imagined Experience
Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
3D Vision-Language-Action Generative World Model
IRASim: Learning Interactive Real-Robot Action Simulators (arXiv, Jun 2024)
Structured World Models from Human Videos (RSS 2023)
HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator (ICIP 2022)
DayDreamer: World Models for Physical Robot Learning (CoRL 2022)
MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations
PoCo: Policy Composition from and for Heterogeneous Robot Learning (To be checked)
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion (To be checked)
Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
Imitation Learning from Purified Demonstrations (using forward & reverse diffusion process to purify imperfect demonstrations)