This repository provides the Cooperative Adversarial Self-supervised Skill Imitation (CASSI) algorithm that enables Solo to extract diverse skills through adversarial imitation from unlabeled, mixed motions using NVIDIA Isaac Gym.
Paper: Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions
Project website: https://sites.google.com/view/icra2023-cassi/home
Maintainer: Chenhao Li
Affiliation: Autonomous Learning Group, Max Planck Institute for Intelligent Systems, and Robotic Systems Lab, ETH Zurich
Contact: [email protected]
- Create a new Python virtual environment with Python 3.8.
- Install PyTorch 1.10 with CUDA 11.3:
  ```bash
  pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
  ```
- Install Isaac Gym:
  - Download and install Isaac Gym Preview 4.
    ```bash
    cd isaacgym/python
    pip install -e .
    ```
  - Try running an example:
    ```bash
    cd examples
    python 1080_balls_of_solitude.py
    ```
  - For troubleshooting, check the docs in `isaacgym/docs/index.html`.
- Install `solo_gym`:
  ```bash
  git clone https://github.com/martius-lab/cassi.git
  cd solo_gym
  pip install -e .
  ```
- The Solo environment is defined by an env file `solo8.py` and a config file `solo8_config.py` under `solo_gym/envs/solo8/`. The config file sets both the environment parameters in class `Solo8FlatCfg` and the training parameters in class `Solo8FlatCfgPPO`.
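The two class names suggest a legged_gym-style configuration layout built from nested parameter classes. As a purely hypothetical sketch of that shape (every field name and value below is an illustrative assumption, not the actual contents of `solo8_config.py`):

```python
# Illustrative sketch only: fields and values are assumptions modeled on
# legged_gym-style configs, not the repository's actual solo8_config.py.

class Solo8FlatCfg:
    class env:
        num_envs = 4096          # number of parallel simulated environments (assumed)
        num_observations = 48    # per-environment observation size (assumed)

    class rewards:
        tracking_lin_vel = 1.0   # example reward weight (assumed)


class Solo8FlatCfgPPO:
    class runner:
        experiment_name = "solo8"  # -> logs/<experiment_name>/...
        run_name = ""              # appended to the <date_time> run folder
        load_run = -1              # -1: load the last run when playing
        checkpoint = -1            # -1: load the last saved model
        max_iterations = 1000      # training iterations (assumed)
```

Nested classes used this way act as plain namespaces, so parameters can be read as `Solo8FlatCfgPPO.runner.experiment_name` without instantiating anything.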
- The provided code exemplifies the training of Solo 8 with unlabeled mixed motions. Demonstrations induced by 6 locomotion gaits are randomly mixed and augmented with perturbations into 6000 trajectories of 120 frames each, stored in `resources/robots/solo8/datasets/motion_data.pt`. The state dimension indices are specified in `reference_state_idx_dict.json`. To train with other demonstrations, replace `motion_data.pt` and adapt the reward functions defined in `solo_gym/envs/solo8/solo8.py` accordingly.
- Train:
  ```bash
  python scripts/train.py --task solo8
  ```
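Conceptually, `reference_state_idx_dict.json` maps named state components to their dimension indices within each reference frame. A hypothetical sketch of how such a mapping could be used to slice a flat frame vector (the component names and index ranges here are assumptions, not the file's actual contents):

```python
# Hypothetical example: slice named components out of a flat reference frame.
# The names and [start, end) ranges below are illustrative assumptions, not
# the repository's actual reference_state_idx_dict.json contents.

reference_state_idx_dict = {
    "base_lin_vel": [0, 3],   # base linear velocity (3 dims)
    "base_ang_vel": [3, 6],   # base angular velocity (3 dims)
    "joint_pos": [6, 14],     # Solo 8 has 8 actuated joints
}

def extract(frame, idx_dict, name):
    """Return the slice of `frame` belonging to the named state component."""
    start, end = idx_dict[name]
    return frame[start:end]

frame = list(range(14))  # stand-in for one frame of a demonstration
print(extract(frame, reference_state_idx_dict, "joint_pos"))
```

Keeping the indices in a JSON file rather than hard-coding them lets the reward functions stay agnostic to how a new demonstration dataset lays out its state vector.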
- The trained policy is saved in `logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt`, where `<experiment_name>` and `<run_name>` are defined in the train config.
- To disable rendering, append `--headless`.
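The "last run, last model" default described below can be pictured as a simple directory scan over that `logs/` layout. A minimal sketch, assuming run folders sort chronologically by their `<date_time>_` prefix (the repository's actual loading helper may differ):

```python
# Sketch only: assumes run folders under logs/<experiment_name>/ sort
# chronologically by name; the repository's loading code may differ.
import os
import tempfile

def last_checkpoint(experiment_dir):
    """Pick the newest run folder, then the highest model_<iteration>.pt in it."""
    runs = sorted(d for d in os.listdir(experiment_dir)
                  if os.path.isdir(os.path.join(experiment_dir, d)))
    run_dir = os.path.join(experiment_dir, runs[-1])
    models = sorted((f for f in os.listdir(run_dir)
                     if f.startswith("model_") and f.endswith(".pt")),
                    key=lambda f: int(f[len("model_"):-len(".pt")]))
    return os.path.join(run_dir, models[-1])

# Demo on a throwaway tree mimicking logs/<experiment_name>/<run>/model_<it>.pt
with tempfile.TemporaryDirectory() as logs:
    for run, iterations in [("run_01", (100,)), ("run_02", (100, 200))]:
        os.makedirs(os.path.join(logs, run))
        for it in iterations:
            open(os.path.join(logs, run, f"model_{it}.pt"), "w").close()
    print(last_checkpoint(logs))  # ends with run_02/model_200.pt
```

Sorting by the parsed iteration number (rather than the filename string) is what keeps `model_1000.pt` from landing before `model_200.pt`.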
- Play:
  ```bash
  python scripts/play.py
  ```
- By default, the loaded policy is the last model of the last run of the experiment folder.
- Other runs/model iterations can be selected by setting `load_run` and `checkpoint` in the train config.
- Use `u` and `j` to command the forward velocity, and `h` and `k` to switch between the extracted skills.
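The keyboard interface above amounts to a small state update per keypress. A hypothetical sketch of that logic (the 0.1 m/s step size, the wrap-around behavior, and the skill count are assumptions; the actual bindings live in `scripts/play.py`):

```python
# Illustrative sketch of the u/j/h/k interface. The 0.1 m/s step and the
# wrap-around skill switching are assumptions, not code from scripts/play.py.

NUM_SKILLS = 6  # the demonstrations mix 6 locomotion gaits

def handle_key(key, forward_vel, skill):
    """Return the (forward_vel, skill) command after one keypress."""
    if key == "u":
        forward_vel += 0.1                 # speed up
    elif key == "j":
        forward_vel -= 0.1                 # slow down
    elif key == "k":
        skill = (skill + 1) % NUM_SKILLS   # next extracted skill
    elif key == "h":
        skill = (skill - 1) % NUM_SKILLS   # previous extracted skill
    return forward_vel, skill

vel, skill = 0.0, 0
for key in "uuk":
    vel, skill = handle_key(key, vel, skill)
print(vel, skill)
```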
```bibtex
@inproceedings{li2023versatile,
  title={Versatile skill control via self-supervised adversarial imitation of unlabeled mixed motions},
  author={Li, Chenhao and Blaes, Sebastian and Kolev, Pavel and Vlastelica, Marin and Frey, Jonas and Martius, Georg},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={2944--2950},
  year={2023},
  organization={IEEE}
}
```
The code is built upon the open-source Isaac Gym Environments for Legged Robots and its accompanying PPO implementation. We refer to the original repositories for more details.