This repository provides the Wasserstein Adversarial Behavior Imitation (WASABI) algorithm that enables Solo to acquire agile skills through adversarial imitation from rough, partial demonstrations using NVIDIA Isaac Gym.
Paper: Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations
Project website: https://sites.google.com/view/corl2022-wasabi/home
Maintainer: Chenhao Li
Affiliation: Autonomous Learning Group, Max Planck Institute for Intelligent Systems, and Robotic Systems Lab, ETH Zurich
Contact: [email protected]
- Create a new Python virtual environment with `python 3.8`.
- Install `pytorch 1.10` with `cuda-11.3`:

      pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

- Install Isaac Gym
  - Download and install Isaac Gym Preview 4:

        cd isaacgym/python
        pip install -e .

  - Try running an example:

        cd examples
        python 1080_balls_of_solitude.py

  - For troubleshooting, check the docs in `isaacgym/docs/index.html`.
- Install `solo_gym`:

      git clone https://github.com/martius-lab/wasabi.git
      cd solo_gym
      pip install -e .

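After the installation steps above, a quick sanity check of the environment can save a failed training run. The following is a minimal sketch, not part of the repository: it compares the installed versions against the ones named above and only looks up packages without importing them. The module names `isaacgym` and `solo_gym` are assumed from the directory names in the install commands, and the helper names are hypothetical.

```python
import importlib.metadata
import importlib.util
import sys

# Versions named in the installation steps above.
EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH_PREFIX = "1.10"
EXPECTED_CUDA_TAG = "cu113"


def torch_build_matches(version):
    """Return True if a version string such as '1.10.0+cu113' matches
    the PyTorch release and CUDA build named in the install command."""
    release, _, local = version.partition("+")
    return release.startswith(EXPECTED_TORCH_PREFIX + ".") and local == EXPECTED_CUDA_TAG


def is_installed(module_name):
    """Check whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report():
    """Collect a small summary of the current environment."""
    try:
        torch_version = importlib.metadata.version("torch")
    except importlib.metadata.PackageNotFoundError:
        torch_version = None
    return {
        "python_ok": sys.version_info[:2] == EXPECTED_PYTHON,
        "torch_ok": torch_version is not None and torch_build_matches(torch_version),
        # Module names below are assumptions based on the folder names above.
        "isaacgym_installed": is_installed("isaacgym"),
        "solo_gym_installed": is_installed("solo_gym"),
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

Any entry reported as `False` points to the installation step that is worth repeating.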
- The Solo environment is defined by an env file `solo8.py` and a config file `solo8_config.py` under `solo_gym/envs/solo8/`. The config file sets both the environment parameters in class `Solo8FlatCfg` and the training parameters in class `Solo8FlatCfgPPO`.
- The provided code exemplifies the training of Solo 8 with handheld wave motions. 20 recorded demonstrations are augmented with perturbations to 1000 trajectories with 130 frames each and stored in `resources/robots/solo8/datasets/motion_data.pt`. The state dimension indices are specified in `reference_state_idx_dict.json`. To train with other demonstrations, replace `motion_data.pt` and adapt the reward functions defined in `solo_gym/envs/solo8/solo8.py` accordingly.
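To make the dataset layout above concrete, here is a minimal sketch of how 20 demonstrations could be expanded to 1000 perturbed trajectories of 130 frames each. It is an illustration under assumptions, not the repository's augmentation procedure: the state dimension, the Gaussian noise model, its scale, and all function names are hypothetical, and nested lists stand in for the tensor stored in `motion_data.pt` so that the sketch runs without extra dependencies.

```python
import random

NUM_DEMOS = 20           # recorded demonstrations (from the text above)
NUM_TRAJECTORIES = 1000  # trajectories after augmentation (from the text above)
NUM_FRAMES = 130         # frames per trajectory (from the text above)
STATE_DIM = 4            # hypothetical; the real state dimension is not stated here
NOISE_STD = 0.01         # hypothetical perturbation scale


def make_dummy_demo(rng):
    """Stand-in for one recorded demonstration: NUM_FRAMES x STATE_DIM floats."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(STATE_DIM)] for _ in range(NUM_FRAMES)]


def perturb(demo, rng, noise_std=NOISE_STD):
    """Return a copy of a demonstration with Gaussian noise added to every entry."""
    return [[x + rng.gauss(0.0, noise_std) for x in frame] for frame in demo]


def augment(demos, num_trajectories, rng):
    """Cycle through the demonstrations, perturbing each copy, until
    num_trajectories trajectories have been produced."""
    return [perturb(demos[i % len(demos)], rng) for i in range(num_trajectories)]


rng = random.Random(0)
demos = [make_dummy_demo(rng) for _ in range(NUM_DEMOS)]
dataset = augment(demos, NUM_TRAJECTORIES, rng)

# Trajectories, frames per trajectory, state dimension.
print(len(dataset), len(dataset[0]), len(dataset[0][0]))  # prints: 1000 130 4
```

When replacing `motion_data.pt` with other demonstrations, keeping the same trajectory-by-frame-by-state layout is presumably the simplest way to stay compatible with the existing reward functions.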

Train a policy with:

    python scripts/train.py --task solo8

- The trained policy is saved in `logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt`, where `<experiment_name>` and `<run_name>` are defined in the train config.
- To disable rendering, append `--headless`.

Play a trained policy with:

    python scripts/play.py

- By default, the loaded policy is the last model of the last run in the experiment folder.
- Other runs or model iterations can be selected by setting `load_run` and `checkpoint` in the train config.
- Use `u` and `j` to command the forward velocity.
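
The default "last model of the last run" rule can be pictured with a small sketch over the `logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt` layout. This is not the repository's loader: the function names are hypothetical, the example folder names are made up, and it assumes that run folder names sort chronologically, which depends on the `<date_time>` format actually used.

```python
import os
import re
import tempfile

MODEL_PATTERN = re.compile(r"^model_(\d+)\.pt$")


def latest_run(experiment_dir):
    """Return the run folder that sorts last by name.
    Assumes the <date_time> prefix sorts chronologically."""
    runs = sorted(
        name for name in os.listdir(experiment_dir)
        if os.path.isdir(os.path.join(experiment_dir, name))
    )
    if not runs:
        raise FileNotFoundError(f"no runs found in {experiment_dir}")
    return os.path.join(experiment_dir, runs[-1])


def latest_model(run_dir):
    """Return the model_<iteration>.pt file with the highest iteration,
    compared numerically so that model_1000.pt beats model_200.pt."""
    best_iteration, best_name = -1, None
    for name in os.listdir(run_dir):
        match = MODEL_PATTERN.match(name)
        if match and int(match.group(1)) > best_iteration:
            best_iteration, best_name = int(match.group(1)), name
    if best_name is None:
        raise FileNotFoundError(f"no model files found in {run_dir}")
    return os.path.join(run_dir, best_name)


def demo():
    """Build a throwaway log tree and resolve the default checkpoint in it."""
    with tempfile.TemporaryDirectory() as root:
        experiment_dir = os.path.join(root, "logs", "my_experiment")
        layout = {  # hypothetical run names and iterations
            "2023-01-01_10-00-00_run_a": [0, 50, 100],
            "2023-01-02_09-30-00_run_b": [0, 200, 1000],
        }
        for run_name, iterations in layout.items():
            run_dir = os.path.join(experiment_dir, run_name)
            os.makedirs(run_dir)
            for iteration in iterations:
                open(os.path.join(run_dir, f"model_{iteration}.pt"), "w").close()
        chosen = latest_model(latest_run(experiment_dir))
        return os.path.relpath(chosen, experiment_dir)


print(demo())
```

In this example the sketch resolves to `model_1000.pt` in the second run. Setting `load_run` and `checkpoint` in the train config replaces this default with an explicit choice.
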
@inproceedings{li2023learning,
title={Learning agile skills via adversarial imitation of rough partial demonstrations},
author={Li, Chenhao and Vlastelica, Marin and Blaes, Sebastian and Frey, Jonas and Grimminger, Felix and Martius, Georg},
booktitle={Conference on Robot Learning},
pages={342--352},
year={2023},
organization={PMLR}
}
The code is built upon the open-sourced Isaac Gym Environments for Legged Robots and the PPO implementation. We refer to the original repositories for more details.