Research code to accompany the paper: Off-Policy Imitation Learning from Observations.
- OPOLO (the proposed algorithm in the paper).
- Discriminator Actor Critic (DAC): paper and official code.
- Generative Adversarial Imitation Learning (GAIL): paper and official code.
- Behavior Cloning from Observations (BCO): paper and official code.
- Generative Adversarial Imitation from Observation (GAIfO): paper and code (from other repositories).
- DACfO (proposed as a baseline in the paper).
All code is built on the stable-baselines framework.
- Python (>=3.5), CMake, and OpenMPI.
- Please install the prerequisites by following this guideline.
- Mujoco:
- Please follow this official installation guide.
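Before installing this repo, a quick sanity check that the MuJoCo bindings work (a minimal sketch, assuming `gym` and `mujoco_py` are already installed and the MuJoCo license key is in place):

```python
# sanity_check_mujoco.py -- verify that MuJoCo and its Python bindings load.
import mujoco_py  # raises if the MuJoCo binaries or license key are missing
import gym

# Build one of the tasks used in the paper and take a single random step.
env = gym.make("HalfCheetah-v2")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("MuJoCo OK, observation shape:", obs.shape)
env.close()
```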
cd opolo
pip install -e .
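After the editable install, you can check that the bundled framework imports cleanly (a minimal check, assuming the package keeps the upstream `stable_baselines` module name; adjust the import if the repo exposes a different one):

```python
# Verify the install: the repo is built on the stable-baselines framework.
import stable_baselines
print("stable-baselines version:", stable_baselines.__version__)
```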
- Example: run OPOLO on the HalfCheetah-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env HalfCheetah-v2 --seed 1 --algo opolo --task td3-opolo-idm-decay-reg --n-episodes 4 --log-dir your/absolute/log/path --n-timesteps -1
- The `task` tag must contain the strings `idm`, `decay`, and `reg` (see the sketch after this list):
  - `idm`: use the inverse-action model.
  - `reg`: use forward KL-divergence as regularization.
  - `decay`: reduce the effects of the regularization over time.
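As an illustration only, the way these sub-strings toggle behaviour could be sketched as below (a hypothetical parser; the real option handling lives in `train_agent.py`):

```python
# Hypothetical illustration: how the task tag maps to OPOLO switches.
def parse_opolo_task(task: str) -> dict:
    """Extract OPOLO options from a task tag such as 'td3-opolo-idm-decay-reg'."""
    return {
        "use_inverse_action_model": "idm" in task,   # learn an inverse-action model
        "use_kl_regularization": "reg" in task,      # forward KL-divergence regularizer
        "decay_regularization": "decay" in task,     # anneal the regularizer over time
    }

print(parse_opolo_task("td3-opolo-idm-decay-reg"))
# {'use_inverse_action_model': True, 'use_kl_regularization': True, 'decay_regularization': True}
```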
- Run DAC on the Hopper-v2 task, using 4 expert trajectories, seed = 3:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Hopper-v2 --seed 3 --algo td3dac --log-dir your/absolute/log/path --task td3-dac --n-timesteps -1 --n-episodes 4
- Run DACfO on the Walker2d-v2 task, using 10 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Walker2d-v2 --seed 1 --algo td3dacfo --log-dir your/absolute/log/path --task td3-dacfo --n-timesteps -1 --n-episodes 10
- Run BCO on the Swimmer-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 1 --algo td3bco --log-dir your/absolute/log/path --task td3-bco --n-timesteps -1 --n-episodes 4
- Run GAIL on the Swimmer-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 1 --algo trpogail --log-dir your/absolute/log/path --task trpo-gail --n-timesteps -1 --n-episodes 4
- Run GAIfO on the Swimmer-v2 task, using 4 expert trajectories, seed = 3:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 3 --algo trpogaifo --log-dir your/absolute/log/path --task trpo-gaifo --n-timesteps -1 --n-episodes 4
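To reproduce several baselines in one go, the commands above can be wrapped in a small launcher (a sketch only; the algo/task pairs are the ones listed above, and `your/absolute/log/path` must be replaced with a real path):

```python
# Hypothetical launcher that replays the baseline commands above for one environment.
import subprocess

ENV, SEED, N_EPISODES = "Swimmer-v2", 1, 4
LOG_DIR = "your/absolute/log/path"  # replace with an absolute path

runs = [
    ("td3dac", "td3-dac"),        # DAC
    ("td3dacfo", "td3-dacfo"),    # DACfO
    ("td3bco", "td3-bco"),        # BCO
    ("trpogail", "trpo-gail"),    # GAIL
    ("trpogaifo", "trpo-gaifo"),  # GAIfO
]

for algo, task in runs:
    subprocess.run(
        ["python", "train_agent.py",
         "--env", ENV, "--seed", str(SEED), "--algo", algo, "--task", task,
         "--log-dir", LOG_DIR, "--n-timesteps", "-1", "--n-episodes", str(N_EPISODES)],
        check=True,
        cwd="opolo-code/opolo-baselines/run",
    )
```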
- Assuming that you have completed training OPOLO on HalfCheetah using the above commands, with `task = td3-opolo-idm-decay-reg`, you can run the following commands to evaluate the model:
cd opolo-code/opolo-baselines/run
python train_agent.py --env HalfCheetah-v2 --seed 1 --algo opolo --log-dir your/absolute/log/path --task eval-td3-opolo-idm-decay-reg --n-timesteps -1 --n-episodes 4
- The commands are the same as for training, except for the `task` flag, which is set to `eval-` + `{task-used-for-training}` (see the sketch below).
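An evaluation command can therefore be derived mechanically from the training settings (a sketch; only the `task` value changes):

```python
# Hypothetical helper: derive the evaluation command from the training settings.
def eval_command(env, seed, algo, training_task, log_dir, n_episodes):
    return ["python", "train_agent.py",
            "--env", env, "--seed", str(seed), "--algo", algo,
            "--log-dir", log_dir, "--task", "eval-" + training_task,
            "--n-timesteps", "-1", "--n-episodes", str(n_episodes)]

print(" ".join(eval_command("HalfCheetah-v2", 1, "opolo",
                            "td3-opolo-idm-decay-reg",
                            "your/absolute/log/path", 4)))
```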
- Expert trajectories can be found at:
opolo-code/opolo-baselines/expert_logs
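To see what an expert trajectory file contains, you can inspect it with NumPy (a sketch, assuming the trajectories are stored as `.npz` archives in the usual stable-baselines format; check the actual file names and extensions in `expert_logs` first):

```python
# Inspect an expert trajectory archive (the file name below is hypothetical).
import numpy as np

path = "opolo-code/opolo-baselines/expert_logs/HalfCheetah-v2.npz"
data = np.load(path, allow_pickle=True)
for key in data.files:
    print(key, np.asarray(data[key]).shape)
```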
- Hyper-parameter settings can be found at:
opolo-code/opolo-baselines/hyperparams/
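Assuming the hyper-parameter files follow the rl-baselines-zoo convention of one YAML file per algorithm (check the directory for the actual file names), they can be inspected like this:

```python
# Print the hyper-parameters used for one environment (file name is an assumption).
import yaml

with open("opolo-code/opolo-baselines/hyperparams/opolo.yml") as f:  # hypothetical file
    hyperparams = yaml.safe_load(f)

print(hyperparams.get("HalfCheetah-v2"))
```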