This repository contains a model-based version of Constrained Policy Optimization (Achiam et al.).
- MuJoCo 2.0 has to be installed and a working license is required.
- Instructions for installing MuJoCo can be found in the Mujoco-Py repository.
- We use conda environments for installation; please refer to Anaconda for instructions.
- Clone this repository
git clone https://github.com/anyboby/ConstrainedMBPO.git
- Create a conda environment and install cmbpo in editable mode
cd ConstrainedMBPO
conda env create -f environment/gpu-env.yml
conda activate cmbpo
pip install -e .
To start an experiment with cmbpo, run
cmbpo run_local examples.development --config=examples.config.antsafe.cmbpo_uc --policy=cpopolicy --gpus=1 --trial-gpus=1
--config specifies the configuration for an environment (here: antsafe)
--policy specifies the policy to use; as of writing, cmbpo only works with a CPO policy
--gpus specifies the number of GPUs to use
As of writing, only running locally is supported. For further options, refer to the Ray experiment specifications.
Different environments can be tested by creating a config file in the config directory. The algorithm does not learn termination functions and, unless otherwise specified, rollouts will not terminate. To define a manual termination function, create a file with the lower-case name of the environment under 'mbpo/static', as sketched below.
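A minimal sketch of such a file, assuming the interface follows the MBPO static-function pattern (a StaticFns class exposing termination_fn); the environment name, observation indices, and thresholds are purely illustrative:

```python
# mbpo/static/antsafe.py -- illustrative sketch only; the real observation
# layout and termination rule depend on the environment.
import numpy as np

class StaticFns:

    @staticmethod
    def termination_fn(obs, act, next_obs):
        """Return a (batch_size, 1) boolean array marking terminal transitions."""
        # Example rule: terminate when the torso height leaves a healthy range.
        height = next_obs[:, 0]
        not_done = np.isfinite(next_obs).all(axis=-1) \
                   & (height > 0.2) \
                   & (height < 1.0)
        return (~not_done)[:, None]
```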
A wide range of data is automatically logged to TensorBoard. The corresponding files and checkpoints are stored in ~/ray_mbpo/. To view the TensorBoard logs, run
tensorboard --logdir ~/ray_mbpo/<env>/<seed_dir>
The hyperparameter list is currently a bit messy. The hyperparameters for cmbpo are specified in the config file (e.g., the antsafe config used above). The main parameters for cmbpo are
'rollout_mode':'uncertainty',
'max_uncertainty_c':4.0,
'max_uncertainty_rew':3.5,
'max_tddyn_err':.07,
The rollout mode specifies when to terminate rollouts. The 'uncertainty' rollout mode terminates rollouts based on how much the uncertainty of model predictions grows compared to the first prediction. Other options are 'iv', an inverse-variance weighting of rollout lengths by the variance of rewards, and 'schedule', a fixed schedule of rollout lengths specified by 'rollout_schedule'.
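For illustration, switching to a fixed schedule could look roughly like the fragment below; the [min_epoch, max_epoch, min_length, max_length] reading of 'rollout_schedule' is assumed from the MBPO convention, and the values are placeholders only:
'rollout_mode':'schedule',
'rollout_schedule':[20, 100, 1, 15],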
The 'max_uncertainty' parameters ('max_uncertainty_rew' and 'max_uncertainty_c') only apply to the rollout modes 'uncertainty' and 'iv', where they limit the ratio of predictive uncertainty to the initial predictions (for reward and cost separately, but in our experiments similar values for both worked well).
'max_tddyn_err' refers to the maximum temporal-difference dynamics error and specifies how much we rely on the model. It is calculated as the mean variance among ensemble predictions, normalized by the base variance of the targets. Assuming we specify this maximum uncertainty to be 0.05 and we measure an average uncertainty of 0.1 among our ensemble predictions, the algorithm will train on a dataset consisting of half model-generated samples and half real samples (resulting in an average error of 0.05).
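A minimal sketch of that arithmetic, assuming real samples contribute zero model error; the function name and interface are illustrative and not taken from the codebase:

```python
def model_sample_fraction(measured_err, max_tddyn_err):
    """Fraction of model-generated samples so that the mixed batch's
    average dynamics error stays at or below max_tddyn_err."""
    if measured_err <= max_tddyn_err:
        return 1.0  # model is accurate enough, train on model samples only
    return max_tddyn_err / measured_err

# Example from the text: max error 0.05, measured ensemble error 0.1
print(model_sample_fraction(0.1, 0.05))  # 0.5 -> half model, half real samples
```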
This repository also contains code from several other projects, including SAC from Tuomas Haarnoja, Kristian Hartikainen's softlearning codebase, Kurtland Chua's dynamics ensemble from PETS, and CPO from Joshua Achiam.