MPO

A PyTorch implementation of the Maximum a Posteriori Policy Optimisation (MPO) reinforcement learning algorithm (paper1, paper2) for OpenAI Gym environments.

How to Run

Tested in the following environment:

Ubuntu 18.04
Python 3.7
PyTorch 1.6

INSTALL

Install PyTorch by following the instructions at https://pytorch.org/, then install the remaining dependencies:

pip install gym Box2D IPython tqdm scipy tensorboard tensorboardx
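After installing, a quick sanity check can confirm that everything is importable before starting a long training run. The sketch below is not part of this repository. It assumes the import names `torch`, `gym`, `Box2D`, `IPython`, `tqdm`, `scipy`, `tensorboard` and `tensorboardX` for the packages listed above, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names assumed for PyTorch and the packages in the pip command above.
REQUIRED_MODULES = [
    "torch", "gym", "Box2D", "IPython",
    "tqdm", "scipy", "tensorboard", "tensorboardX",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only looks the modules up without importing them, so it runs quickly and does not touch the GPU.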

Continuous Action Space

python train.py \
  --device cuda:0 \
  --env LunarLanderContinuous-v2 \
  --dual_constraint 0.1 \
  --kl_mean_constraint 0.01 \
  --kl_var_constraint 0.0001 \
  --discount_factor 0.99 \
  --iteration_num 500 \
  --sample_episode_num 100 \
  --sample_episode_maxlen 500 \
  --sample_action_num 64 \
  --batch_size 256 \
  --episode_rerun_num 3 \
  --log log_continuous \
  --render
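For readers unfamiliar with MPO, the following sketch illustrates what a flag such as `--dual_constraint` typically stands for. It assumes the flag is the bound epsilon in the E-step dual function of the MPO paper, g(eta) = eta * epsilon + eta * mean_s log(mean_a exp(Q(s, a) / eta)). This is an illustration in plain Python, not the code in `train.py`; the function names and the Q-values are made up for the example.

```python
import math


def mpo_dual(eta, q_values, epsilon):
    """Dual g(eta); q_values[s][a] holds Q for sampled action a in state s."""
    total = 0.0
    for q_row in q_values:
        q_max = max(q_row)
        # Log-mean-exp, shifted by the row maximum for numerical stability.
        mean_exp = sum(math.exp((q - q_max) / eta) for q in q_row) / len(q_row)
        total += q_max / eta + math.log(mean_exp)
    return eta * epsilon + eta * total / len(q_values)


def minimise_dual(q_values, epsilon, lo=1e-3, hi=100.0, steps=200):
    """Ternary search for the minimising temperature; g is convex in eta."""
    for _ in range(steps):
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if mpo_dual(left, q_values, epsilon) < mpo_dual(right, q_values, epsilon):
            hi = right
        else:
            lo = left
    return (lo + hi) / 2.0


def action_weights(q_row, eta):
    """Softmax of Q / eta over the sampled actions of one state."""
    q_max = max(q_row)
    exps = [math.exp((q - q_max) / eta) for q in q_row]
    norm = sum(exps)
    return [e / norm for e in exps]


# Made-up Q-values: two states, three sampled actions each.
example_q = [[1.0, 2.0, 0.5], [0.0, 0.3, 0.1]]
eta_star = minimise_dual(example_q, epsilon=0.1)
weights = action_weights(example_q[0], eta_star)
```

A smaller epsilon generally yields a larger temperature and therefore weights closer to uniform, which keeps the improved policy close to the one that sampled the actions. A real implementation would usually minimise the dual with a numerical optimiser (for example from `scipy`) rather than a hand-written search.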

Discrete Action Space

python train.py \
  --device cuda:0 \
  --env LunarLander-v2 \
  --dual_constraint 0.1 \
  --kl_constraint 0.01 \
  --discount_factor 0.99 \
  --iteration_num 500 \
  --sample_episode_num 100 \
  --sample_episode_maxlen 500 \
  --batch_size 256 \
  --episode_rerun_num 3 \
  --log log_discrete \
  --render
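In the discrete case a single `--kl_constraint` replaces the separate mean and variance constraints. A plausible reading, stated here as an assumption rather than taken from `train.py`, is that it bounds the KL divergence between the old and the updated categorical policy during the policy update. The sketch below shows that check for one state with four actions (the size of the `LunarLander-v2` action space); the helper names and the probabilities are hypothetical.

```python
import math


def categorical_kl(p_old, p_new):
    """KL(p_old || p_new) for two categorical distributions.

    Assumes p_new is strictly positive wherever p_old is positive.
    """
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


def within_trust_region(p_old, p_new, kl_constraint=0.01):
    """True if the update keeps the policy inside the assumed KL bound."""
    return categorical_kl(p_old, p_new) <= kl_constraint


# Made-up action probabilities for a single state.
old_policy = [0.25, 0.25, 0.25, 0.25]
small_step = [0.26, 0.24, 0.25, 0.25]
large_step = [0.70, 0.10, 0.10, 0.10]

small_ok = within_trust_region(old_policy, small_step)
large_ok = within_trust_region(old_policy, large_step)
```

In practice such a bound is usually enforced with a Lagrange multiplier inside the loss rather than by rejecting updates, so the check above is only meant to convey the scale of a 0.01 constraint.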

License

This repository is a clone of theogruner/rl_pro_telu, which is licensed under the GNU GPLv3. See the LICENSE file for details.
