This example reproduces the PPO deep reinforcement learning algorithm with PARL and reaches the same level of performance as reported in the paper on the MuJoCo benchmarks.
Paper: PPO in Proximal Policy Optimization Algorithms
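For readers new to PPO, the clipped surrogate objective from the paper can be sketched in a few lines of plain Python. This is an illustration only, not the PARL implementation used by this example: the function name `clipped_surrogate`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for the sketch.

```python
import math


def clipped_surrogate(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    new_logps / old_logps: log-probabilities of the taken actions under
    the current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    terms = []
    for new_logp, old_logp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(new_logp - old_logp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the two estimates.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

The objective is maximized during training; taking the minimum removes the incentive to move the ratio far from 1 when doing so would only inflate the estimate.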
Please see here to learn more about MuJoCo games.
- paddle>=2.0.0
- parl>=2.0.2
- gym==0.9.2
- mujoco-py==0.5.7
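Because the gym and mujoco-py versions are pinned, it can help to check the installed packages before training. The script below is a minimal sketch using only the standard library (Python 3.8+). The helper names are hypothetical, and the distribution names in `REQUIREMENTS` are assumptions; in particular, PaddlePaddle is assumed to be installed under the name `paddlepaddle`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '2.0.2' into (2, 0, 2); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def satisfies(installed, op, required):
    """Check one requirement; only '>=' and '==' are handled here."""
    have, want = parse_version(installed), parse_version(required)
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if op == ">=":
        return have >= want
    if op == "==":
        return have == want
    raise ValueError(f"unsupported operator: {op}")


# Requirements as listed above; the distribution names are assumptions.
REQUIREMENTS = [
    ("paddlepaddle", ">=", "2.0.0"),
    ("parl", ">=", "2.0.2"),
    ("gym", "==", "0.9.2"),
    ("mujoco-py", "==", "0.5.7"),
]


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of (name, status) pairs, one per requirement."""
    report = []
    for name, op, required in requirements:
        try:
            installed = version(name)
        except PackageNotFoundError:
            report.append((name, "missing"))
            continue
        if satisfies(installed, op, required):
            report.append((name, "ok"))
        else:
            report.append((name, f"found {installed}, need {op}{required}"))
    return report


if __name__ == "__main__":
    for name, status in check_requirements():
        print(f"{name}: {status}")
```

Running the script prints one line per dependency, so a missing or mismatched package is visible before `train.py` is started.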
# To train an agent for Hopper-v1 game
python train.py
# For more customized arguments
# python train.py --help