You will train an agent in the CartPole-v0 (OpenAI Gym) environment using the Proximal Policy Optimization (PPO) algorithm with Generalized Advantage Estimation (GAE).
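As a rough illustration of the GAE part, the advantage at each step can be computed with a backward pass over one rollout. This is a minimal sketch in plain Python, not the code used in this repository: the function name `compute_gae`, its argument layout, and the default values of `gamma` and `lam` are assumptions chosen for the example.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Sketch of GAE(lambda) for a single rollout (hypothetical helper).

    rewards[t]  reward received after the action at step t
    values[t]   critic estimate V(s_t)
    dones[t]    True if the episode ended at step t
    last_value  critic estimate for the state after the final step
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the rollout backwards so each step can reuse the advantage of the next one.
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])  # stop bootstrapping across episode ends
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors with decay gamma * lam
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic: advantage plus baseline
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value baseline. In a PPO update the resulting advantages would typically be normalized and plugged into the clipped surrogate objective.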
A reward of +1 is provided for every step taken, and a reward of 0 is provided at the termination step. The state space has 4 dimensions: cart position, cart velocity, pole angle, and pole velocity at the tip. Given this information, the agent has to learn how to select the best actions. Two discrete actions are available, corresponding to:
- 0 - 'Push cart to the left'
- 1 - 'Push cart to the right'
For more details, see the wiki.
For the training results and for making the animation, see train.ipynb.
- Python 3.6
- PyTorch 0.4.0
- OpenAI Gym 0.10.5 (for installation, see this.)