PPO

PyTorch implementation of Proximal Policy Optimization

Usage

Example command line usage:

python main.py BreakoutNoFrameskip-v0 --num-workers 8 --render

This runs PPO with 8 parallel training environments and renders them on screen. Run python main.py -h to list all options.
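For readers who want to see how a command line like the one above could be parsed, here is a minimal argparse sketch. It models only the three arguments that appear in the example command. The function name build_parser, the help strings, and the default of one worker are assumptions made for illustration, and the real main.py may define more options or different defaults.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction: only the arguments visible in the
    # example command are modeled, and every default is an assumption.
    parser = argparse.ArgumentParser(description="PPO training (illustrative sketch)")
    parser.add_argument("env_id", type=str,
                        help="Gym environment id, e.g. BreakoutNoFrameskip-v0")
    parser.add_argument("--num-workers", type=int, default=1,
                        help="number of parallel training environments")
    parser.add_argument("--render", action="store_true",
                        help="render the training environments on screen")
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["BreakoutNoFrameskip-v0", "--num-workers", "8", "--render"]
)
print(args.env_id, args.num_workers, args.render)
```

argparse adds the -h flag automatically, which is one common way to get the usage text mentioned above.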

Performance

Results are comparable to those of the original PPO paper. Note that the horizontal axis here counts environment steps, whereas the graphs in the paper count frames. Each environment step spans 4 frames, so a step count here corresponds to 4 times as many frames in the paper.
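To compare a point on the plot here with the paper's graphs, the axis values need to be converted. The helper below is a small sketch of that arithmetic. The function names are hypothetical, and the only fact taken from this README is the ratio of 4 frames per environment step.

```python
FRAMES_PER_STEP = 4  # from this README: each environment step spans 4 frames


def steps_to_frames(env_steps):
    """Convert environment steps (axis used here) to frames (axis used in the paper)."""
    return env_steps * FRAMES_PER_STEP


def frames_to_steps(frames):
    """Convert frames back to whole environment steps."""
    return frames // FRAMES_PER_STEP


# Illustrative values only, not results from this repository.
print(steps_to_frames(2_500_000))
print(frames_to_steps(10_000_000))
```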

Figure: training episode reward versus environment steps for BreakoutNoFrameskip-v3.

References

Proximal Policy Optimization Algorithms

OpenAI Baselines

This code uses some environment utilities such as SubprocVecEnv and VecFrameStack from OpenAI's Baselines.
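As a rough illustration of what frame stacking does, the sketch below keeps the most recent observations of a single environment in a fixed-size buffer. It is a dependency-free stand-in, not Baselines' VecFrameStack, which operates on batches of observations from a vectorized environment. The class name FrameStack and its methods are hypothetical. Padding the initial stack by repeating the first frame is a simplification chosen here, and Baselines may initialize the stack differently.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `nstack` frames (hypothetical helper, single environment)."""

    def __init__(self, nstack):
        self.nstack = nstack
        self.frames = deque(maxlen=nstack)

    def reset(self, first_frame):
        # Simplification (an assumption of this sketch): fill the whole
        # stack with copies of the first frame of the episode.
        self.frames.clear()
        for _ in range(self.nstack):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(nstack=4)
print(stack.reset("f0"))
print(stack.step("f1"))
print(stack.step("f2"))
```

Frames are plain strings here to keep the example readable. In practice each frame would be an image array, and the stacked result would be fed to the policy network as one observation.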

