Winter Run is a browser-based game that you can play live here! A deep reinforcement learning agent written in TensorFlow.js can learn to beat the game by training with Proximal Policy Optimization (PPO). After roughly 5,000 episodes (about ten hours of training), the agent beat the game for the first time; after about 7,000 more episodes, it beat the game in 85 out of 100 evaluation trials.
Because the actor/critic network parameters are stored directly within the browser's local storage, anyone with internet access can pop open a browser and immediately begin training their own PPO agent to beat the game. To my knowledge, an in-browser PPO agent has never been created before.
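For reference, TensorFlow.js can persist a model to local storage through its built-in `localstorage://` URL scheme. The sketch below illustrates that mechanism only; the storage keys and helper names here are placeholders, not necessarily what the project uses.

```typescript
import * as tf from "@tensorflow/tfjs";

// Placeholder key; the project may use different keys for the actor and critic.
const ACTOR_STORAGE_KEY = "localstorage://ppo-actor";

// Persist the actor network's current weights so training survives a page reload.
async function saveActor(actor: tf.LayersModel): Promise<void> {
    await actor.save(ACTOR_STORAGE_KEY);
}

// Restore the actor network (throws if nothing has been saved yet).
async function loadActor(): Promise<tf.LayersModel> {
    return tf.loadLayersModel(ACTOR_STORAGE_KEY);
}
```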
For full training implementation details, the reader is directed to the following files within the source code:
/src/app/game/training/agent.ts
/src/app/game/training/buffer.ts
/src/app/game/training/network.ts
/src/app/game/scenes/game-scene.ts (from the update() function onward)
Both the actor and critic networks are multilayer perceptrons with two hidden layers of 256 units each. The input to each network is an 81-dimensional vector of hand-engineered features extracted from the current game state. The actor network outputs a softmax probability distribution over the 6 possible actions the agent can take; thus, the shape of the actor network is [81, 256, 256, 6]. Since the critic network merely outputs the perceived value of the input state, the shape of the critic network is [81, 256, 256, 1]. This results in 175,367 total trainable parameters.
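A minimal TensorFlow.js sketch of these two networks is shown below. The hidden-layer activation (ReLU here) and other construction details are assumptions for illustration; /src/app/game/training/network.ts is the authoritative version.

```typescript
import * as tf from "@tensorflow/tfjs";

// Two-hidden-layer MLP; outputUnits is 6 for the actor and 1 for the critic.
// The ReLU hidden activations are an assumption in this sketch.
function buildNetwork(
    outputUnits: number,
    outputActivation: "softmax" | "linear"
): tf.LayersModel {
    const model = tf.sequential();
    model.add(tf.layers.dense({ inputShape: [81], units: 256, activation: "relu" }));
    model.add(tf.layers.dense({ units: 256, activation: "relu" }));
    model.add(tf.layers.dense({ units: outputUnits, activation: outputActivation }));
    return model;
}

const actor = buildNetwork(6, "softmax"); // shape [81, 256, 256, 6]
const critic = buildNetwork(1, "linear"); // shape [81, 256, 256, 1]

// Parameter count:
//   actor:  (81*256 + 256) + (256*256 + 256) + (256*6 + 6) = 88,326
//   critic: (81*256 + 256) + (256*256 + 256) + (256*1 + 1) = 87,041
//   total:  175,367 trainable parameters
```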
In previous applications of deep reinforcement learning to gameplay (such as in the original DQN paper), it has been common to use a "frame-skipping" technique: rather than choosing an action on every single frame, the agent chooses an action every k frames and repeats that action for the next (k - 1) frames until the next choice must be made. This allows training to run more quickly, since feeding an input vector through the actor network on every frame (60 times per second) is computationally expensive. In the PPO implementation presented here, k = 4.
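Schematically, the frame-skipping loop might look like the following; the function names are hypothetical stand-ins, not the ones used in game-scene.ts.

```typescript
// Hypothetical stand-ins for the game's real action-selection and input routines.
declare function chooseActionFromActor(state: number[]): number;
declare function applyAction(action: number): void;

const FRAME_SKIP = 4; // k: the actor network is only queried every 4th frame

let frameCounter = 0;
let currentAction = 0;

// Called once per rendered frame by the game loop.
function onFrame(state: number[]): void {
    if (frameCounter % FRAME_SKIP === 0) {
        // Feed the 81-dimensional feature vector through the actor network.
        currentAction = chooseActionFromActor(state);
    }
    // Repeat the most recently chosen action on the intermediate frames.
    applyAction(currentAction);
    frameCounter++;
}
```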
After each episode, a check is made to see whether the agent has taken at least 8,192 steps (where a step consists of k frames). If so, all of the buffered {state, action, reward} data are used to train the actor on the PPO surrogate objective with gradient ascent and the critic on the mean-squared value error with gradient descent. More specifically, a training step consists of 10 epochs, during each of which 32 shuffled mini-batches are formed from the buffered data. A single gradient step with a learning rate of 3e-4 is performed on the actor and critic for each mini-batch. The buffered {state, action, reward} data are then discarded, now that the policy has changed.
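The sketch below shows roughly what one such training step could look like in TensorFlow.js. It is a simplification: the clip coefficient of 0.2, the combined actor-plus-critic loss, and the makeMiniBatches helper are assumptions for illustration, while agent.ts and buffer.ts contain the actual implementation.

```typescript
import * as tf from "@tensorflow/tfjs";

const EPOCHS = 10;
const NUM_MINI_BATCHES = 32; // 8,192 steps / 32 = 256 steps per mini-batch
const LEARNING_RATE = 3e-4;
const CLIP_EPSILON = 0.2;    // assumed clip range, not specified in the text above

// One mini-batch of buffered experience (advantages and value targets computed
// beforehand, e.g. with generalized advantage estimation).
interface MiniBatch {
    states: tf.Tensor2D;      // [256, 81]
    actionMasks: tf.Tensor2D; // [256, 6] one-hot encoding of the chosen actions
    oldLogProbs: tf.Tensor1D; // [256] log-probabilities recorded at collection time
    advantages: tf.Tensor1D;  // [256]
    returns: tf.Tensor1D;     // [256] value targets
}

const optimizer = tf.train.adam(LEARNING_RATE);

// makeMiniBatches is a hypothetical helper that reshuffles the buffered steps
// into NUM_MINI_BATCHES mini-batches; it is called once per epoch.
function trainStep(
    actor: tf.LayersModel,
    critic: tf.LayersModel,
    makeMiniBatches: () => MiniBatch[]
): void {
    for (let epoch = 0; epoch < EPOCHS; epoch++) {
        for (const batch of makeMiniBatches()) {
            optimizer.minimize(() => {
                // Clipped PPO surrogate (negated, so minimizing performs gradient ascent).
                const probs = actor.predict(batch.states) as tf.Tensor2D;
                const logProbs = tf.sum(batch.actionMasks.mul(tf.log(probs.add(1e-8))), 1);
                const ratio = tf.exp(logProbs.sub(batch.oldLogProbs));
                const clipped = tf.clipByValue(ratio, 1 - CLIP_EPSILON, 1 + CLIP_EPSILON);
                const surrogate = tf.mean(
                    tf.minimum(ratio.mul(batch.advantages), clipped.mul(batch.advantages))
                );
                // Mean-squared value error for the critic.
                const values = (critic.predict(batch.states) as tf.Tensor2D).squeeze();
                const valueLoss = tf.mean(tf.square(values.sub(batch.returns)));
                return tf.neg(surrogate).add(valueLoss) as tf.Scalar;
            });
        }
    }
}
```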
In order to run the project locally, you will need Node.js and the Angular CLI installed. To check whether you have Node, open a command prompt, type node -v, and hit enter. If the prompt doesn't report a version number, you don't have Node installed; you can download it from the official Node.js website, and I highly recommend installing the LTS (long-term support) version. Once the installer has finished, run node -v again and you should see your version of Node show up.
From there, using the command prompt, navigate into the directory where you would like to install the GitHub repository. Then, perform the following five commands:
git clone https://github.com/hmomin/ppo-winter-run
cd ppo-winter-run
npm install -g @angular/cli
npm install
npm start
The first command clones the repository into your desired location; you can also download it manually from GitHub if you don't have Git installed on your machine. The remaining commands move into the newly created ppo-winter-run directory, install the Angular CLI globally, install all Node modules and dependencies, and then open the compiled Angular project in your browser.
- Proximal Policy Optimization Algorithms - Schulman et al.
- High-Dimensional Continuous Control Using Generalized Advantage Estimation - Schulman et al.
- Playing Atari with Deep Reinforcement Learning - Mnih et al.
All files in the repository are under the MIT license.