https://stable-baselines.readthedocs.io/en/master/modules/policies.html
- MLP (Multi-layer perceptron)
- MlpPolicy
- Basic implementation, 2 hidden layers of 64 units
- MlpLstmPolicy
- LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!
- The Mario problem probably doesn't need long-term dependencies
- MlpLnLstmPolicy
- Same as MlpLstmPolicy but the LSTM is layer-normalized ("Ln" = layer normalization, not input normalization)
- CNN (Convolutional neural network)
- CNN policies (CnnPolicy, CnnLstmPolicy, CnnLnLstmPolicy) are for image observations only (see the sketch below)
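A minimal sketch of picking one of these policies when constructing a model (stable-baselines 2.x API; CartPole here is only a placeholder, not our Mario environment):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Placeholder env; swap in our wrapped Mario env
env = DummyVecEnv([lambda: gym.make('CartPole-v1')])

# MlpPolicy: the default 2x64 fully-connected network
model = PPO2('MlpPolicy', env, verbose=1)

# For image observations (raw Mario frames) we would pass 'CnnPolicy' instead:
# model = PPO2('CnnPolicy', env, verbose=1)

model.learn(total_timesteps=10000)
```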
We can customize the policy by setting parameters of the policy class (a small sketch follows the list below)
https://stable-baselines.readthedocs.io/en/master/guide/custom_policy.html
Ones we probably care about:
- n_env - (int) The number of environments to run
- n_steps - (int) The number of steps to run for each environment
- n_batch - (int) The number of batches to run (n_envs * n_steps)
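The n_env / n_steps / n_batch values are filled in by the model, so the customization we are more likely to do ourselves is the network architecture. A sketch based on the custom_policy guide, passing policy_kwargs instead of subclassing (the net_arch values are just illustrative, not a recommendation):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])  # placeholder env

# net_arch: two shared layers of 128 units, then separate value (vf)
# and policy (pi) heads of 64 units each (example values only)
policy_kwargs = dict(net_arch=[128, 128, dict(vf=[64], pi=[64])])

model = PPO2('MlpPolicy', env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=10000)
```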
PPO hyper-parameters explained: https://medium.com/aureliantactics/ppo-hyperparameters-and-ranges-6fc2d29bccbe (a small sketch of setting them follows this list)
- learning_rate
- noptepochs - number of epochs when optimizing the surrogate loss
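A minimal sketch of setting these on the PPO2 constructor; the values shown are just the library defaults written out explicitly:

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])  # placeholder env

model = PPO2('MlpPolicy', env,
             learning_rate=2.5e-4,  # optimizer step size (PPO2 default)
             n_steps=128,           # steps per environment per update (default)
             noptepochs=4,          # epochs over each collected batch (default)
             verbose=1)
model.learn(total_timesteps=10000)
```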
There is a project called rl-zoo (rl-baselines-zoo) that provides pre-trained agents. It uses Optuna to find the best hyper-parameters for the agents, so we might want to use Optuna too (a rough tuning sketch is below).
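A rough sketch of what an Optuna search over PPO hyper-parameters could look like. The search space, training budget, and CartPole env are simplified placeholders, not rl-zoo's actual setup:

```python
import gym
import optuna
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.common.evaluation import evaluate_policy

def objective(trial):
    # Example search space; a real run would tune more parameters and budgets
    learning_rate = trial.suggest_loguniform('learning_rate', 1e-5, 1e-3)
    noptepochs = trial.suggest_int('noptepochs', 3, 10)

    env = DummyVecEnv([lambda: gym.make('CartPole-v1')])  # placeholder env
    model = PPO2('MlpPolicy', env, learning_rate=learning_rate,
                 noptepochs=noptepochs, verbose=0)
    model.learn(total_timesteps=20000)

    # Score the trial by mean evaluation reward
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
    return mean_reward

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)
print(study.best_params)
```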