DRL - PPO - Soccer

Udacity Deep Reinforcement Learning Nanodegree Program

Observations:

To run the project, execute the soccer.ipynb notebook.
If you are not using a Windows environment, you will need to download the "Soccer" build that matches your operating system. Mail me if you need more details about the environment .exe file.
The checkpoint .pth files contain weights that have already reached the expected average score.

Requirements:

tensorflow: 1.7.1
Pillow: 4.2.1
matplotlib
numpy: 1.11.0
pytest: 3.2.2
docopt
pyyaml
protobuf: 3.5.2
grpcio: 1.11.0
torch: 0.4.1
pandas
scipy
ipykernel
jupyter: 5.6.0

The problem:

The task involves a soccer game with 2 teams, each one having 2 players: one striker and one goalie.
The environment defines no success criterion by default, so I decided to train against a random team until my agents achieve 95 wins in 100 games.
The goalies have 4 actions.
The strikers have 6 actions.
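
The stopping rule above (95 wins in 100 games) can be tracked with a fixed-size window over the most recent results. This is a minimal sketch, not the notebook's code: the class name WinTracker and its methods are hypothetical, and it assumes a draw counts the same as a loss.

```python
from collections import deque


class WinTracker:
    """Tracks wins over the most recent `window` games (hypothetical helper)."""

    def __init__(self, window=100, target_wins=95):
        # A bounded deque drops the oldest result automatically.
        self.results = deque(maxlen=window)
        self.target_wins = target_wins

    def record(self, won):
        """Store one game result: True for a win, False for a draw or loss."""
        self.results.append(bool(won))

    def solved(self):
        """True once the window is full and holds at least `target_wins` wins."""
        full = len(self.results) == self.results.maxlen
        return full and sum(self.results) >= self.target_wins
```

In a training loop, record() would be called once per finished game and training would stop when solved() returns True.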

The solution:

The biggest problem in this scenario is controlling the exploration vs. exploitation trade-off. I tried Double DQN with an exponentially decaying exploration rate, as well as DDPG with prioritized experience replay to diversify the experiences used for learning, but I couldn't find a hyperparameter configuration that made the agents converge. So I switched to a PPO strategy, since this kind of method is easier to configure and handles exploration well by itself by sampling actions from a probability distribution. After a lot of different implementations, I've reached the current solution.
An actor-critic neural model is employed here, trained with proximal policy optimization, which follows the trust-region idea of limiting how far each update moves the policy. Learning happens after each episode (episode boundaries are controlled by the environment) and uses mini-batches drawn from the episode's experiences once the rewards have been calculated with the N-step method, which combines temporal-difference discounting with Monte Carlo returns (in this case the N-step range is the whole episode).
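
As one reading of the reward calculation described above: when the N-step range spans the whole episode, the target for each step is the discounted sum of all later rewards. The following is a minimal sketch under that assumption, using the Gamma of 0.995 listed under the hyperparameters. The function name is hypothetical and the code is not taken from the repository.

```python
def discounted_returns(rewards, gamma=0.995):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return already computed for the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, with gamma=0.5 the rewards [0, 0, 1] give returns [0.25, 0.5, 1.0], so a goal at the end of the episode still credits the earlier steps.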
Next, I'll try other variations, changing when the learning happens and using multiple teams for experience gathering. I hope to achieve superhuman results in 5000 episodes or fewer (after 5000 episodes the agents are good, but not superhuman).
One last consideration: beating a random team looks easy at first, but if you consider that random agents win 1/3 of the games and that 1/3 of random games end in a draw, reaching a 95% win rate means the AI has overcome a big challenge. It's incredible how a random agent can score in just a few steps.

The hyperparameters:

The hyperparameters are configured in soccer.ipynb.
If you want, you can change the model configuration in the model.py file.
The current configuration of the hyperparameters is:
- Learning Rate Goalie: 8e-5
- Learning Rate Striker: 1e-4
- Gamma: 0.995
- Batch Size: 32
- Epsilon: 0.1
- Entropy Weight: 0.001
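
To illustrate how Epsilon and Entropy Weight typically enter a PPO update, here is a minimal per-sample sketch of the clipped surrogate objective. It assumes Epsilon is the clipping range and that the entropy term is added as a bonus; this README does not spell either out, and the function names are hypothetical.

```python
import math


def clipped_surrogate(new_prob, old_prob, advantage, epsilon=0.1):
    """PPO clipped objective for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def entropy(probs):
    """Shannon entropy of a discrete action distribution (zero-probability terms skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss(new_prob, old_prob, advantage, probs, epsilon=0.1, entropy_weight=0.001):
    """Loss to minimise: negative clipped objective minus a weighted entropy bonus."""
    objective = clipped_surrogate(new_prob, old_prob, advantage, epsilon)
    return -(objective + entropy_weight * entropy(probs))
```

With a clipping range of 0.1, a sample whose action probability doubled under the new policy contributes as if the ratio were only 1.1 when the advantage is positive, which is what keeps each update close to the old policy.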
For the neural models:
- Actor
  - Hidden: (input, 256) - ReLU
  - Hidden: (256, 128) - ReLU
  - Output: (128, action_size) - Softmax
- Critic
  - Hidden: (input, 256) - ReLU
  - Hidden: (256, 128) - ReLU
  - Output: (128, 1) - Linear
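
The layer list above could be written in PyTorch roughly as follows. This is a sketch, not the contents of model.py: the builder function names are hypothetical, and state_size and action_size are placeholders for values that come from the environment.

```python
import torch
import torch.nn as nn


def build_actor(state_size, action_size):
    """Policy network: state -> probabilities over the discrete actions."""
    return nn.Sequential(
        nn.Linear(state_size, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, action_size), nn.Softmax(dim=-1),
    )


def build_critic(state_size):
    """Value network: state -> scalar state-value estimate."""
    return nn.Sequential(
        nn.Linear(state_size, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 1),  # linear output, no activation
    )
```

Following the problem description, the goalie actor would be built with action_size=4 and the striker actor with action_size=6. Each actor output row sums to 1, so it can be sampled from directly to pick an action.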


ShivanshuPurohit/Soccer-using-colaboraton-competition
