Udacity Deep Reinforcement Learning Nanodegree Program
- To run the project just execute the soccer.ipynb file.
- If you are not using a Windows environment, you will need to download the "Soccer" build that matches your operating system. Email me if you need more details about the environment .exe file.
- The checkpoint.pth file holds the weights from a run that already reached the target score.
- tensorflow: 1.7.1
- Pillow: 4.2.1
- matplotlib
- numpy: 1.11.0
- pytest: 3.2.2
- docopt
- pyyaml
- protobuf: 3.5.2
- grpcio: 1.11.0
- torch: 0.4.1
- pandas
- scipy
- ipykernel
- jupyter: 5.6.0
- The task involves a soccer game with 2 teams, each one having 2 players: 1 striker and 1 goalie.
- The environment defines no target score by default, so I decided to train against a random team until my agents achieved 95 wins in 100 games.
- The goalies have 4 actions.
- The strikers have 6 actions.
- The biggest problem in this scenario is controlling the balance between exploration and exploitation. I tried Double DQN with an exponentially decaying exploration rate, as well as DDPG with prioritized experience replay to diversify the experiences used for learning, but I couldn't find a hyperparameter configuration that made the agents converge. So I switched to PPO, since it is easier to configure and handles exploration well on its own by sampling actions from a probability distribution. After many different implementations, I reached the current solution.
- An actor-critic neural model is used here, trained with proximal policy optimization (PPO), which keeps each policy update within a trust region. Learning happens after each episode (episode boundaries are controlled by the environment) and uses mini-batches drawn from the episode's experiences once the rewards have been computed with the N-step method, which combines temporal-difference discounting with Monte Carlo returns (in this case the N-step range is the whole episode).
- Next, I'll try other variations, changing when the learning happens and using multiple teams for experience gathering. I hope to achieve superhuman results in 5000 episodes or fewer (with 5000 episodes the agents are good but not superhuman).
- One last consideration: beating a random team looks easy at first, but if you consider that random agents win 1/3 of the games and another 1/3 of random games end in a draw, the AI has overcome a big challenge by reaching a 95% win rate. It's incredible how a random agent can score in just a few steps.
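When the N-step range covers the whole episode, the return at each step reduces to the discounted sum of all later rewards. The sketch below shows one way this could be computed. It is not taken from soccer.ipynb: the function name `discounted_returns` and the example reward list are hypothetical, and only the gamma value of 0.995 comes from this project's configuration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.995):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode.

    Hypothetical helper for illustration; not the project's actual code.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Made-up episode: a single reward of 1.0 on the last step.
episode_rewards = [0.0, 0.0, 1.0]
print(discounted_returns(episode_rewards))
```

Earlier steps receive a slightly smaller share of the final reward, which is what lets a goal scored at the end of an episode credit the actions that led to it.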
The hyperparameters are configured in soccer.ipynb.
If you want, you can change the model configuration in the model.py file.
The current hyperparameter configuration is:
- Learning Rate Goalie: 8e-5
- Learning Rate Striker: 1e-4
- Gamma: 0.995
- Batch Size: 32
- Epsilon: 0.1
- Entropy Weight: 0.001
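To show where two of these values would typically enter a PPO update, here is a minimal sketch of a clipped actor loss with an entropy bonus. It assumes that "Epsilon" is the PPO clipping range and that the entropy term is subtracted from the loss; neither is confirmed by this README. The function name `ppo_actor_loss` and its arguments are hypothetical.

```python
import torch


def ppo_actor_loss(new_log_probs, old_log_probs, advantages, entropy,
                   clip_eps=0.1, entropy_weight=0.001):
    """Clipped-surrogate PPO actor loss (illustrative sketch, not the project's code).

    clip_eps is assumed to correspond to "Epsilon" and entropy_weight
    to "Entropy Weight" in the list above.
    """
    # Probability ratio between the updated policy and the one that collected the data.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the more pessimistic of the two objectives, then negate to get a loss.
    policy_loss = -torch.min(unclipped, clipped).mean()
    # The entropy bonus encourages the policy to keep exploring.
    return policy_loss - entropy_weight * entropy.mean()
```

With a clipping range of 0.1, an action whose probability has doubled contributes as if the ratio were only 1.1, which limits how far one update can move the policy.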
For the neural models:
Actor:
- Hidden: (input, 256) - ReLU
- Hidden: (256, 128) - ReLU
- Output: (128, action_size) - Softmax
Critic:
- Hidden: (input, 256) - ReLU
- Hidden: (256, 128) - ReLU
- Output: (128, 1) - Linear
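The two layer lists above can be read as an actor (softmax output over the actions) and a critic (a single linear value output). A minimal PyTorch sketch under that reading follows. The class names and the `state_size` of 8 are placeholders, not values from model.py; the action sizes of 4 and 6 are the goalie and striker action counts given earlier.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Maps a state to a probability distribution over actions (illustrative)."""

    def __init__(self, state_size, action_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, action_size), nn.Softmax(dim=-1),
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Maps a state to a single scalar value estimate (illustrative)."""

    def __init__(self, state_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state):
        return self.net(state)


# Placeholder state size; the real observation size comes from the environment.
goalie_actor = Actor(state_size=8, action_size=4)
striker_actor = Actor(state_size=8, action_size=6)
critic = Critic(state_size=8)
```

An action can then be sampled from the actor output with `torch.distributions.Categorical(probs=...)`, which is how a softmax policy explores without a separate exploration schedule.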