Reinforcement learning for Connect 4

Train agent (self-play):

python Training.py

Evaluate performance

Evaluate the learning progress by letting the agent play against earlier versions of itself. Run:

python evaluate_checkpoints.py

TO DO

Try deeper network (adding convolutional layers)

Implemented, but it does not work yet. The network seems to overfit very quickly on a single column. Once that column is full, we set its probability to zero, and our clipping and normalizing step (itself only a workaround) breaks down: we end up taking log(0), which produces NaN values.

Possible solutions: avoid the overfitting (the earlier network worked because the agent never assigned such high probabilities to any single column), or change the clipping process (I have tried clipping at different values and normalizing again afterwards, but so far without success).
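
Another option that may be worth trying is to mask full columns at the logit level, before the softmax, instead of zeroing probabilities afterwards and re-normalizing. The sketch below is framework-agnostic NumPy and is not the code in Training.py. It assumes the network outputs one raw logit per column and that a boolean mask of non-full columns is available; the function names `masked_policy` and `log_prob_of_action` are hypothetical.

```python
import numpy as np


def _masked_shifted_logits(logits, legal_mask):
    """Return logits with illegal entries at -inf, shifted so the max legal logit is 0."""
    logits = np.asarray(logits, dtype=np.float64)
    legal_mask = np.asarray(legal_mask, dtype=bool)
    if not legal_mask.any():
        raise ValueError("no legal column left (board is full)")
    # Illegal columns get -inf, so exp() yields exactly 0 for them.
    masked = np.where(legal_mask, logits, -np.inf)
    # Subtracting the largest legal logit keeps exp() from overflowing.
    return masked - masked.max()


def masked_policy(logits, legal_mask):
    """Softmax restricted to legal columns; full columns get probability 0."""
    exp = np.exp(_masked_shifted_logits(logits, legal_mask))
    return exp / exp.sum()


def log_prob_of_action(logits, legal_mask, action):
    """Log-probability of a legal action via log-softmax, so log(0) is never taken."""
    if not np.asarray(legal_mask, dtype=bool)[action]:
        raise ValueError("action is not legal")
    shifted = _masked_shifted_logits(logits, legal_mask)
    return shifted[action] - np.log(np.exp(shifted).sum())


# Example: the network strongly prefers column 2, but column 2 is full.
logits = [0.1, 0.2, 50.0, 0.0, -0.3, 0.4, 0.1]
mask = [True, True, False, True, True, True, True]
probs = masked_policy(logits, mask)
```

With this approach the policy-gradient loss would use `log_prob_of_action` for the sampled column rather than the logarithm of a clipped probability. Because actions are only sampled from legal columns, the log-probability stays finite even when the unmasked network output is heavily concentrated on a full column. It does not address the overfitting itself.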

Evaluate after each epoch of training instead of only at the end
