Reinforcement learning for Connect 4

Train the agent (self-play):

python Training.py

Evaluate performance

Evaluate the learning progress by letting the agent play against earlier versions of itself. Run:

python evaluate_checkpoints.py
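
As a rough illustration of what this evaluation does, the sketch below plays a fixed number of games between the current policy and an earlier checkpoint and reports the win rate, alternating which side moves first. The function and file names (`play_game`, `load_checkpoint`, `checkpoints/*.pt`) are placeholders, not the actual interface of evaluate_checkpoints.py.

```python
# Hypothetical sketch of checkpoint-vs-checkpoint evaluation; the real logic
# lives in evaluate_checkpoints.py and may differ in names and details.
import random

def play_game(first_agent, second_agent):
    """Play one Connect 4 game; return 1 if first_agent wins, 0 for a draw, -1 otherwise.
    Placeholder: a real implementation would step through the board until a win or draw."""
    return random.choice([1, 0, -1])  # stand-in for an actual game loop

def win_rate(current_agent, old_agent, n_games=100):
    """Estimate how often the current agent beats an earlier checkpoint."""
    wins = 0
    for g in range(n_games):
        # Alternate who moves first so the first-move advantage does not bias the result.
        if g % 2 == 0:
            wins += play_game(current_agent, old_agent) == 1
        else:
            wins += play_game(old_agent, current_agent) == -1
    return wins / n_games

# Example usage against real checkpoints (hypothetical helper and paths):
# checkpoints = [load_checkpoint(p) for p in sorted(glob.glob("checkpoints/*.pt"))]
# for i, old in enumerate(checkpoints[:-1]):
#     print(f"vs checkpoint {i}: {win_rate(checkpoints[-1], old):.2%}")
```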

TO DO

  • Try a deeper network (add convolutional layers)

Implemented, but it does not work yet: the network overfits very quickly to a single column. Once that column is full, its probability is set to zero, and the clipping-and-normalizing step (itself only a workaround) breaks down, because taking log(0) produces NaN values.

Possible solutions: avoid the overfitting (previously this worked because the agent never assigned such a high probability to any single column), or change the clipping process (clipping at different values and renormalizing afterwards has been tried, so far without success). One alternative, masking full columns at the logit level, is sketched after this list.

  • Evaluate after each training epoch instead of only at the end
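
Regarding the log(0)/NaN problem above: one common alternative to zeroing probabilities and renormalizing is to mask full columns at the logit level, i.e. set their logits to a large negative value before the softmax. The sketch below is a generic illustration of that idea, not the current code in Training.py; the array names and shapes are assumptions.

```python
# Sketch: mask full columns before the softmax instead of zeroing probabilities.
# Generic illustration, not the actual Training.py code; names are assumptions.
import numpy as np

def masked_policy(logits, legal_mask):
    """logits: raw network outputs for the 7 columns.
    legal_mask: boolean array, True where the column still has room."""
    masked = np.where(legal_mask, logits, -1e9)  # illegal moves get a huge negative logit
    masked = masked - masked.max()               # shift for numerical stability
    probs = np.exp(masked)
    probs /= probs.sum()
    return probs                                 # illegal columns end up with ~0 probability

logits = np.array([2.0, 8.0, -1.0, 0.5, 0.0, 1.0, -2.0])
legal = np.array([True, False, True, True, True, True, True])  # column 1 is full
print(masked_policy(logits, legal))
```

With this scheme the sampled action always has a strictly positive probability, so the log-probabilities in the policy-gradient loss stay finite and no clipping/renormalization workaround is needed.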