Tic-tac-toe with Q-Learning

Tic-tac-toe with reinforcement learning, made at Hack McWiCS 2024.

Implementation

The tic-tac-toe bot is a Q-learning agent: it learns a value for each (board state, move) pair from the outcomes of the games it plays. It is trained by playing against an opponent that selects its moves at random.
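
As an illustration of the idea, here is a minimal sketch of tabular Q-learning against a random opponent. It is not the project's actual code: the board encoding (a 9-character string), the function names, the hyperparameters ALPHA, GAMMA and EPSILON, and the rewards (+1 win, 0 draw, -1 loss) are all assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical hyperparameters; the project does not state the values it used.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that mark has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

def place(board, move, mark):
    return board[:move] + mark + board[move + 1:]

def choose_action(q, board, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else pick the best known move."""
    moves = legal_moves(board)
    if random.random() < epsilon:
        return random.choice(moves)
    return max(moves, key=lambda m: q[(board, m)])

def q_update(q, state, action, reward, next_state):
    """One-step Q-learning update; next_state is None when the game has ended."""
    if next_state is None:
        best_next = 0.0
    else:
        best_next = max(q[(next_state, m)] for m in legal_moves(next_state))
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

def train_episode(q, epsilon=EPSILON):
    """Play one game as X (moving first) against a random O, updating q as we go."""
    board = " " * 9
    while True:
        action = choose_action(q, board, epsilon)
        after_agent = place(board, action, "X")
        if winner(after_agent) == "X":
            q_update(q, board, action, 1.0, None)
            return "win"
        if not legal_moves(after_agent):
            q_update(q, board, action, 0.0, None)
            return "draw"
        reply = random.choice(legal_moves(after_agent))
        after_opponent = place(after_agent, reply, "O")
        if winner(after_opponent) == "O":
            q_update(q, board, action, -1.0, None)
            return "loss"
        # X moves first, so at least one empty cell remains after O's reply.
        q_update(q, board, action, 0.0, after_opponent)
        board = after_opponent

q_table = defaultdict(float)  # unseen (state, move) pairs start at 0.0
for _ in range(1000):
    train_episode(q_table)
```

The opponent's random reply is treated as part of the environment, so the agent only ever sees the boards on which it is its own turn to move. Training the agent to play second would need a variant in which the opponent moves first.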

Results

After 600,000 episodes of training, we obtain the following results:

100% win rate when playing first against random moves
92% win rate and 8% draw rate when playing second against random moves
100% draw rate when playing against itself
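
Rates like these can be measured by playing many games between two move-selection policies and tallying the outcomes. The sketch below shows one way to do that; the policy interface (a function taking the board and the mark to play), the names play_game and evaluate, and the number of games are assumptions for illustration, not the project's evaluation code. A random policy stands in for both players here; a trained agent's greedy policy would be passed in the same way.

```python
import random
from collections import Counter

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that mark has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_policy(board, mark):
    """Pick any empty cell uniformly at random."""
    return random.choice([i for i, cell in enumerate(board) if cell == " "])

def play_game(first_policy, second_policy):
    """Play one game; first_policy plays X. Returns 'X', 'O', or 'draw'."""
    board = " " * 9
    policies = {"X": first_policy, "O": second_policy}
    mark = "X"
    for _ in range(9):
        move = policies[mark](board, mark)
        board = board[:move] + mark + board[move + 1:]
        if winner(board):
            return mark
        mark = "O" if mark == "X" else "X"
    return "draw"

def evaluate(first_policy, second_policy, games=1000):
    """Return the fraction of games won by X, won by O, and drawn."""
    counts = Counter(play_game(first_policy, second_policy) for _ in range(games))
    return {outcome: counts[outcome] / games for outcome in ("X", "O", "draw")}

rates = evaluate(random_policy, random_policy)
print(rates)
```

Because the opponent is random, measured rates vary slightly from run to run unless the random seed is fixed.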

What's next

A more polished UI (web app)
Player-versus-player (PvP) mode
4x4 and 5x5 tic-tac-toe
Games with higher state-space complexity (e.g. Connect Four), using deep Q-learning