A game of noughts and crosses / tic-tac-toe, written as a means to understanding Q-learning.
It includes several hardcoded player policies and a Q-learning policy that can be trained.
It's very much a toolkit and an exploration, not a finished gem.
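For background, the core of a tabular Q-learning policy is the update rule Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)). A minimal Ruby sketch of that update follows; the method name, signature and defaults here are illustrative only, not the ones this repo uses.

# Illustrative tabular Q-learning update -- a sketch, not this repo's implementation.
# q is a Hash of Q-values keyed by [state, action] pairs; unseen pairs default to 0.0.
def q_update(q, state, action, reward, next_state, next_actions, alpha = 0.1, gamma = 0.9)
  best_next = next_actions.map { |a| q[[next_state, a]] }.max || 0.0  # 0.0 if next_state is terminal
  q[[state, action]] += alpha * (reward + gamma * best_next - q[[state, action]])
end

q = Hash.new(0.0)
q_update(q, :empty_board, 0, 1.0, :terminal, [])  # hypothetical states, just to show the call shape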
The way to use it is via irb (Ruby). Something like:
$ irb -I lib -r series
> ql = Policy::QLearning.new # set up our untrained q-learning 'policy'
> Game.new(ql, Policy::WinRandom, trace: true).play # to see one game (q-learning versus good-but-a-bit-random)
> Series.new(ql, Policy::WinRandom, 500).play # to run a training series
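The same workflow can also be scripted rather than typed into irb. A rough sketch, assuming the classes shown above (the file name train.rb is hypothetical; run it with ruby -I lib train.rb):

# train.rb -- a hedged sketch of batch training outside irb, not part of the repo.
require 'series'

ql = Policy::QLearning.new
5.times do |batch|
  Series.new(ql, Policy::WinRandom, 500).play  # stats are reported at the end of each series
  puts "completed training batch #{batch + 1}"
end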
Indicators of training success:
- ql.qsa.inspect shows decreasing adjustments => convergence
- stats at the end of a series show a high proportion of draws or wins for the QL player
- ql.qsa.qsa[0] shows a preference (higher Q) for opening with a corner play (moves 0, 2, 6 and 8); this can also be checked with puts ql.play(Board.new, 1).to_s
- ql.qsa.inspect shows the table approaching 4520 different states and 16165 distinct Q-values
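For example, after a training series in the same irb session, something like (exact numbers will vary from run to run):

> ql.qsa.inspect                    # watch adjustment sizes and state counts
> ql.qsa.qsa[0]                     # Q-values for the opening state; corners (0, 2, 6, 8) should score highest
> puts ql.play(Board.new, 1).to_s   # prints the learner's chosen opening move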