README

A game of noughts and crosses (tic-tac-toe), written as a means of understanding Q-learning.

Includes several hardcoded player policies, and a Q-learning policy that can be trained.

It's very much a toolkit and an exploration, not a finished gem.

USE

Use it from irb (Ruby). Something like:

$ irb -I lib -r series
> ql = Policy::QLearning.new # set up our untrained q-learning 'policy'
> Game.new(ql, Policy::WinRandom, trace: true).play  # to see one game (q-learning versus good-but-a-bit-random)
> Series.new(ql, Policy::WinRandom, 500).play  # to run a training series

MEASURING SUCCESS

Indicators of training success:

ql.qsa.inspect shows the adjustments decreasing over time, which indicates convergence
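
To illustrate why shrinking adjustments suggest convergence, here is a minimal sketch of a tabular Q-learning update. It is not this project's code: ALPHA, GAMMA, q_update and the nested-hash table are assumptions made for the example, and the two-player aspect (the opponent moving between the learner's turns) is left out.

```ruby
# Hypothetical learning rate and discount factor, chosen only for illustration.
ALPHA = 0.5
GAMMA = 0.9

# Apply one Q-learning update to the table q (state => { action => value })
# and return the size of the adjustment that was made.
def q_update(q, state, action, reward, next_state, next_actions)
  q[state] ||= Hash.new(0.0)
  # Best value available from the next state; 0.0 if it is terminal or unseen.
  best_next = next_actions.map { |a| (q[next_state] || {}).fetch(a, 0.0) }.max || 0.0
  target = reward + GAMMA * best_next
  delta = ALPHA * (target - q[state][action])
  q[state][action] += delta
  delta
end

# Repeating the same rewarded transition: each adjustment is smaller than the last.
q = {}
deltas = 5.times.map { q_update(q, :s, 4, 1.0, :terminal, []) }
puts deltas.inspect
```

Each repeat moves the stored value part of the way towards its target, so the adjustment halves every time with this ALPHA. Adjustments that keep shrinking across a whole training series point the same way.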

Stats at the end of a series show a high number of draws or
wins for the QL player

ql.qsa.qsa[0] shows a preference (higher Q) for opening with a
corner play (moves 0, 2, 6 and 8). You can also check this with
puts ql.play(Board.new, 1).to_s
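
As a rough sketch of that check, the snippet below ranks the moves in one row of Q-values and tests whether the top four are all corners. The shape of ql.qsa.qsa[0] is not documented here, so the row is assumed to be a hash of move => Q-value; the numbers are made up, and ranked_moves is a hypothetical helper.

```ruby
CORNERS = [0, 2, 6, 8].freeze

# Return the moves ordered from highest to lowest Q-value.
def ranked_moves(q_row)
  q_row.sort_by { |_move, q| -q }.map(&:first)
end

# Made-up Q-values for the empty board, for illustration only.
sample_row = { 0 => 0.31, 1 => 0.12, 2 => 0.29, 3 => 0.10, 4 => 0.25,
               5 => 0.11, 6 => 0.30, 7 => 0.09, 8 => 0.28 }

top_four = ranked_moves(sample_row).first(4)
corners_preferred = (top_four - CORNERS).empty?
puts "top four: #{top_four.inspect}, all corners: #{corners_preferred}"
```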

ql.qsa.inspect shows the table nearing 4520 different states and 16165 distinct
q-values
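
For a sense of where an upper bound on the number of states comes from, this standalone sketch enumerates every non-terminal board reachable from the empty board. It does not use this project's classes: the array-of-nine board encoding (1 and -1 for the two players, 0 for empty) and the helper names are assumptions. It counts positions for both players' turns, so its state-action total need not match the number of q-values that a single learning player records.

```ruby
require 'set'

# The eight winning lines, as index triples on a 0..8 board.
LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
         [0, 3, 6], [1, 4, 7], [2, 5, 8],
         [0, 4, 8], [2, 4, 6]].freeze

def winner?(board)
  LINES.any? { |a, b, c| board[a] != 0 && board[a] == board[b] && board[b] == board[c] }
end

def terminal?(board)
  winner?(board) || board.none?(&:zero?)
end

# Depth-first walk over legal play, recording each non-terminal board once.
def explore(board, player, seen)
  return if terminal?(board) || !seen.add?(board)
  board.each_index do |i|
    next unless board[i].zero?
    nxt = board.dup
    nxt[i] = player
    explore(nxt, -player, seen)
  end
end

seen = Set.new
explore(Array.new(9, 0), 1, seen)
states = seen.size
pairs = seen.sum { |b| b.count(&:zero?) } # one entry per legal move in each state
puts "non-terminal states: #{states}, state-action pairs: #{pairs}"
```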
