This package contains the source code for all experiments performed for our thesis [1] and paper [2]. They consist of: (i) a hyperparameter search script, (ii) a training script, (iii) a tournament runner script, and (iv) a similarities script. The steps to reproduce each experiment are described below.
First of all, install gym-locm's legacy experiments dependencies:

```
pip install -e .['legacy-experiments']
```
To perform hyperparameter tuning, simply execute the `hyp-search.py` script:
```
python3 gym_locm/experiments/hyp-search.py --approach <approach> --battle-agent <battle_agent> \
    --path hyp_search_results/ --seed 96765 --processes 4
```
The list and range of hyperparameters explored are available in the Appendix of our paper and in Attachment A of our thesis. We performed hyperparameter tuning for all combinations of `<approach>` (`immediate`, `history`, and `lstm`) and `<battle_agent>` (`max-attack` and `greedy`). Each run of the script took around two days with the `max-attack` battle agent and more than a week with the `greedy` battle agent. To learn about the script's other parameters, execute it with the `--help` flag.
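For instance, the six tuning runs (one per combination) can be launched sequentially with a shell loop such as the sketch below; the per-combination subdirectories under `hyp_search_results/` are an illustrative naming choice, not something the script requires:

```
# Sketch: one tuning run per <approach> x <battle_agent> combination.
# Subdirectory names are illustrative; each run can take days (see above).
for approach in immediate history lstm; do
    for battler in max-attack greedy; do
        python3 gym_locm/experiments/hyp-search.py \
            --approach "$approach" --battle-agent "$battler" \
            --path "hyp_search_results/$approach-$battler/" \
            --seed 96765 --processes 4
    done
done
```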
To train two draft agents (a 1st player and a 2nd player) with a specific draft approach and battle agent, in asymmetric self-play, simply execute the `training.py` script:
```
python3 gym_locm/experiments/training.py --approach <approach> --battle-agent <battle_agent> \
    --path training_results/ --switch-freq <switch_freq> --layers <layers> --neurons <neurons> \
    --act-fun <activation_function> --n-steps <batch_size> --nminibatches <n_minibatches> \
    --noptepochs <n_epochs> --cliprange <cliprange> --vf-coef <vf_coef> --ent-coef <ent_coef> \
    --learning-rate <learning_rate> --seed 32359627 --concurrency 4
```
We trained 20 draft agents (ten 1st players and ten 2nd players) for each combination of `<approach>` and `<battle_agent>`, using the best sets of hyperparameters found for them in the previous experiment. That comprises ten runs of the script, in which we used the seeds 32359627, 91615349, 88803987, 83140551, 50731732, 19279988, 35717793, 48046766, 86798618, and 62644993. To learn about the script's other parameters, execute it with the `--help` flag. Running the script with all default parameters will train an `immediate` drafter with the `max-attack` battler, using the best set of hyperparameters we found for that combination. Each run of the script took around 50 minutes with the `max-attack` battle agent and around three hours with the `greedy` battle agent.
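As a reference, the ten runs for one combination can be scripted as in the sketch below, which relies on the script's defaults standing in for the best `immediate`/`max-attack` hyperparameters; the per-seed output directories are illustrative:

```
# Sketch: ten training runs of the immediate/max-attack combination,
# one per seed used in the thesis. Hyperparameter flags are omitted,
# so the script's defaults (the best set we found for this combination)
# are used. Output paths are illustrative.
for seed in 32359627 91615349 88803987 83140551 50731732 \
            19279988 35717793 48046766 86798618 62644993; do
    python3 gym_locm/experiments/training.py \
        --approach immediate --battle-agent max-attack \
        --path "training_results/$seed/" --seed "$seed" --concurrency 4
done
```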
To run the tournament, simply execute the `tournament.py` script:
```
python3 gym_locm/experiments/tournament.py \
    --drafters random max-attack coac closet-ai icebox \
        gym_locm/trained_models/<battle_agent>/immediate/ \
        gym_locm/trained_models/<battle_agent>/history/ \
        gym_locm/trained_models/<battle_agent>/lstm/ \
    --battler <battle_agent> --concurrency 4 --games 1000 --path tournament_results/ \
    --seeds 32359627 91615349 88803987 83140551 50731732 19279988 35717793 48046766 86798618 62644993
```
replacing `<battle_agent>` with either `max-attack` or `greedy` to run the respective tournament as depicted in the thesis. The tournament results include matches of all draft agents versus the `max-attack` draft agent, as depicted in the paper. The script will create files at `tournament_results/` describing the individual win rates of every set of matches, the aggregate win rates, the average mana curves, and every individual draft choice made by every agent, both in CSV format, for human inspection, and as serialized Pandas data frames (PKL format), for easy further data manipulation. To learn about the script's other parameters, execute it with the `--help` flag.
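For further manipulation, the PKL files can be loaded back as Pandas data frames. A minimal example, assuming a hypothetical file name (list `tournament_results/` for the files actually produced):

```
# Load one of the serialized data frames back into Pandas.
# "win_rates.pkl" is a placeholder; use a PKL file the script produced.
python3 -c "import pandas as pd; print(pd.read_pickle('tournament_results/win_rates.pkl'))"
```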
To reproduce the plot containing the agents' three-dimensional coordinates found via Principal Component Analysis and grouped via K-Means, simply execute the `similarities.py` script:
```
python3 gym_locm/experiments/similarities.py \
    --files max_attack_tournament_results/choices.csv greedy_tournament_results/choices.csv
```
which will result in a PNG image saved to the current folder.
We used the source code of the Strategy Card Game AI competition to re-run the matches, replacing the max-attack player (named Baseline2) with a custom player featuring our best draft agent and the battle portion of the max-attack player. This can be reproduced by altering line 11 of the runner script (`run.sh`) from

```
AGENTS[10]="python3 Baseline2/main.py"
```

to

```
AGENTS[10]="python3 gym_locm/toolbox/predictor.py --battle \"python3 Baseline2/main.py\" \
    --draft-1 path/to/gym_locm/trained_models/max-attack/immediate/1st/6.json \
    --draft-2 path/to/gym_locm/trained_models/max-attack/immediate/2nd/8.json"
```
then executing it. Parallelism can be achieved by running the script in multiple processes/machines. Save the output to text files named `out-*.txt` (with a number instead of `*`) in the same folder, then run `analyze.py` to extract win rates. The runner script can take up to several days, and the analyze script up to a few hours.
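A minimal sketch of such a parallel setup on a single machine, assuming `run.sh` writes its results to standard output and that the number of copies (four here, purely illustrative) fits your resources:

```
# Sketch: four independent copies of the competition runner, each
# saving its output to its own out-*.txt file, as expected by analyze.py.
for i in 1 2 3 4; do
    ./run.sh > "out-$i.txt" 2>&1 &
done
wait
```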
See the `trained_models` package for more information on the predictor script.
1. Vieira, R., Chaimowicz, L., Tavares, A. R. (2020). Drafting in Collectible Card Games via Reinforcement Learning. Master's thesis, Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil.
2. Vieira, R., Tavares, A. R., Chaimowicz, L. (2020). Drafting in Collectible Card Games via Reinforcement Learning. 19th Brazilian Symposium of Computer Games and Digital Entertainment (SBGames).