Supplemental material for my strength model based on KataGo (master thesis)


This repository contains scripts, utilities and material for my strength model based on KataGo, which is the subject of my master thesis.

The strength model is a neural network model which uses the existing KataGo network to interpret Go positions and moves. It uses the internal result representation of KataGo as input in its own architecture to predict players' strength rating from recent moves played. This document gives you step-by-step instructions for training and running the strength model.

Python scripts are located in the python subdirectory. By convention, we dump CSV files in the csv subdirectory, which you may need to create first.

External Resources

The following external dependencies are required:

  • the modified (forked) KataGo, compiled from source, which provides the extract_features, extract_sgfs and rating_system commands and the judge_gameset.py script
  • a KataGo network weights file (e.g. kata1-b18c384nbt-s6582191360-d3422816034.bin.gz) together with an analysis configuration file
  • the forked goratings repository for the Glicko-2 reference calculation
  • Python 3 with PyTorch (and flask for the web app)

Estimate Playing Strength

Using the modified KataGo and a strength model weights file, we can estimate a player's Glicko rating. The strength model weights file can be obtained, for example, by following the further steps in this README to train it on an existing game dataset.

The strength model is implemented in Python using PyTorch, but it requires the modified katago binary to extract move features from the SGFs. To pass all the required and optional arguments, follow this script.

$ SGF=path/to/games/*.sgf
$ KATAGO=path/to/katago/cpp/katago
$ KATAMODEL=path/to/katago/models/kata1-b18c384nbt-s6582191360-d3422816034.bin.gz
$ KATACONFIG=path/to/katago/cpp/configs/analysis_example.cfg
$ STRMODEL=path/to/weights/file.pth
$ FEATURENAME=pick  # or trunk, or head, if compatible with model
$ PLAYERNAMEARG='--playername "My Name"'  # needs to match the player name in the SGFs
#$ PLAYERNAMEARG=''   # uncomment this to auto-detect the name

python3 python/model/run.py $SGF --katago $KATAGO --katamodel $KATAMODEL --kataconfig $KATACONFIG \
    --model $STRMODEL --featurename $FEATURENAME $PLAYERNAMEARG

Dataset Preparation

We start by preparing the games which we want to use in training. We assume that these games exist as a collection of SGF files found under some common root directory on your disk.

Filtering Games

The extractor program provided in this repository searches through an archive of SGF files for suitable training games. Every eligible file is extracted to the dataset directory, under a file path constructed from the game date specified in the SGF and the names of the players. Additionally, all SGF paths are printed to a CSV output file. Suitable games are even (no-handicap) 19x19 games with more than 5 seconds of thinking time per move, at least 20 moves played, no passes before move 50, a result decided by counting, resignation or timeout, and the string "ranked" (but not "unranked") in the GC property.
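For illustration, here is a minimal Python sketch of a subset of these criteria, operating on raw SGF text. It is not the extractor itself; the C++ program below additionally checks the time settings, move count, passes and other conditions.

import re

def passes_basic_filters(sgf_text: str) -> bool:
    """Simplified check of some of the filter criteria on raw SGF text."""
    def prop(name):
        m = re.search(name + r"\[([^\]]*)\]", sgf_text)
        return m.group(1) if m else ""

    if prop("SZ") != "19":
        return False                                 # 19x19 boards only
    if prop("HA") not in ("", "0"):
        return False                                 # no handicap stones
    gc = prop("GC")
    if "ranked" not in gc or "unranked" in gc:
        return False                                 # ranked games only
    if not re.match(r"[BW]\+(Resign|Time|R|T|[\d.]+)", prop("RE")):
        return False                                 # counting, resignation or timeout
    return True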

The extractor must be compiled from its C++ sources, located in this repository in the extractor subdirectory.

$ pushd extractor
$ cmake .
$ make
$ popd

Start the program with the archive file (containing SGFs), extraction base directory path and CSV output path as arguments. If those arguments are not provided, extractor explains itself.

$ mkdir dataset
$ extractor/extractor sgfs.tar.gz dataset csv/games.csv

As an alternative, this project also offers a script that builds the CSV file from all eligible SGFs in a given directory tree, and another utility to filter a game list in CSV format; see the "Filtering Games (alternative)" section. The advantage of extractor is that it is fast and does not require extracting a large dataset, including undesirable SGFs, to disk. In addition, extractor writes file names using only the characters [a-zA-Z0-9_-. ], for better compatibility in case the players' names include characters not allowed in the target filesystem. It also contains workarounds to correctly read some broken player names specific to the OGS 2021 dataset. The namecompat utility bundled in the extractor directory can perform just this name extraction and correction as a separate step.

Judging Games

In this optional step, we override the specified winner of each game in the list with whoever held the advantage at the end in the eyes of KataGo. The goal is to improve the quality of the training data. In reality, games are often won by the player in the worse position. This can happen if their time runs out, if they feel lost and resign, or especially among beginners, the game reaches the counting stage and is scored wrong by the players. By eliminating these factors, we concentrate on the effectiveness of the moves played.

The forked KataGo repository contains the script judge_gameset.py, which can read our prepared games.csv and output a new list with predicted winners.

$ KATAGO=path/to/katago/cpp/katago
$ CONFIG=path/to/katago/cpp/configs/analysis_example.cfg
$ MODEL=path/to/katago/models/kata1-b18c384nbt-s6582191360-d3422816034.bin.gz
$ LIST=csv/games.csv
$ OUTLIST=csv/games_judged.csv
$ FLAGS=""  # FLAGS=--keep-undecided --max-visits 50
$ python3 ~/source/katago/python/judge_gameset.py $FLAGS --katago-path $KATAGO --config-path $CONFIG --model-path $MODEL -i $LIST -o $OUTLIST

The script copies all columns from the input except for Winner. The new winner is noted in the Score column of the output file, with a value of 1 if black wins, 0 if white wins, and 0.5 if the game cannot be decided. Undecided games are omitted from the output unless you pass the --keep-undecided flag. The depth of evaluation can be adjusted with the --max-visits argument, which is passed through to KataGo. If the judging process is interrupted, the script can resume from any point: if the output file exists before the run, all SGF names already listed in it are skipped and new results are appended to the file.
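For illustration, a short Python sketch (column names as described above) that reads the judged list, skips undecided games and recovers the winner from the Score column:

import csv

with open("csv/games_judged.csv", newline="") as f:
    for row in csv.DictReader(f):
        score = float(row["Score"])
        if score == 0.5:
            continue                     # undecided game (only present with --keep-undecided)
        winner = "B" if score == 1 else "W"
        print(row["File"], winner)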

For a dataset of 7M+ games, even with an optimized KataGo build with TensorRT backend, using a b18c384nbt network and --max-visits 10, running on an RTX 4070 GPU, this can take over 40 hours total to run.

Splitting the Dataset

The script random_split.py reads a CSV file and adds or modifies the "Set" column, which marks each selected row as a member of one of four sets: "T" for the training set, "V" for the validation set, "E" for the test set and "X" for the exhibition set. Rows not in any set are marked with "-". The markers are distributed randomly; each user-defined "part" parameter is interpreted as a proportion of the whole dataset if it is <1, or as an absolute number of rows if it is >=1.
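The following sketch (not the script itself) shows how a part parameter translates into a number of rows:

def rows_for_part(part: float, num_eligible: int) -> int:
    """Interpret a 'part' parameter: values below 1 are fractions of the
    eligible rows, values of 1 or more are absolute row counts."""
    return int(part * num_eligible) if part < 1 else int(part)

# e.g. with 100000 eligible rows:
assert rows_for_part(0.8, 100000) == 80000     # fraction of the rows
assert rows_for_part(10000, 100000) == 10000   # absolute number of rows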

The motivation behind assigning rows to sets instead of splitting the entire match pool is that forming distinct pools from the original one would tear apart players' rating histories, depriving our algorithms of the data from which they derive their predictions. Instead, we keep them in the same pool. In the training process, we train only on training matches and test only on test matches, while the combined match data is available in the rating history. This technique stems from link prediction problems in social networks, where random test edges are removed from the full graph and later predicted by a model trained on the remaining edges.

Run the set assignment script as follows.

$ python3 python/random_split.py --input csv/games_judged.csv --output csv/games_judged.csv --trainingPart 10000 --validationPart 5000 --testPart 5000 --exhibitionPart 5000

This will allocate 10000 rows to the training set, 5000 to the validation set, 5000 to the test set and 5000 to the exhibition set. Any remaining rows are left unassigned, but remain part of the dataset, forming the players' histories and acting as a source of recent moves; the model is simply never trained or tested on them. Because all games with a set marker (more specifically, their recent move sets) must be preprocessed through the KataGo network, it is not feasible to mark millions of games for training.

If not specified, the --output file defaults to the same as the --input, overwriting it with the added information.

Rows that introduce a specific player for the first time in the dataset are generally not eligible for marking as any set, because these rows do not offer the necessary prior information for a model to predict the match outcome. The optional --with-novice switch disables this behavior, making all rows eligible for inclusion in one of the sets.

The script can also check for noisy rows and exclude them from the training set. Noisy rows are rows in which the players do not both have a minimum number of future games in the dataset, meaning that their labels lack future information from the Glicko-2 system and might be less accurate. The number of required future games is specified with the optional --advance parameter, just like in the section "Labeling Games" below. A row is also noisy if the labels disagree with the score (outcome), i.e. black wins against a higher-rated white or vice versa. The noise criteria only make sense on a labeled dataset (see below) and are thus only applied if --advance is specified.

A row qualifies for the exhibition set if neither the black nor the white player has more than 4 games of past history at the time of the match.

With the optional --modify switch, the existing set assignment will be kept as far as possible, keeping changes to a minimum.

As an alternative usage, the splits can be specified as fractions. Omitting --testPart assigns all remaining rows to the test set.

$ python3 python/random_split.py --input csv/games_judged.csv --output csv/games_judged.csv --trainingPart 0.8 --validationPart 0.1

This will allocate 80% of eligible rows to the training set, 10% to the validation set and the remaining 10% to the test set.

Once allocated, the script can also copy the same set markers to a different CSV file, as long as the "copy-from" file has both "File" and "Set" headers and holds the information on every "File" listed in the input CSV file:

$ python3 python/random_split.py --input csv/games_judged.csv --copy csv/games_labels.csv

Glicko-2 Calculation

We feed our dataset(s) into our reference rating algorithm, Glicko-2, which is implemented for OGS in the goratings repository. That repository contains the script analyze_glicko2_one_game_at_a_time.py. The forked repository is extended to read input from our games list and SGF files, and to produce an output list that contains the results of the rating calculation after every game. From our list file, the script expects the file name in the first column and the score in the last column. This is the case for games_judged.csv from above (before the set marker is added!).

$ GORATINGS_DIR=path/to/goratings
$ PYTHONPATH="$PYTHONPATH:$GORATINGS_DIR" python3 $GORATINGS_DIR/analysis/analyze_glicko2_one_game_at_a_time.py \
	--sgf csv/games_judged.csv --analysis-outfile csv/games_glicko_ids.csv --mass-timeout-rule false

Since the scripts in goratings use integer IDs for games and players, we need to run our name_ratings.py script to restore SGF paths and player names.

$ python3 python/name_ratings.py --list csv/games_judged.csv --ratings csv/games_glicko_ids.csv --output csv/games_glicko.csv

We make use of Glicko-2 ratings in two ways. First, the after-game ratings BlackRating and WhiteRating form the basis for the future Glicko rating labels on which our models are trained and evaluated (see the Training section below). Second, Glicko-2 serves as our reference rating system: the PredictedScore, PredictedBlackRating and PredictedWhiteRating columns allow us to measure its performance on our dataset.

Dataset Viewer

datasetviewer is a utility program that allows us to query data from the dataset. It must be compiled from its C++ sources, located in this repository in the datasetviewer subdirectory.

$ cd datasetviewer
$ cmake .
$ make

Start the program with the dataset list file (generated following the previous sections) and feature cache directory as arguments. If those arguments are not provided, datasetviewer explains itself.

Usage: datasetviewer LIST_FILE FEATURE_DIR
  View data from the games in the LIST_FILE, with precomputed features stored in FEATURE_DIR.
Commands:
  help                                  Print this help.
  info                                  Print active settings and filters.
  exit                                  Exit program.
  select TOPIC FILTER OP VALUE|RANGE    Set the filter for TOPIC.
    TOPIC choices: games|moves
    FILTER choices for games: none|#|file|black|white|score|predscore|set
    FILTER choices for moves: none|recent|color
    OP choices: in|contains
  configure SETTING VALUE               Configure a global setting.
    SETTING choices: window
  print TOPIC COLUMN...                 Write the values to stdout.
  dump FILE TOPIC COLUMN...             Write the values to FILE.
    TOPIC choices: games|moves
    COLUMN choices for games: #|file|black.name|white.name|black.rating|white.rating|black.rank|white.rank|score|predscore|set
    COLUMN choices for moves: #|color|winprob|lead|policy|maxpolicy|wrloss|ploss|rating

For example, in the following session, we extract the first 100 matchups, and also recent move data for game 11 in the dataset.

$ VIEWERDIR=datasetviewer
$ LIST=csv/games_labels.csv
$ FEATUREDIR=featurecache
$ $VIEWERDIR/datasetviewer $LIST $FEATUREDIR
Dataset Viewer: 1890 games read from games_labels.csv (with features), ready.
> select games # in 0-99
Ok.
> dump matches_100.csv games black.name black.rating white.name white.rating
Write to matches_100.csv...
Done.
> select moves recent in 11
Ok.
> dump recent_11.csv moves color policy ploss
Write to recent_11.csv...
Done.
> exit 
Dataset Viewer: bye!
$ 

Training

Using the dataset as prepared above, we can train the strength model on it – either from scratch, or by loading an existing model file. The strength model is implemented in Python using PyTorch. It requires that the input game(s) are preprocessed through the KataGo network to obtain the strength model input features for every move. TODO: unify this process in one script

Labeling Games

One way to train our strength model is to let it predict the players' future rating. The label_gameset.py script provided in this repository reads the list of games that we produced in the previous steps. The output games list contains the future rating of both the black and the white player, taken from the point when they have played the number of additional games specified in the --advance argument.

For example, if Alice starts with a rating of 1000 and then wins against B, C, D, E and F, resulting in ratings of 1100, 1200, 1300, 1400 and 1500 respectively, and the --advance option is set to 3 (games in the future), then the resultant labeling might be:

File,Player White,Player Black,Score,WhiteRating,BlackRating
Alice_vs_B.sgf,Alice,B,0,1300,1000
Alice_vs_C.sgf,Alice,C,0,1400,1000
Alice_vs_D.sgf,Alice,D,0,1500,1000
E_vs_Alice.sgf,E,Alice,1,1000,1500
Alice_vs_F.sgf,Alice,F,0,1500,1000
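A minimal sketch of a labeling rule that reproduces the numbers in this example (the actual logic lives in label_gameset.py): a player's label for a game is their rating after they have played --advance games counting from that game, capped at their last known rating.

def future_label(ratings_after: list, game_index: int, advance: int) -> float:
    """ratings_after[i] is the player's rating after their i-th game."""
    target = min(game_index + advance - 1, len(ratings_after) - 1)
    return ratings_after[target]

alice = [1100, 1200, 1300, 1400, 1500]      # Alice's ratings after games vs B, C, D, E, F
assert future_label(alice, 0, 3) == 1300    # label for Alice_vs_B.sgf
assert future_label(alice, 3, 3) == 1500    # label for E_vs_Alice.sgf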

Run the script as follows.

$ python3 python/label_gameset.py --list csv/games_glicko.csv --output csv/games_labels.csv --advance 10

Precomputation

The strength prediction for a player is based on a large number of recent moves, every one of which must be evaluated by the KataGo network to find its features.

The precomputation of recent move features is built into my fork of KataGo in the form of the new command extract_features. This extractor currently supports three types of features:

  • Head features are derived from the usual output of the KataGo network. They include winrate loss and points loss, which require evaluating both the position before and after the move in question. We use these features in our basic stochastic model and as data to train our proof-of-concept model.
  • Trunk features simply contain the entire trunk output of the KataGo network from the board state of the recent move. They take too much time and generate too much data for our dataset with 20000 marked rows (see Splitting the Dataset above).
  • Pick features contain the feature vector from the trunk output associated with the board location where the stone was placed. We use these features to train our full model.

Invoke the extract_features command as follows:

$ KATAGO=path/to/katago
$ MODEL=path/to/katago/models/kata1-b18c384nbt-s6582191360-d3422816034.bin.gz
$ CONFIG=path/to/configs/analysis_example.cfg
$ LIST=csv/games_labels.csv
$ FEATUREDIR=path/to/featurecache
$ SELECTION="-with-trunk -with-pick -with-head"  # feature categories to precompute
$ WINDOWSIZE=500   # number of recent moves per game and player
$ BATCHSIZE=10     # number of board positions sent to KataGo in one batch
$ BATCHTHREADS=8   # number of concurrent worker threads launching independent NN batches to the GPU
$ WORKERTHREADS=8  # number of concurrent CPU workers to prepare NN inputs and process NN outputs

$ katago extract_features $SELECTION -model $MODEL -config $CONFIG -list $LIST -featuredir $FEATUREDIR \
                          -window-size $WINDOWSIZE -batch-size $BATCHSIZE -batch-threads $BATCHTHREADS \
                          -worker-threads $WORKERTHREADS

KataGo reads the given list file and determines the recent move set for every marked game in the dataset. It uses its network to evaluate all the necessary positions, from which it derives the features specified in the selection parameters. The features associated with the recent moves are grouped by the game in which they occur and by the player's color, then written to a zip file in the feature cache directory; the file name ends in Features.zip. Once all recent move features are available per recent game, it combines them into recent move set archives, named after the game and player color that these moves are recent for, and stores them in the feature cache directory; the file name ends in Recent.zip. These are our data sources for model training.

For example, if game 110 between players P and Q is in the training set, where P has recent moves in games 101, 105 and 107, while Q has recent moves in games 102 and 106, then the extractor creates Features.zip archives for games 101, 102, 105, 106 and 107. These combine into one Recent.zip archive for each side in game 110.
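Conceptually, the recent move set for one side of a game collects that player's moves from the games preceding it, newest first, up to the window size. A rough sketch under that assumption (the real bookkeeping happens inside extract_features):

def recent_move_games(history, window_size):
    """history: (game file, number of moves the player made in it) pairs,
    ordered oldest to newest, for the games before the game in question.
    Returns the games whose moves fall inside the recent-move window."""
    selected, total = [], 0
    for game, move_count in reversed(history):   # walk backwards in time
        if total >= window_size:
            break
        selected.append(game)
        total += move_count
    return selected

# Player P before game 110, as in the example above:
history_p = [("game101.sgf", 120), ("game105.sgf", 110), ("game107.sgf", 130)]
print(recent_move_games(history_p, window_size=500))
# -> ['game107.sgf', 'game105.sgf', 'game101.sgf']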

This tool supports two more command line switches. The -recompute switch causes the program to overwrite any files left over from previous runs. Omit it to continue precomputation from the state of the previous, unfinished run. The -print-recent-moves switch outputs some debug information (not just recent moves).

This step can be very time and resource intensive, especially with large datasets (multiple 10k marked games), large window size, large batch size and many threads. Try different parameters to find an acceptable balance between speed and resource use on your system. In case of crashes due to resource exhaustion, the process can be resumed without the -recompute switch.

After successful completion, check the result using checkdataset.py found in this repository:

$ LIST=csv/games_labels.csv
$ FEATUREDIR=path/to/featurecache
$ SELECTION="--require-trunk --require-pick --require-head"  # features to check
$ python3 python/checkdataset.py $LIST $FEATUREDIR $SELECTION

This script tallies up all marked games in every set and looks at their recent move zip files. If required files or features are missing or of inconsistent size, the affected game is listed in an error summary at the end. Ideally, the end of its output should look like this, with the desired feature sets reported as "all" present:

Dataset checked, 10000 train/5000 validation/5000 test, 0 errors.
  head features? all
  pick features? all
  trunk features? none

The Training Command

The train.py file included in this repository can be launched as a standalone script to train the strength model on the above precomputed data. Invoke it from the shell like this:

LIST=csv/games_labels.csv
FEATUREDIR=path/to/featurecache
FEATURENAME=pick
OUTFILE=nets/model{}.pth
TRAINLOSSFILE=logs/trainloss.txt
VALIDATIONLOSSFILE=logs/validationloss.txt
LOGFILE=logs/traininglog.txt

BATCHSIZE=100
STEPS=100
EPOCHS=100
LEARNINGRATE=0.001
LRDECAY=0.95
PATIENCE=3

WINDOW_SIZE=500
DEPTH=5
HIDDEN_DIMS=64
QUERY_DIMS=64
INDUCING_POINTS=32

python3 -u python/model/train.py $LIST $FEATUREDIR --featurename $FEATURENAME --outfile "$OUTFILE" \
  --trainlossfile $TRAINLOSSFILE --validationlossfile $VALIDATIONLOSSFILE \
  --batch-size $BATCHSIZE --steps $STEPS --epochs $EPOCHS --learningrate $LEARNINGRATE --lrdecay $LRDECAY --patience $PATIENCE \
  --window-size $WINDOW_SIZE --depth $DEPTH --hidden-dims $HIDDEN_DIMS --query-dims $QUERY_DIMS --inducing-points $INDUCING_POINTS

Please keep in mind that relative SGF paths in the list file ($LIST) must be relative to the current working directory. The list file must contain the "Set" column (see "Splitting the Dataset" above). The script uses 'T' (training) rows for training and 'V' (validation) rows to check performance. After every epoch, the trained network weights are saved in a separate file according to the pattern given as OUTFILE; the epoch number takes the place of the placeholder {} in the final name.

Hyperparameter Optimization

The training method as specified in the thesis uses a random search for the best hyperparameters. This process is handled by the script hpsearch.py in this repository. Invoke it as follows.

LIST=csv/games_labels.csv
FEATUREDIR=path/to/featurecache
FEATURENAME=pick
TITLE=search
NETDIR=nets
LOGDIR=logs
BATCHSIZE=100
STEPS=100
EPOCHS=100
PATIENCE=3
SAMPLES=15
BROADITERATIONS=2
FINEITERATIONS=2

python3 -u python/model/hpsearch.py $LIST $FEATUREDIR --featurename $FEATURENAME --title "$TITLE" \
  --netdir $NETDIR --logdir $LOGDIR \
  --batch-size $BATCHSIZE --steps $STEPS --epochs $EPOCHS --patience $PATIENCE \
  --samples $SAMPLES --broad-iterations $BROADITERATIONS --fine-iterations $FINEITERATIONS

The same notes regarding the list file apply as in the Training Command section above.

Evaluation

Given a CSV rating calculation file with the required columns, our script calc_performance.py calculates the relevant metrics of the rating system which produced the input file. The required columns are:

  • Score and PredictedScore
  • Player Black and Player White, to distinguish between first-timers and players with information attached
  • Set (optional), to calculate only on set T, V or E (e.g. V for validation set)

The primary measurable quality of a rating system is its ability to predict the winners of matchups as they happen. When the higher-rated player beats the lower-rated player, the prediction counts as successful. Moreover, we value not just the number of successfully predicted matchups, but also the confidence of the prediction: the higher the prior rating of the eventual winner compared to the loser, the better our system performs. We measure the success rate as the number of successful predictions divided by the total number of matches. We measure the performance as the average log-likelihood of every outcome prediction (logp). This is the log-likelihood of all the outcomes happening as they did, according to the rating system and based on the prior information at the time, scaled by the dataset size for better comparison.
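One plausible formulation of these two metrics, assuming PredictedScore is the predicted probability of a black win (the authoritative implementation is calc_performance.py):

import csv, math

def performance(path, set_marker=None):
    """Success rate and average predictive log-likelihood (logp)."""
    hits, logp_sum, n = 0, 0.0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if set_marker and row.get("Set") != set_marker:
                continue
            score = float(row["Score"])          # 1 = black wins, 0 = white wins
            pred = float(row["PredictedScore"])  # assumed P(black wins)
            p_outcome = pred if score == 1 else 1.0 - pred
            hits += (pred > 0.5) == (score == 1)
            logp_sum += math.log(max(p_outcome, 1e-9))
            n += 1
    return hits / n, logp_sum / n

print(performance("csv/games_glicko.csv", set_marker="V"))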

Given a rating calculation file with the above columns, the script calc_performance.py tells us the success rate and log-likelihood achieved by the system.

$ python3 python/calc_performance.py csv/games_glicko.csv -m V

In this example, we evaluate the output file of the Glicko-2 Calculation above, establishing the performance of our reference system. In the following sections, we obtain evaluation files for our own models.

Stochastic Model Calculation

The Stochastic Model is based on the simple idea that we can predict winning chances from the expected points loss of both players in their match. It is implemented in the script stochasticmodel.py.

$ LISTFILE=csv/games_labels.csv
$ FEATUREDIR=path/to/featurecache
$ MARKER=V
$ OUTFILE=csv/games_stochastic_$MARKER.csv
$ python3 python/stochasticmodel.py $LISTFILE $FEATUREDIR -m $MARKER $OUTFILE

This model requires precomputed head features for all marked records. The output file contains the predicted game outcomes for feeding into the performance calculation script.
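For illustration, one possible formulation of the idea, mapping the difference in mean points loss per move to a winning probability with a logistic function; stochasticmodel.py may differ in its details:

import math

def predict_score(black_ploss, white_ploss, scale=1.0):
    """Predicted probability that black wins, from both players' points
    loss per move. 'scale' is a free parameter of this illustration."""
    mean_b = sum(black_ploss) / len(black_ploss)
    mean_w = sum(white_ploss) / len(white_ploss)
    # If white loses more points per move than black, black is favored.
    return 1.0 / (1.0 + math.exp(-scale * (mean_w - mean_b)))

print(predict_score([0.5, 2.0, 1.0], [3.0, 1.5, 2.5]))  # > 0.5: black favored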

Strength Model Calculation

Once the strength model is trained, we can apply it to a dataset by invoking the script eval.py.

$ LIST=csv/games_labels.csv
$ FEATUREDIR=path/to/featurecache
$ MODELFILE=path/to/strengthmodel.pth
$ OUTFILE=csv/games_strmodel.csv

python3 python/model/eval.py "$LIST" "$FEATUREDIR" "$MODELFILE" --outfile "$OUTFILE" --setmarker V

The output file contains the games from the list that match the given set marker, extended by new columns for predicted ratings and score.

Usually the rating predictions are scaled to match the Glicko-2 labels of the training set, but with the --raw option, the predictions are left exactly as they come from the network. This information is needed to determine the scaling factors for the public web app, where we fit the model output to the scale observed on the OGS site.

Plots

Visual presentations of the data found in the thesis are created using scripts in the plots subdirectory. Consult the associated HowTo for reproduction steps.

Web Application "How Deep Is Your Go?"

As part of this project, a small web application is provided for the public to use the strength model. This web app uses the flask library. Its implementation is found in the python/webapp subdirectory.

To run, it requires the following environment variables:

KATAGO=path/to/katago  # binary
KATAMODEL=path/to/katago/models/kata1-b18c384nbt-s6582191360-d3422816034.bin.gz
KATACONFIG=path/to/configs/analysis_example.cfg
STRMODEL=path/to/strengthmodel.pth
PYTHONPATH="go-strength-model/python:go-strength-model/python/model"

Where go-strength-model is the path to the directory of this repository.

The application can run using flask's integrated development web server: python3 python/webapp/webapp.py, or using a separate web server. The Dockerfile in the root of this repository builds a complete image for the web app with all requirements, ready to be hosted.

Trickplay Evaluation

The thesis contains an evaluation of trickplay variations from the book “Tricks in Joseki”. The raw input for this task is located in the trickplay subdirectory. Evaluating the positions given as SGF files requires the following steps:

  1. Since the strength model works on whole positions and not problem diagrams, we must find plausible board completions for them. In our case, we take these complete positions from the OGS dataset.
     a. Precompute the initial positions of the problem SGFs into simple board string files. The script that does this is python/trickboards.py in this repository. Example usage: python3 python/trickboards.py extract trickplay.
     b. Some patterns require manual adjustments.
        • Problem 21 contains a captured stone. Since our pattern detection method does not work with captures, replace the captured location with the placeholder '?' in the board string file.
        • Problems 24, 40, 43 and 47 do not occur in the dataset. To work around this, remove the last white move from the pattern and re-add it to the complete board.
     c. Search the dataset for positions matching the board strings. The script python/findpatterns.py accomplishes this task. Example usage: python3 python/findpatterns.py csv/games_labels.csv trickplay/board*.txt.
     d. The completed problem 61 requires another manual adjustment, with editor's judgement, because the successful refutation depends on a global ladder.
  2. Combine the generated boards with the original SGFs to get success and failure variation SGFs. Example usage: python3 python/trickboards.py merge trickplay. This script requires adherence to the naming convention problem99.sgf for problem patterns, and completed99.txt for the corresponding completed board string file from the previous step.
  3. Extract move features for the black moves in the success and failure variations. See below for command.
  4. Run the strength model on the move features for every problem. See below.

The commands to extract the move features for black follow this template:

KATAGO=path/to/katago
MODEL=path/to/katago/models/kata1-b18c384nbt-s6582191360-d3422816034.bin.gz
CONFIG=path/to/configs/analysis_example.cfg
SGFS="trickplay/success*.sgf trickplay/failure*.sgf"

for SGF in $SGFS ; do
  OUTFILE="${SGF/.sgf/_BlackFeatures.zip}"
  $KATAGO extract_sgfs $SGF -model $MODEL -config $CONFIG -outfile $OUTFILE -color Black -window-size 500 -with-pick
done

The commands to run the strength model on the extracted features follow this template:

ZIPS="trickplay/success*_BlackFeatures.zip trickplay/failure*_BlackFeatures.zip"
MODELFILE=path/to/strengthmodel.pth
FEATURENAME=pick

export PYTHONPATH="python"
python3 python/model/run.py $ZIPS --model $MODELFILE --featurename $FEATURENAME

Miscellaneous

Rating Scale Fitting

To match public expectations for the ratings reported by the web app, we establish a separate scaling for the full strength model that minimizes the error relative to ratings on OGS.

To achieve this, we first run a --raw evaluation of the training set, as described in the section “Strength Model Calculation”. Refer to the following command:

python3 python/model/eval.py csv/games_labels.csv featurecache nets/search/model.pth --raw --featurename pick --outfile csv/games_raweval.csv --setmarker T

We only fit a sample of 100 games. This is plenty considering the noise inherent in rating estimation. The 100 games are selected at random.

head -n 1 csv/games_raweval.csv > csv/games_raw100.csv
tail -n +2 csv/games_raweval.csv | shuf -n 100 >> csv/games_raw100.csv
csvcut -c File,PredictedBlackRating,PredictedWhiteRating csv/games_raw100.csv > csv/games_raw100s.csv

This step requires csvcut (part of csvkit), which must be installed separately.
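If csvkit is not available, the same column selection can be done with a few lines of standard-library Python:

import csv

columns = ["File", "PredictedBlackRating", "PredictedWhiteRating"]
with open("csv/games_raw100.csv", newline="") as fin, \
     open("csv/games_raw100s.csv", "w", newline="") as fout:
    writer = csv.DictWriter(fout, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in csv.DictReader(fin):
        writer.writerow(row)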

The script ogsratings.py can use this file to automatically request the historical rating numbers from OGS.

python3 python/ogsratings.py csv/games_raw100s.csv csv/games_rawogs.csv

From this, we use the script ogsfit.py to determine the scaling factors a and b, such that for model outputs x, the mean squared error between a * x + b and OGS ratings is minimal.

python3 python/ogsfit.py csv/games_rawogs.csv

Our result is a = 334.0281191511932 and b = 1595.094753057906.
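For reference, the same least-squares line fit can be reproduced with numpy; the OGS rating column name below is only a placeholder for illustration:

import csv
import numpy as np

xs, ys = [], []
with open("csv/games_rawogs.csv", newline="") as f:
    for row in csv.DictReader(f):
        xs.append(float(row["PredictedBlackRating"]))
        ys.append(float(row["OGSBlackRating"]))   # placeholder column name

a, b = np.polyfit(xs, ys, deg=1)                  # minimizes squared error of a*x + b
print(f"a = {a}, b = {b}")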

Filtering Games (alternative)

The sgffilter.py script provided in this repository traverses a given directory and all its subdirectories for SGF files. Every file that contains a suitable training game is printed to the output file. Suitable games are even (no-handicap) 19x19 games with more than 5 seconds of thinking time per move, at least 20 moves played, a result decided by counting, resignation or timeout, and the string "ranked" (but not "unranked") in the GC property.

$ python3 python/sgffilter.py path/to/dataset more/paths/to/datasets --output csv/games.csv

Another alternative is the C++ utility sgffilter in the extractor directory. It takes an input CSV list of games like games.csv and writes to an output CSV list all those games which meet even stricter filter criteria: games with passes or nowhere-moves before move 50 are discarded.

$ pushd extractor
$ cmake .
$ make
$ popd
$ extractor/sgffilter csv/games.csv csv/games_refiltered.csv

Recent Moves Precomputation

This script precomputes, for every game in the dataset, for both the black and white side, which games contain their "recent moves". These are the moves that the strength model may use to predict the outcome of that game. We want to use the training, validation and test sets, so we have to run the script for each of them.

$ LISTFILE=csv/games_judged.csv
$ FEATUREDIR=path/to/featurecache
$ python3 python/recentmoves.py "$LISTFILE" "$FEATUREDIR" --marker T
$ python3 python/recentmoves.py "$LISTFILE" "$FEATUREDIR" --marker V
$ python3 python/recentmoves.py "$LISTFILE" "$FEATUREDIR" --marker E

Determine Strength Model Error

This script produces data to estimate the error of the strength model for every input set size. Model outputs are compared to the game labels, using only the latest i recent moves for every i up to the window size N. It writes the raw data to the optional file given in the samplespath argument and prints the estimated variance and standard deviation for each move count to stdout.

$ python3 python/model/deviation.py csv/games_labels.csv featurecache nets/search2/model.pth --samplespath logs/modeldeviation.txt

Stochastic Model Calculation Implemented in C++

This implementation of the Stochastic Model is obsolete. The Stochastic Model is based on the simple idea that we can predict winning chances from the expected points loss of both players in their match. It is implemented in the modified KataGo (needs to be compiled from my fork, see above) as the rating_system command.

$ CONFIG=configs/analysis_example.cfg
$ LISTFILE=csv/games_judged.csv
$ OUTFILE=csv/games_stochastic.csv
$ FEATUREDIR=path/to/featurecache
$ katago rating_system -config $CONFIG -list $LISTFILE -outlist $OUTFILE -featuredir $FEATUREDIR -set V

The -featuredir is mandatory and must hold the precomputed extracted move features for every game. These must be prepared by extract_features as outlined above.

The output file contains the results of the rating calculation, directly comparable to the output of the Glicko-2 analysis script above.

C++ Tests

The modified katago features new tests for the new functionality.

$ katago runstrengthmodeltests
