Support upcoming larger "b28" nets and lots of bugfixes
This release is outdated, see https://github.com/lightvector/KataGo/releases/tag/v1.15.0 for a newer release!
Note for CUDA and TensorRT: starting with this release, newer versions are required!
- The CUDA version requires CUDA 12.1.x and cuDNN 8.9.7. CUDA 12.1.1 in particular was used for compiling and testing. For CUDA, more recent versions should also work. Older versions might work too, but even if they do, upgrading from a much older version might give a small performance improvement.
- The TensorRT version requires precisely CUDA 12.1.x and TensorRT 8.6.1 ("TensorRT 8.6 GA"). CUDA 12.1.1 in particular was used for compiling and testing.
- Note that CUDA 12.1.x is used even though it is not the latest CUDA version because TensorRT does not yet support CUDA 12.2 or later! So for TensorRT, the CUDA version must not be upgraded beyond that.
Summary and Notes
This release adds support for an upcoming larger and stronger "b28" neural net that is currently being trained and will likely be ready within the next couple of months! This release also fixes many minor bugs and makes many minor improvements.
As a reminder, see here for a special neural net better than any other net on 9x9, which was used to generate the 9x9 opening books at katagobooks.org.
Available below are both the standard and "bs29" versions of KataGo. The "bs29" versions are just for fun, and don't support distributed training but DO support board sizes up to 29x29. They may also be slower and will use much more memory, even when only playing on 19x19, so use them only when you really want to try large boards.
The Linux executables were compiled on an Ubuntu 20.04 machine. Some users have encountered libzip or other library compatibility issues in the past. If you have such an issue, you may be able to work around it by compiling from source, which is usually not so hard on Linux; see the "TLDR" instructions for Linux here.
Changes in v1.14.0
New features
- Added support for a new "v15" model format that adds a nonlinearity to the pass policy head. This change is required for the new larger b28c512nbt neural net that should be ready in the next few months and might become the strongest neural net to use for top-tier GPUs.
Engine improvements
- KataGo analysis mode now ignores history prior to the root (except still obeying ko/superko)! This means analysis will no longer be biased by placing stones in an unrealistic order when setting up an initial position, or by exploring game variations where both players play very bad moves. Pre-root history is still used when KataGo is playing rather than analyzing, because it is presumed that KataGo played the whole game as the current player and chose the moves it wanted. If this is not the case, see `analysisIgnorePreRootHistory` and `ignorePreRootHistory` in the config; a config sketch follows this list.
- Eigen version of KataGo now shares the neural net weights between all threads instead of copying them - this should greatly reduce memory usage when running with multiple threads/cores.
- TensorRT version of KataGo now has a cmake option `USE_CACHE_TENSORRT_PLAN` for custom compiling that can give faster startup times for the TensorRT backend at the cost of some disk space (thanks to kinfkong). Do NOT use this for self-play or training - it will use excessive disk space over time and increase the cost of each new neural net. The ideal use case is analysis/play using only one or a few nets over and over; see the build sketch after this list.
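As a reference for the two options just mentioned, here is a minimal config sketch (the option names come from this release; the placement and values shown are illustrative - check the example configs shipped with KataGo for where each option applies):

```
# Sketch only: set the relevant option to false to restore the old
# behavior of conditioning analysis on pre-root move history.
# One option applies to GTP analysis commands and the other to the JSON
# analysis engine; see KataGo's shipped example configs for which is which.
analysisIgnorePreRootHistory = false
ignorePreRootHistory = false
```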
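And for those compiling from source, a sketch of enabling the TensorRT plan cache (selecting the backend with `-DUSE_BACKEND=TENSORRT` follows KataGo's documented build steps; treat the exact invocation as illustrative):

```
# Run from KataGo's cpp/ directory, with CUDA 12.1.x and TensorRT 8.6.1 installed.
cmake . -DUSE_BACKEND=TENSORRT -DUSE_CACHE_TENSORRT_PLAN=1
make -j 8
```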
Main engine bugfixes
- Fixed bug where KataGo, when forced to analyze a position past the point where the game should have already ended under strict scoring rules, would not try to claim a win and would assume the opponent would not either.
- Fixed bad memory access that might cause a mild bias in dame-filling behavior under Japanese rules.
- Fixed issue where, when contributing selfplay games to distributed training, a failure of the first web query to katagotraining.org would make the entire program fail instead of being retried the way all subsequent web queries are.
- Fixed some multithreading races by avoiding any copying of child nodes between arrays during search.
- Fixed bug in parsing certain malformed configs with multiple GPUs specified.
- Fixed bug in determining the implicit player to move on the first turn of an SGF with setup stones.
- Fixed some bugs in recomputing root policy optimism when it differs from tree policy optimism in various cases, or when softmax temperature or other parameters differ after pondering.
- Fixed some inconsistencies in how Eigen backend number of threads was determined.
- Shrank the default batch size on the Eigen backend, since batching doesn't help CPUs much; this should make more efficient use of cores when running with fewer threads.
- Minor internal code cleanups involving turn numbers, search nodes, and other details. (thanks nerai)
Expert/dev tool improvements
- Tools
  - Added `bSizesXY` option to control the exact board size distribution, including rectangles, for selfplay or match commands, instead of only an edge length distribution. See match_example.cfg and the sketch after this list.
  - Improved many aspects of the book generation code and added more parameters to it that were used for the 9x9 books at katagobooks.org.
  - The python `summarize_sgfs.py` tool now outputs stats that can identify rock-paper-scissors situations in the Elos.
  - Added experimental support for dynamic komi in internal test matches.
  - Various additional arguments and minor changes and bugfixes to startpos/hintpos commands.
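As a sketch of what `bSizesXY` changes relative to the older options (the `bSizes`/`bSizeRelProbs` lines below reflect the pre-existing edge-length style; the `bSizesXY` value syntax is deliberately left schematic - match_example.cfg is the authoritative reference):

```
# Old style: board sizes drawn from a distribution over edge lengths only.
bSizes = 19,13,9
bSizeRelProbs = 90,5,5

# New in v1.14.0: bSizesXY specifies full board shapes, including
# rectangles. Exact value syntax is documented in match_example.cfg;
# shown here only schematically.
# bSizesXY = <list of width/height pairs>
```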
- Selfplay and training
  - By default, training models will now use a cheaper version of the repvgg-linear architecture that doesn't actually instantiate the inner 1x1 convolution, but instead adjusts the weights and increases the LR on the central square of a 3x3 conv. This change only applies to newly initialized models - existing models will keep the old and slower-training architecture. (See the sketch after this list for the underlying equivalence.)
  - Modernized all the various outdated selfplay config parameters and added a readme for them.
  - Minor (backwards-compatible) adjustments to the training data NPZ format, made to better support experimental conversion of human games to NPZ training data.
  - Improved shuffle.py and training.py defaults and -help documentation, e.g. `cd python; python shuffle.py -help`.
  - Various other minor updates to various docs.
  - Improved and slightly rearranged the synchronous loop logic.
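To illustrate the 1x1-fusion trick in the first "Selfplay and training" item above, here is a minimal PyTorch sketch (not KataGo's actual training code): a 3x3 convolution with a parallel 1x1 branch, as in a repvgg-linear block, is mathematically identical to a single 3x3 convolution whose center tap absorbs the 1x1 weights, so the 1x1 branch never needs to be instantiated. (KataGo's version additionally rescales weights and the LR on that center tap to mimic the two-branch training dynamics, which this sketch does not show.)

```python
import torch
import torch.nn.functional as F

# A 3x3 conv plus a parallel 1x1 conv is equivalent to one 3x3 conv
# whose center tap has the 1x1 kernel added into it.
out_c, in_c = 8, 4
w3 = torch.randn(out_c, in_c, 3, 3)
w1 = torch.randn(out_c, in_c, 1, 1)

fused = w3.clone()
fused[:, :, 1, 1] += w1[:, :, 0, 0]  # absorb the 1x1 kernel into the center

x = torch.randn(2, in_c, 9, 9)
y_two_branch = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1)
y_fused = F.conv2d(x, fused, padding=1)
assert torch.allclose(y_two_branch, y_fused, atol=1e-4)
```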
Expert/dev tool bugfixes
- Fixed `wouldBeKoCapture` bug in the python board implementation.
- Fixed bug where `trainingWeight` would be ignored on local selfplay hintposes.
- Now clears the export cycle counter when migrating a pytorch model checkpoint to newer versions.
- Fixed minor bugs in the args of the selfplay file summarize and shuffle scripts.
- Various other minor bugfixes to dev commands and python scripts for training.