From 0c46fa25d381f19fad062e04366f19c72255b9b0 Mon Sep 17 00:00:00 2001 From: Jonathan Rosenthal Date: Sat, 21 Sep 2019 01:07:06 -0400 Subject: [PATCH] Contempt (#26) * Contempt * Regular compile no longer contains search param settings used for tuning * Updated README.md * Updated version number --- README.md | 28 ++++++++++++++--- src/general/settings.h | 2 +- src/net_evaluation.cc | 52 +++++++++++++++++++++++++++----- src/net_evaluation.h | 4 +++ src/search.cc | 68 ++++++++++++++++++++++++++++++++---------- src/search.h | 5 ++++ src/uci.cc | 45 +++++++++++++++++++++------- 7 files changed, 165 insertions(+), 39 deletions(-) diff --git a/README.md b/README.md index 8096d94..0ef5c12 100644 --- a/README.md +++ b/README.md @@ -9,23 +9,43 @@ Winter has relied on many machine learning algorithms and techniques over the co As of Winter 0.6.2, the evaluation function relies on a small neural network for more precise evaluations. ## Installation -In order to run it on Linux, just compile it via "make" in the root directory and then run it from the root directory. Tested with clang (recommended) and gcc (default). +In order to run it on Linux, just compile it via "make" in the root directory and then run it from the root directory. -When running Winter from command line, be sure to call "uci" to get information about the current version, including its number and the detected architecture. +The makefile assumes a native build; if you are building for a different system, it should be reasonably straightforward to adapt the makefile yourself. Winter does not rely on any external libraries aside from the Standard Template Library. All algorithms have been implemented from scratch. As of Winter 0.6.2 I have started to build an external codebase for neural network training. +## Contempt +Winter versions 0.7 and later have support for contempt settings. 
In most engines, contempt is used to reduce the number of draws and thus increase performance against weaker engines, often at the cost of performance in self-play or against stronger opposition. + +Winter uses a novel contempt implementation that exploits the fact that Winter calculates win, draw and loss probabilities. Increasing contempt in Winter reduces how much it values draws for itself and increases how much it believes the opponent values draws. + +#### Centipawn output recalibration + +Internally, Winter actually tends to have a negative score for positive contempt and vice versa. This is natural, as positive contempt reduces the value of a draw for the side to move, so the score will be negatively biased. + +In order to report more realistic scores, Winter applies a bias adjustment to non-mate scores. The formula assumes a maximum probability for a draw and readjusts the score based on that. This results in an overcorrection, i.e. positive contempt values will result in a reported upper bound on the score and negative contempt values will result in a reported lower bound. + +For high contempt values it is recommended to adjust adjudication settings accordingly. + +#### Armageddon +An increasingly popular format in human chess is Armageddon. To the author's knowledge, Winter is the first engine to natively support Armageddon play as a UCI option. Internally this works by setting contempt to a high positive value when playing White and a high negative value when playing Black. + +At the moment, contempt is not set to the maximum in Armageddon mode, as this proved to perform more consistently in the limited testing done so far. This may change in the future. + +In Armageddon mode, score recalibration is not performed. The recalibration formula for regular contempt assumes that contempt pushes the score away from the true symmetrical evaluation; in Armageddon the true evaluation is not symmetric. 
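The effect described above can be sketched compactly. The following is a standalone illustration, not Winter's actual code: the function name `contempt_score`, the raw logit inputs, and the plain `int`/`float` types are assumptions for the sketch, though the contempt-weighted average of the two output heads and the 1024-scaled logit mirror the `NetForward` change in this patch.

```cpp
#include <algorithm>
#include <cmath>

// Minimal sketch of the contempt-weighted score at the end of NetForward.
// The two logits come from the network's "win" and "win-or-draw" heads.
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// c = 0.5 reproduces the contempt-free behaviour (plain average of P(win)
// and P(win or draw)); c > 0.5 devalues draws for the side to move.
int contempt_score(float win_logit, float win_draw_logit, float c) {
  constexpr float kEpsilon = 0.000001f;
  float wpct = sigmoid(win_logit) * c + sigmoid(win_draw_logit) * (1 - c);
  wpct = std::max(std::min(wpct, 1 - kEpsilon), kEpsilon);
  // Convert the expected score back to a logit and scale, as in the patch.
  return static_cast<int>(std::round(std::log(wpct / (1 - wpct)) * 1024));
}
```

For a drawish position (low P(win), high P(win or draw)), raising `c` pushes the internal score down, which is exactly the "negative score for positive contempt" effect that the recalibration section compensates for.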
+ ## Training Your Own Winter Flavor At the moment, training a neural network for use in Winter is only supported in a very limited way. I intend to shortly release the script that was used to train the initial 0.6.2 net. In the following I describe the steps to get from a pgn game database to a network for Winter. - 1. Get and compile the latest [pgn-extract](https://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/) by David J. Barnes. +1. Get and compile the latest [pgn-extract](https://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/) by David J. Barnes. 2. Use pgn-extract on your .pgn file with the arguments `-Wuci` and `--notags`. This will create a file readable by Winter. 3. Run Winter from the command line. Call `gen_eval_csv filename out_filename`, where filename is the name of the file generated in 2. and out_filename is what Winter should call the generated file. This will create a .csv dataset file (described below) based on pseudo-quiescent positions from the input games. 4. Train a neural network on the dataset. It is recommended to try to train something simple for now. Keep in mind that I would like to refrain from making Winter rely on any external libraries. 5. Integrate the network into Winter. In the future I will probably support loading external weight files, but for now you need to replace the appropriate entries in `src/net_weights.h`. -6. `make clean` and `make` (or `make no_bmi`) +6. `make clean` and `make` The structure of the .csv dataset generated in 3. is as follows. The first column is a boolean value indicating whether the player to move won. The second column is a boolean value indicating whether the player to move scored at least a draw. The remaining columns are features which are somewhat sparse. An overview of these features can be found in `src/net_evaluation.h`. 
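The dataset layout described in the last paragraph can be sketched as a small loader. This is a hypothetical illustration, not part of Winter: `EvalSample` and `LoadEvalDataset` are made-up names, and a plain comma-separated file with no header row is assumed, per the description above.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// One row of the dataset described above: two boolean labels followed by
// a somewhat sparse list of numeric features.
struct EvalSample {
  bool won;                     // column 1: side to move won
  bool at_least_draw;           // column 2: side to move scored at least a draw
  std::vector<float> features;  // remaining columns
};

std::vector<EvalSample> LoadEvalDataset(const std::string &path) {
  std::vector<EvalSample> samples;
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    std::stringstream ss(line);
    std::string cell;
    EvalSample s;
    if (!std::getline(ss, cell, ',')) continue;  // skip empty lines
    s.won = cell == "1";
    if (!std::getline(ss, cell, ',')) continue;
    s.at_least_draw = cell == "1";
    while (std::getline(ss, cell, ',')) {
      s.features.push_back(std::stof(cell));
    }
    samples.push_back(s);
  }
  return samples;
}
```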
\ No newline at end of file diff --git a/src/general/settings.h b/src/general/settings.h index e11811a..4c1d462 100644 --- a/src/general/settings.h +++ b/src/general/settings.h @@ -33,7 +33,7 @@ namespace settings { const std::string engine_name = "Winter"; -const std::string engine_version = "0.6.7a"; +const std::string engine_version = "0.7"; const std::string engine_author = "Jonathan Rosenthal"; #if defined(__BMI2__) diff --git a/src/net_evaluation.cc b/src/net_evaluation.cc index da464c1..949c856 100644 --- a/src/net_evaluation.cc +++ b/src/net_evaluation.cc @@ -14,11 +14,13 @@ using namespace net_features; +constexpr float kEpsilon = 0.000001; constexpr size_t block_size = 16; // The (post-) activation block size is only needed if dimension is different from preactivation //constexpr size_t act_block_size = 2 * block_size; using NetLayerType = Vec; //using CReLULayerType = Vec; +std::array contempt = { 0.5, 0.5 }; namespace { const int net_version = 19081700; @@ -32,7 +34,7 @@ float sigmoid(float x) { std::vector net_input_weights(kTotalNumFeatures, 0); NetLayerType bias_layer_one(0); -std::vector second_layer_weights(32 * 16, 0); +std::vector second_layer_weights(16 * 16, 0); NetLayerType bias_layer_two(0); //NetLayerType output_weights(0); @@ -614,9 +616,7 @@ T ScoreBoard(const Board &board) { return score; } -Score NetForward(NetLayerType &layer_one) { - constexpr float epsilon = 0.000001; - +Score NetForward(NetLayerType &layer_one, float c = 0.5) { layer_one += bias_layer_one; layer_one.relu(); // layer_one.ns_prelu(net_hardcode::l1_activation_weights); @@ -635,8 +635,9 @@ Score NetForward(NetLayerType &layer_one) { float win = layer_two.dot(win_weights) + win_bias; float win_draw = layer_two.dot(win_draw_weights) + win_draw_bias; - float wpct = (sigmoid(win) + sigmoid(win_draw)) / 2; - wpct = std::max(std::min(wpct, 1-epsilon), epsilon); + float wpct = sigmoid(win) * c + sigmoid(win_draw) * (1 - c); +// wpct = wpct * (1-kEpsilon) + 0.5 * kEpsilon; + 
wpct = std::max(std::min(wpct, 1-kEpsilon), kEpsilon); float output = std::log(wpct / (1-wpct)); return std::round(output * 1024); @@ -650,7 +651,7 @@ Score ScoreBoard(const Board &board) { else { layer_one = ScoreBoard(board); } - return NetForward(layer_one); + return NetForward(layer_one, contempt[board.get_turn()]); } void init_weights() { @@ -958,5 +959,42 @@ void EstimateFeatureImpact() { } } +void SetContempt(int value, Color color) { + float f = (value + 100) * 0.005; + contempt[color] = f; + contempt[color ^ 0x1] = 1-f; +} + +std::array GetDrawArray() { +// float f = contempt[0] * (1-kEpsilon) + 0.5 * kEpsilon; + float f = std::max(std::min(contempt[0], 1-kEpsilon), kEpsilon); + f = std::log(f / (1-f)); + Score res = std::round(f * 1024); + std::array result = { -res, res }; + return result; +} + +Score GetUnbiasedScore(Score score, Color color) { + Color not_color = color ^ 0x1; + float f = sigmoid(score / 1024.0); + float w, wd; + if (f == contempt[not_color]) { + return 0; + } + else if (f > contempt[not_color]) { + w = (f - contempt[not_color]) / contempt[color]; + wd = 1.0; + } + else { + w = 0.0; + wd = f / contempt[not_color]; + } + float x = (w + wd) / 2; + x = std::log(x / (1-x)); + return std::round(x * 1024); + // w * vw + wd * vwd = f + // if f > vwd: wd = 1, w = (f - vwd) / vw + // else w = 0, wd = f / vwd +} } diff --git a/src/net_evaluation.h b/src/net_evaluation.h index 3c5171c..c071ecf 100644 --- a/src/net_evaluation.h +++ b/src/net_evaluation.h @@ -47,6 +47,10 @@ void EstimateFeatureImpact(); void GenerateDatasetFromUCIGames(std::string filename, std::string out_name = "eval_dataset.csv", size_t reroll_pct = 0); +void SetContempt(int value, Color color); +std::array GetDrawArray(); +Score GetUnbiasedScore(Score score, Color color); + } // TODO: Move to external file diff --git a/src/search.cc b/src/search.cc index 4803622..54be390 100644 --- a/src/search.cc +++ b/src/search.cc @@ -62,6 +62,10 @@ int kNodeCountSampleAt = 1000; //int 
kNodeCountSampleEvalAt = 5000; const int kMaxDepthSampled = 32; +std::array draw_score = { 0, 0 }; +int contempt = 0; +bool armageddon = false; + int rsearch_mode; Milliseconds rsearch_duration; Depth rsearch_depth; @@ -91,10 +95,18 @@ Vec init_futility_margins(Score s) { return kFutilityMargins; } +#ifdef TUNE +Score kSNMPMargin = 588;// 587 +Array2d lmr_reductions = init_lmr_reductions(1.34);//135 +Vec kFutileMargin = init_futility_margins(1274);//900 +std::array kLMP = {0, 6, 9, 13, 18}; + +#else constexpr Score kSNMPMargin = 588;// 587 Array2d lmr_reductions = init_lmr_reductions(1.34);//135 const Vec kFutileMargin = init_futility_margins(1274);//900 const std::array kLMP = {0, 6, 9, 13, 18}; +#endif template const Depth get_lmr_reduction(const Depth depth, const size_t move_number) { @@ -611,7 +623,7 @@ Score QuiescentSearch(Thread &t, Score alpha, Score beta) { //End search immediately if trivial draw is reached if (t.board.IsTriviallyDrawnEnding()) { - return 0; + return draw_score[t.board.get_turn()];; } //TT probe @@ -802,6 +814,7 @@ Score AlphaBeta(Thread &t, Score alpha, Score beta, Depth depth, bool expected_c assert(node_type != NodeType::kPV || !expected_cut_node); const Score original_alpha = alpha; + const Score score_draw = draw_score[t.board.get_turn()]; Score lower_bound_score = kMinScore+t.board.get_num_made_moves(); //Immediately return 0 if we detect a draw. @@ -810,7 +823,7 @@ Score AlphaBeta(Thread &t, Score alpha, Score beta, Depth depth, bool expected_c if (t.board.IsFiftyMoveDraw() && t.board.InCheck() && t.board.GetMoves().empty()) { return kMinScore+t.board.get_num_made_moves(); } - return 0; + return score_draw; } //We drop to QSearch if we run out of depth. 
@@ -893,7 +906,7 @@ Score AlphaBeta(Thread &t, Score alpha, Score beta, Depth depth, bool expected_c if (in_check) { return kMinScore+t.board.get_num_made_moves(); } - return 0; + return score_draw; } // if (Mode == kSamplingSearchMode && node_type == NodeType::kNW && depth <= kMaxDepthSampled) { @@ -1101,12 +1114,13 @@ Score RootSearchLoop(Thread &t, Score original_alpha, Score beta, Depth current_ Score alpha = original_alpha; Score lower_bound_score = kMinScore; + const Score score_draw = draw_score[t.board.get_turn()]; //const bool in_check = board.InCheck(); - if (settings::kRepsForDraw == 3 && alpha < -1 && t.board.MoveInListCanRepeat(moves)) { - if (beta <= 0) { - return 0; + if (settings::kRepsForDraw == 3 && alpha < score_draw-1 && t.board.MoveInListCanRepeat(moves)) { + if (beta <= score_draw) { + return score_draw; } - alpha = -1; + alpha = score_draw-1; } const bool in_check = t.board.InCheck(); for (size_t i = 0; i < moves.size(); ++i) { @@ -1114,8 +1128,8 @@ Score RootSearchLoop(Thread &t, Score original_alpha, Score beta, Depth current_ t.board.Make(moves[i]); if (i == 0) { Score score = -AlphaBeta(t, -beta, -alpha, current_depth - 1); - if (settings::kRepsForDraw == 3 && score < 0 && t.board.CountRepetitions() >= 2) { - score = 0; + if (settings::kRepsForDraw == 3 && score < score_draw && t.board.CountRepetitions() >= 2) { + score = score_draw; } t.board.UnMake(); if (score >= beta) { @@ -1137,8 +1151,8 @@ Score RootSearchLoop(Thread &t, Score original_alpha, Score beta, Depth current_ if (score > alpha) { score = -AlphaBeta(t, -beta, -alpha, current_depth - 1); } - if (settings::kRepsForDraw == 3 && score < 0 && t.board.CountRepetitions() >= 2) { - score = 0; + if (settings::kRepsForDraw == 3 && score < score_draw && t.board.CountRepetitions() >= 2) { + score = score_draw; } lower_bound_score = std::max(score, lower_bound_score); t.board.UnMake(); @@ -1305,8 +1319,13 @@ void Thread::search() { << " time " << time_used.count() << " nodes " << 
node_count << " nps " << ((1000*node_count) / (time_used.count()+1)); if (!is_mate_score(score)) { - std::cout << " score cp " - << (score / 8); + std::cout << " score cp "; + if (armageddon) { + std::cout << (score / 8); + } + else { + std::cout << (net_evaluation::GetUnbiasedScore(score, board.get_turn()) / 8); + } } else { Score m_score = board.get_num_made_moves(); @@ -1345,6 +1364,13 @@ void Thread::search() { template Move RootSearch(Board &board, Depth depth, Milliseconds duration = Milliseconds(24 * 60 * 60 * 1000)) { table::UpdateGeneration(); + if (armageddon) { + net_evaluation::SetContempt(60, kWhite); + } + else { + net_evaluation::SetContempt(contempt, board.get_turn()); + } + draw_score = net_evaluation::GetDrawArray(); min_ply = board.get_num_made_moves(); Threads.reset_node_count(); Threads.reset_depths(); @@ -2120,16 +2146,26 @@ std::vector GenerateEvalSampleSet(std::string filename) { return boards; } +void SetContempt(int contempt_) { + contempt = contempt_; +} + +void SetArmageddon(bool armageddon_) { + armageddon = armageddon_; +} + +#ifdef TUNE void SetFutilityMargin(Score score) { - //kFutileMargin = init_futility_margins(score); + kFutileMargin = init_futility_margins(score); } void SetSNMPMargin(Score score) { - //kSNMPMargin = score; + kSNMPMargin = score; } void SetLMRDiv(double div) { -// lmr_reductions = init_lmr_reductions(div); + lmr_reductions = init_lmr_reductions(div); } +#endif } diff --git a/src/search.h b/src/search.h index 703603a..02fabc3 100644 --- a/src/search.h +++ b/src/search.h @@ -65,9 +65,14 @@ void LoadSearchVariablesHardCoded(); void EvaluateCaptureMoveValue(int n); void EvaluateScoreDistributions(const int focus); +void SetContempt(int contempt); +void SetArmageddon(bool armageddon); + +#ifdef TUNE void SetFutilityMargin(Score score); void SetSNMPMargin(Score score); void SetLMRDiv(double div); +#endif } diff --git a/src/uci.cc b/src/uci.cc index fa12d00..caa578c 100644 --- a/src/uci.cc +++ b/src/uci.cc @@ -65,10 
+65,16 @@ const std::string kEngineAuthorPrefix = "id author "; const std::string kOk = "uciok"; const std::string kUCIHashOptionString = "option name Hash type spin default 32 min 1 max 104576" - "\noption name Threads type spin default 1 min 1 max 256"; -// "\noption name Futility type spin default 1274 min 400 max 1500" -// "\noption name SNMPMargin type spin default 588 min 0 max 2000" -// "\noption name LMRDivisor type spin default 134 min 60 max 250"; + "\noption name Threads type spin default 1 min 1 max 256" + "\noption name Contempt type spin default 0 min -100 max 100" +#ifndef TUNE + "\noption name Armageddon type check default false"; +#else + "\noption name Armageddon type check default false" + "\noption name Futility type spin default 1274 min 400 max 1500" + "\noption name SNMPMargin type spin default 588 min 0 max 2000" + "\noption name LMRDivisor type spin default 134 min 60 max 250"; +#endif struct Timer { Timer() { @@ -118,11 +124,6 @@ void Go(Board *board, Timer timer) { std::cout << "bestmove " << parse::MoveToString(move) << std::endl; } - -} - -namespace uci { - bool Equals(std::string string_a, std::string string_b) { return string_a.compare(string_b) == 0; } @@ -131,6 +132,15 @@ void Reply(std::string message) { std::cout << message << std::endl; } +bool IsTrue(std::string s) { + return Equals(s, "true") || Equals(s, "True") || Equals(s, "TRUE") || Equals(s, "1"); +} + + +} + +namespace uci { + void Loop() { debug::EnterFunction(debug::kUci, "uci::Loop", ""); Board board; @@ -174,10 +184,10 @@ void Loop() { else if (Equals(command, "print_bitboards")) { board.PrintBitBoards(); } - else if (Equals(command, "print_features")) { +// else if (Equals(command, "print_features")) { // TODO replace with function from net_eval // evaluation::PrintFeatureValues(board); - } +// } else if (Equals(command, "isdraw")) { std::cout << board.IsDraw() << std::endl; } @@ -210,8 +220,20 @@ void Loop() { int num_threads = atoi(tokens[index++].c_str()); 
search::Threads.set_num_threads(num_threads); } + if (Equals(command, "Contempt")) { + index++; + int contempt = atoi(tokens[index++].c_str()); + search::SetContempt(contempt); + } + if (Equals(command, "Armageddon")) { + index++; + bool armageddon_setting = IsTrue(tokens[index++]); + search::SetArmageddon(armageddon_setting); + } +#ifdef TUNE if (Equals(command, "Futility")) { index++; + int futility = atoi(tokens[index++].c_str()); search::SetFutilityMargin(futility); } @@ -225,6 +247,7 @@ void Loop() { int div = atoi(tokens[index++].c_str()); search::SetLMRDiv(div * 0.01); } +#endif } else if (Equals(command, "print_moves")) { std::vector moves = board.GetMoves();
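For reference, the centipawn recalibration this patch adds in `GetUnbiasedScore` can be restated compactly. This is a simplified sketch, assuming the two contempt weights sum to 1 as `SetContempt` arranges; `unbiased_score` and its parameter names are illustrative, not Winter's.

```cpp
#include <cmath>

// Simplified restatement of the GetUnbiasedScore logic from this patch.
// c_own / c_opp are the contempt weights of the side to move and of the
// opponent; in the patch they always sum to 1.
float sigmoidf(float x) { return 1.0f / (1.0f + std::exp(-x)); }

int unbiased_score(int score, float c_own, float c_opp) {
  float f = sigmoidf(score / 1024.0f);  // biased expected score in (0, 1)
  float w, wd;                          // reconstructed P(win), P(win or draw)
  if (f == c_opp) return 0;
  if (f > c_opp) {  // better than a "contempt draw": assume wd = 1
    w = (f - c_opp) / c_own;
    wd = 1.0f;
  } else {          // worse than a "contempt draw": assume w = 0
    w = 0.0f;
    wd = f / c_opp;
  }
  float x = (w + wd) / 2;  // symmetric expected score, as without contempt
  return static_cast<int>(std::round(std::log(x / (1 - x)) * 1024));
}
```

With zero contempt (both weights 0.5) this is the identity on non-mate scores; with positive contempt it maps the internally pessimistic score to the reported upper-bound score described in the recalibration section of the README.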