Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible wrong terminal state evaluation at PvNodes. #5589

Open
peregrineshahin opened this issue Sep 12, 2024 · 7 comments
Open

Possible wrong terminal state evaluation at PvNodes. #5589

peregrineshahin opened this issue Sep 12, 2024 · 7 comments

Comments

@peregrineshahin
Copy link
Contributor

Describe the issue

Non-draw scores can propogate to the root from PV nodes ending with stalemates in qsearch causing wrong evals for such terminal state and a possibly a mismatch between that PV reported terminated and the eval.

Expected behavior

Correct terminal state at least in PV nodes.

Steps to reproduce

diff --git a/src/search.cpp b/src/search.cpp
index ac0b9c6d..e05476c7 100644
--- a/src/search.cpp
+++ b/src/search.cpp
@@ -1641,10 +1641,28 @@ Value Search::Worker::qsearch(Position& pos, Stack* ss, Value alpha, Value beta)
     // Step 9. Check for mate
     // All legal moves have been searched. A special case: if we are
     // in check and no legal moves were found, it is checkmate.
-    if (ss->inCheck && bestValue == -VALUE_INFINITE)
+    if (!moveCount)
     {
-        assert(!MoveList<LEGAL>(pos).size());
-        return mated_in(ss->ply);  // Plies to mate from the root
+        if (ss->inCheck)
+        {
+            assert(!MoveList<LEGAL>(pos).size());
+            return mated_in(ss->ply);  // Plies to mate from the root
+        }
+
+        if (PvNode)
+        {
+            dbg_hit_on(
+              !MoveList<LEGAL>(pos).size());  // Hit #0: Total 923 Hits 1 Hit Rate (%) 0.108342
+
+            if (!MoveList<LEGAL>(pos).size())
+            {
+                // Mean #0: Total 1 Mean -626
+                // Bench is not changed because of this 1 hit
+                // (since the hits is 1, this is a hack to know what the value was (-626))
+                dbg_mean_of(bestValue);
+            }
+        }
     }

Anything else?

see also https://www.chess.com/computer-chess-championship#event=ccc23-rapid-finals&game=81

Operating system

All

Stockfish version

all

@vondele
Copy link
Member

vondele commented Sep 12, 2024

while possible, do we have some evidence that is actually a result from this?

@vondele
Copy link
Member

vondele commented Sep 12, 2024

For

position fen 2rr1b2/5p1k/PqP3p1/1N2BpPp/1P2bP1P/2Q5/3R4/R4K2 w - - 1 39
# https://www.chess.com/computer-chess-championship#event=ccc23-rapid-finals&game=81
setoption name Threads value 30
setoption name Hash value 15000
go infinite

the trap is seen at d29...

info depth 27 seldepth 58 multipv 1 score cp 271 nodes 77109893 nps 20756364 hashfull 22 tbhits 0 time 3715 pv c3c4 f8g7 e5g7 b6e3 g7e5 e3h3 f1e1 h3h4 d2f2 e4d5 c4f1 d8e8 a1c1 e8e5 f4e5 h4b4 b5c3 c8c6 f1d3 d5e6 a6a7 b4a3 d3e3 c6c7 f2c2 c7d7
info depth 28 seldepth 61 multipv 1 score cp 271 nodes 92222059 nps 20926267 hashfull 25 tbhits 0 time 4407 pv c3c4 f8g7 e5g7 b6e3 g7e5 e3h3 f1e1 h3h4 d2f2 e4d5 c4f1 d8e8 a1c1 e8e5 f4e5 h4b4 b5c3 c8c6 f1d3 b4a3 d3e3 d5e6 a6a7 c6c7 f2c2 c7a7 e1f2 a3e7 c1d1
info depth 29 seldepth 62 multipv 1 score cp 47 nodes 134515774 nps 17625232 hashfull 36 tbhits 0 time 7632 pv c3c4 f8g7 c4f7 b6b5 f1e1 d8g8 f7b7 e4c6 b7b5 c6b5 d2d5 b5c6 d5c5 g7e5 c5e5 c8b8 a1c1 c6e4 b4b5 e4d3 c1c5 b8a8 e5e6 g8b8 c5c7 h7h8 a6a7 b8e8 e1d2 d3b5 e6b6 b5e2 b6g6 a8d8 d2c3 e2f3 g6g7 e8e3 c3b4 d8d4 b4c5 d4d5 c5c4 d5a5 g7d7
info depth 30 seldepth 69 multipv 1 score cp 16 nodes 210014839 nps 14846234 hashfull 50 tbhits 0 time 14146 pv c3c4 f8g7 c4f7 b6b5 f1e1 d8g8 f7b7 e4c6 b7b5 c6b5 d2d5 b5c6 d5d6 g7f8 d6d4 f8g7 e1f2 g8e8 a6a7 e8e5 f4e5 g7e5 a1d1 e5d4 d1d4 c8c7 b4b5 c6a8 b5b6 c7g7

@peregrineshahin
Copy link
Contributor Author

I'm not sure that game is related, I can imagine how the blunder can come in many ways indeed. so requiring more depth can solve it.
but yeah it is probably unrelated because no near stalemate position in the PVs of that game.
but.. it is a concern to me that the reported PVs can end in a stalemate that is "evaluated" and evaluated highly as per the debugging info.

@peregrineshahin
Copy link
Contributor Author

worth noting that I tried many guards as gainer bounds, but never tried the PVNode guard which is probably the most important guard becasue across versions I didn't get it to change bench

@vondele
Copy link
Member

vondele commented Sep 12, 2024

yeah, running a test with PVNode guard where it computes the statemate or 3-fold is worth trying.

@peregrineshahin
Copy link
Contributor Author

peregrineshahin commented Sep 12, 2024

well at least good news that the PvNode check is functional at higher depths now I realise
https://tests.stockfishchess.org/tests/view/66e32ca386d5ee47d953a696
Patch: Nodes searched : 20410465 vs master: Nodes searched : 20448992

diff --git a/src/search.cpp b/src/search.cpp
index ac0b9c6d..1ab24c1b 100644
--- a/src/search.cpp
+++ b/src/search.cpp
@@ -1641,10 +1641,16 @@ Value Search::Worker::qsearch(Position& pos, Stack* ss, Value alpha, Value beta)
     // Step 9. Check for mate
     // All legal moves have been searched. A special case: if we are
     // in check and no legal moves were found, it is checkmate.
-    if (ss->inCheck && bestValue == -VALUE_INFINITE)
+    if (!moveCount)
     {
-        assert(!MoveList<LEGAL>(pos).size());
-        return mated_in(ss->ply);  // Plies to mate from the root
+        if (ss->inCheck)
+        {
+            assert(!MoveList<LEGAL>(pos).size());
+            return mated_in(ss->ply);  // Plies to mate from the root
+        }
+
+        if (PvNode && !MoveList<LEGAL>(pos).size())
+            return VALUE_DRAW;
     }

     if (std::abs(bestValue) < VALUE_TB_WIN_IN_MAX_PLY && bestValue >= beta)

@peregrineshahin peregrineshahin changed the title Catastrophic terminal state evaluation. Possible wrong terminal state evaluation at PvNodes. Sep 13, 2024
@peregrineshahin
Copy link
Contributor Author

sounds not easily fixable ATM and fishtest suggesting it's not as dangerous as I thought, so generating all legal moves is likely expensive for the tradeoff. probably worth not wasting time on it, and visit later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants