Uncomment resign code to allow lczero to resign #418
whitespace is better now hopefully?
I don't know what rootstate was, probably should have been root
I don't think UCI engines resign; it's up to the GUI to do that.
This is like how Leela Go does it. It's for self-play only...
position has game ply...
LZGo uses GTP, which specifies how to resign. UCI does not. How is the client going to know who won? You would need a custom UCI command. I think it's better to have the client parse the UCI output and look at the cp score. That way we don't need custom UCI commands and the lczero code doesn't have to change.
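To make the "parse the UCI output" suggestion concrete, here is a minimal sketch of pulling the centipawn score out of a standard UCI `info` line. The function name `parse_cp_score` is hypothetical and not part of lczero or its client; only the `score cp <x>` token layout comes from the UCI protocol.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of lczero): extracts the centipawn score
// from a UCI "info" line such as
//   "info depth 10 score cp -350 nodes 800 pv e2e4".
// Returns true and stores the value in cp when a "score cp <x>" pair is found.
// Lines with "score mate <n>" or with no score at all return false.
bool parse_cp_score(const std::string& line, int& cp) {
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token != "score") continue;
        std::string kind;
        int value = 0;
        if (in >> kind >> value && kind == "cp") {
            cp = value;
            return true;
        }
        return false;  // e.g. "score mate 3"
    }
    return false;
}
```

A client following this approach would compare the parsed score against its own resign threshold after each move, so the engine itself would not need to change.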
Oh, now I remember: the client doesn't speak UCI during the game, it just says "train" and the entire game runs. So I guess it has to be done similarly to how you are doing it.
I think you did some resign analysis? Can you post it here?
keeps it from doing a1+ as a move_none
5% winrate is a better initial resign target
src/UCTSearch.cpp
@@ -196,18 +196,16 @@ Move UCTSearch::get_best_move() {
return bestmove;
}

// should we consider resigning?
/*
// should we consider resigning?
float bestscore = m_root->get_first_child()->get_eval(color);
int visits = m_root->get_visits();
// bad score and visited enough
if (bestscore < ((float)cfg_resignpct / 100.0f)
&& visits > 500
This visits > 500 check seems rather specific to the fact that training is run at 800 visits.
Also, shouldn't this whole logic be protected in some way to ensure it only runs during training?
That 500-visit check was already there; I didn't put it in. It does make sense to have a threshold to keep it from resigning when it can't get enough playouts to determine whether it's a good move, though. Default resignpct should probably be 0, so it won't resign unless overridden...
I definitely think there needs to be a minimum visit threshold; I just wonder if it should be calculated rather than a constant. It's much easier to get 500 visits on a 'best' move that is actually almost tied three ways if the total number of visits is 1600 rather than 800. This is relevant if the other moves were discovered late and have good win rates but haven't quite caught up to the leader by 1600 visits. Temp is actually more likely to choose a move other than this one, since it has less than half the visits.
Maybe require > half the visits on the one option, and also > 500, to have confidence that the eval is reasonably calculated. (Or maybe all options with > 500 visits must agree to resign? Or the weighted majority of options with > 500 visits? I don't know what is best.)
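The first proposal above (more than half of all root visits on the best move, plus an absolute floor) can be sketched as follows. This is an illustration only: `ChildStats` and `should_resign` are hypothetical names, not the real UCTNode interface, and picking the "best" move by visit count is an assumption.

```cpp
#include <vector>

// Hypothetical per-move statistics at the root; not the real UCTNode API.
struct ChildStats {
    int visits;     // visits spent on this root move
    float winrate;  // eval for the side to move, in [0, 1]
};

// Returns true if the side to move should resign.
// resign_pct: threshold in percent; 0 (or less) disables resignation.
// min_visits: absolute floor for trusting the eval (500 in the discussion).
bool should_resign(const std::vector<ChildStats>& children,
                   int resign_pct, int min_visits) {
    if (resign_pct <= 0 || children.empty()) return false;

    int total = 0;
    const ChildStats* best = &children[0];
    for (const auto& c : children) {
        total += c.visits;
        if (c.visits > best->visits) best = &c;  // assumption: best = most visited
    }

    // The best move must clear the absolute floor...
    if (best->visits <= min_visits) return false;
    // ...and hold more than half of all root visits.
    if (2 * best->visits <= total) return false;

    return best->winrate < resign_pct / 100.0f;
}
```

With 1600 total visits split 600/500/500, the leader clears the floor of 500 but not the majority test, so this version declines to resign even at a very low win rate.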
This condition was removed upstream. It dates back to the time when the score was estimated from Monte Carlo playouts, and doesn't make a lot of sense with a strong neural network evaluator.
Note: playMatch does not understand the NONE_MOVE so it will have a Error decoding:
With T=1 this is a major time win.
Please see the current LZGo codebase: the 500 magic number is gone, along with many other changes.
I saw shyeel in the chat talking about writing some code for statistics. First, for reference, see https://github.com/gcp/leela-zero/blob/master/scripts/resign_analysis/resign_analysis.py. The main things to measure are "incorrect resigns" and "moves saved by resigning". You need to analyze self-play games that have resign disabled. Calculate who would have resigned, and count how often it was the wrong side (incorrect resign). Calculate how many moves were saved. This is our cost/benefit analysis.
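The measurement described above can be sketched in a few lines. This is not a port of the linked script: `GameRecord`, `ResignStats`, and `analyze_resign` are hypothetical names, and the sketch assumes White moves on even plies and that a resigner who went on to draw or win counts as an incorrect resign.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one self-play game played with resign disabled.
struct GameRecord {
    std::vector<float> winrates;  // winrates[i]: eval for the side to move at ply i, in [0, 1]
    int result;                   // +1 White won, -1 Black won, 0 draw
};

struct ResignStats {
    int games = 0;
    int would_resign = 0;       // games in which some side drops below the threshold
    int incorrect_resigns = 0;  // the resigning side did not actually lose
    long plies_saved = 0;       // plies left unplayed after the resign point
};

ResignStats analyze_resign(const std::vector<GameRecord>& games, float threshold) {
    ResignStats stats;
    for (const auto& g : games) {
        ++stats.games;
        for (std::size_t ply = 0; ply < g.winrates.size(); ++ply) {
            if (g.winrates[ply] >= threshold) continue;
            // Assumption: White is to move on even plies.
            int resigner = (ply % 2 == 0) ? +1 : -1;
            ++stats.would_resign;
            // A resign is correct only if the opponent really won the game.
            if (g.result != -resigner) ++stats.incorrect_resigns;
            stats.plies_saved += static_cast<long>(g.winrates.size() - ply);
            break;  // only the first resign point in a game matters
        }
    }
    return stats;
}
```

Running this over the same set of games at several thresholds gives the cost (incorrect resigns) and benefit (plies saved) for each candidate value of the resign percentage.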
The LZGo codebase appears not to use temperature after x moves, so they can do proper resign analysis easily. Resignation there is therefore mostly about improving games per hour, and possibly about focusing the engine on learning less about deep lost endgames that no one is actually going to play out in practice. It's not about providing a temperature-reducing effect.
Yes, Leela Zero uses the temperature parameters directly from the AlphaGo Zero paper, which uses t=1 for the first 30 moves and t->0 for the rest of the game. In fact, fractional temperature is not implemented at all in Leela Zero. They don't really need it since the move space is so much bigger, and symmetry application provides another source of randomness. I don't really like the sharp cutoff where moves up to no. 30 can include any blunder while those from move 31 on can't, but arguably it works.
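The two-phase schedule described above (t=1 early, t->0 afterwards) can be illustrated with a small sketch. `pick_move` is a hypothetical function, not Leela Zero's actual code; it assumes t=1 means sampling in proportion to root visit counts, t->0 means taking the most visited move, and the caller supplies the uniform random number.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical move picker illustrating the schedule; not Leela Zero's code.
// visits:      root visit count per candidate move (must be non-empty)
// move_number: number of moves already played in the game (0-based)
// u:           uniform random number in [0, 1) supplied by the caller
// temp_moves:  how many opening moves are played with t = 1
std::size_t pick_move(const std::vector<int>& visits, int move_number,
                      double u, int temp_moves = 30) {
    if (move_number < temp_moves) {
        // t = 1: sample a move with probability proportional to its visits.
        long total = 0;
        for (int v : visits) total += v;
        if (total > 0) {
            double target = u * static_cast<double>(total);
            double acc = 0.0;
            for (std::size_t i = 0; i < visits.size(); ++i) {
                acc += visits[i];
                if (target < acc) return i;
            }
        }
    }
    // t -> 0: always play the most visited move.
    std::size_t best = 0;
    for (std::size_t i = 1; i < visits.size(); ++i) {
        if (visits[i] > visits[best]) best = i;
    }
    return best;
}
```

The sharp cutoff criticized above is visible in the single `move_number < temp_moves` test: a move with 100 of 800 visits can still be played on move 30, but never on move 31.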
cfg_resignpct should default to something that means never resign. I guess a magic I think we can just remove the visits qualifier.
won't resign by default now
no visit limit now since we do 800 playouts anyway and this is not for uci
bh.do_move(move);
} else {
return bh.cur().side_to_move() == WHITE ? -1 : 1;
}
@glinscott does this part look ok to you? Making MOVE_NONE mean resign?
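For readers following the question, here is a stand-alone sketch of the convention in the hunk: when the move list contains MOVE_NONE, the side to move is treated as having resigned and the game is scored for the opponent. The `Color`, `Move`, and `MOVE_NONE` definitions below are simplified stand-ins, and `score_game` is a hypothetical function rather than the real game-playing loop.

```cpp
#include <vector>

// Simplified stand-ins for the engine's types; the real definitions differ.
enum Color { WHITE, BLACK };
using Move = int;
constexpr Move MOVE_NONE = 0;

// Walks a move list and scores a resignation from White's point of view:
// -1 if White resigned, +1 if Black resigned, 0 if nobody resigned.
int score_game(const std::vector<Move>& moves) {
    Color side_to_move = WHITE;
    for (Move m : moves) {
        if (m == MOVE_NONE) {
            // Same rule as the hunk: the side to move is the one resigning.
            return side_to_move == WHITE ? -1 : 1;
        }
        // A real implementation would apply the move to the board here.
        side_to_move = (side_to_move == WHITE) ? BLACK : WHITE;
    }
    return 0;
}
```

One consequence of overloading MOVE_NONE this way is that every consumer of the move list has to understand the convention, which is what the playMatch note earlier in the thread runs into.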
@glinscott I put one question in the code diff, can you take a look? If that is ok I think we should pull this and then the clients will be ready for when the server starts sending the
removed comments