Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uncomment resign code to allow lczero to resign #418

Merged
merged 10 commits into from
May 2, 2018

Conversation

jjoshua2
Copy link
Contributor

removed comments

whitespace is better now hopefully?
I don't know what rootstate was, probably should have been root
@killerducky
Copy link
Collaborator

I don't think UCI engines resign, it's up to the GUI to do that.

@jjoshua2
Copy link
Contributor Author

This is like how leela go does it, it's for self play only...

position has game ply...
@killerducky
Copy link
Collaborator

LZGo uses GTP, which specifies how to resign. UCI does not.

How is the client going to know who won? You will need a custom UCI command. I think it's better to have the client parse the UCI output and look at the cp score. This way we don't need custom UCI commands and lczero code doesn't have to change.

@killerducky
Copy link
Collaborator

Oh now I remember client doesn't speak UCI during the game, it just says "train" and the entire game runs. So I guess it has to be done similar to how you are doing it.

@killerducky
Copy link
Collaborator

I think you did some resign analysis? Can you post it here?

keeps it from doing a1+ as a move_none
5% winrate is a better initial resign target
@killerducky killerducky changed the base branch from master to next April 23, 2018 01:38
@@ -196,18 +196,16 @@ Move UCTSearch::get_best_move() {
return bestmove;
}

// should we consider resigning?
/*
// should we consider resigning?
float bestscore = m_root->get_first_child()->get_eval(color);
int visits = m_root->get_visits();
// bad score and visited enough
if (bestscore < ((float)cfg_resignpct / 100.0f)
&& visits > 500
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This visits > 500, seems rather specific to the fact that training is run at 800 visits.
Also this whole logic should be protected in some way to ensure its only run during training?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That 500 visits was there. I didn't put that in. It makes sense to have a threshold to keep it from resigning when it can't get enough playouts to determine if its a good move though. Default resignpct should probably be 0, then it won't resign, unless overridden...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely think there needs to be a minimum visit threshold, I just wonder if it should be calculated rather than a constant. Its much easier to get 500 visits to a 'best' move which is actually almost tied three ways, if the actual number of visits is 1600, rather than 800. This is relevant if the other moves were discovered late and have good win rates but haven't quite caught up to the leader by the 1600 visits. Temp is actually more likely to choose a move other than this one, since its less than half the visits.
Maybe > half the visits on the one option, and also > 500 to have confidence that the eval is reasonably calculated. (Or maybe all options with > 500 visits must agree to resign? Or the weighted majority of options with > 500 visits? I don't know what is best.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition was removed upstream. It dates back to the time when the score was estimated from Monte Carlo playouts, and doesn't make a lot of sense with a strong neural network evaluator.

@jjoshua2
Copy link
Contributor Author

Note: playMatch does not understand the NONE_MOVE so it will have a Error decoding:
easy way to fix that is just not use resigns in matches like we currently do. they dont have the temperature problem anyway as much

@jjoshua2
Copy link
Contributor Author

jjoshua2 commented Apr 23, 2018

With T=1 this is a major time win.
Completed 14 games in 3h3m cpu time vs 17 games in 1h51m with 5% resign.
17.4 min/game vs 6.5 min/game. So we could maybe double our playouts here...

@killerducky
Copy link
Collaborator

Please see the current LZGo codebase, the 500 magic number is gone, along with many other changes.

@killerducky
Copy link
Collaborator

I saw in the chat shyeel talking about doing some code for statistics. First, see for reference https://github.com/gcp/leela-zero/blob/master/scripts/resign_analysis/resign_analysis.py

The main thing to measure is "incorrect resigns" and "moves saved by resigning". You need to analyze self-play games that have resign disabled. Calculate who would have resigned, and count how often it was the wrong side (incorrect resign). Calculate how many moves were saved. This is our cost/benefit analysis.

@Tilps
Copy link
Contributor

Tilps commented Apr 23, 2018

The LZGo codebase appears not to use temperature after x moves - so they can do proper resign analysis easily. Also resignation is therefore mostly about improving games per hour, and possibly focusing the engine to learn less about deep lost endgames that no one is actually going to play out in practice. Its not about providing a temperature reducing effect.

@jkiliani
Copy link
Contributor

Yes, Leela Zero uses the temperature parameters directly from the Alphago Zero paper, which uses t=1 for the first 30 moves and t->0 for the rest of the game. In fact, fractional temperature is not implemented at all in Leela Zero. They don't really need it since the move space is so much bigger, and symmetry application provides another source of randomness. I don't really like the sharp cutoff where moves up to no. 30 can include any blunder, while those from move 31 don't, but arguably it works.

@killerducky
Copy link
Collaborator

float bestscore = m_root->get_first_child()->get_eval(color);
I think we should use m_root get_eval, because that's what cp eval was tuned for, and that's what the resign analysis I did was based on. It's an arbitrary choice because even plies will be biased one way and odd plies will be biased the other way. Best just stick to the one we've got.

cfg_resignpct should default to something that means never resign. I guess a magic -1, which actually works with the current code as is. client can override this.

I think we can just remove the visits qualifier.

won't resign by default now
@killerducky killerducky changed the base branch from next to master May 1, 2018 15:31
@killerducky killerducky changed the base branch from master to next May 1, 2018 15:31
jjoshua2 and others added 3 commits May 1, 2018 11:38
no visit limit now since we do 800 playouts anyway and this is not for uci
bh.do_move(move);
} else {
return bh.cur().side_to_move() == WHITE ? -1 : 1;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glinscott does this part look ok to you? Making MOVE_NONE mean resign?

@killerducky
Copy link
Collaborator

@glinscott I put one question in the code diff, can you take a look? If that is ok I think we should pull this and then the clients will be ready for when the server starts sending the --resign N switch. By default they will act the same as before so this should be safe.

@killerducky killerducky merged commit 42c623c into glinscott:next May 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants