
Update resign_analysis script for chess. #427

Merged 3 commits on Apr 27, 2018


killerducky (Collaborator)

Update the script to parse the output of client -debug. Also slightly modify the winrate info string to make it easier to parse.

I'll collect some data overnight and analyze it tomorrow. I should have ~500 games by then.

See also #418; this is just the code to analyze what resign rate is reasonable.
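The core of such an analysis can be sketched as follows, assuming each game has already been parsed into a list of (side to move, winrate) pairs, one per ply, with the winrate in [0, 1] from the side to move's point of view. The names `find_resign_ply` and `plies_saved` are hypothetical; this is not the script's actual code, which works from the client -debug output.

```python
from typing import List, Optional, Tuple

# Hypothetical per-ply record: (side to move, winrate for the side to move in [0, 1]).
Ply = Tuple[str, float]


def find_resign_ply(plies: List[Ply], resign_rate: float) -> Optional[Tuple[int, str]]:
    """Return (ply index, side) of the first ply where the side to move's
    winrate drops below resign_rate, or None if nobody would have resigned."""
    for plynum, (stm, winrate) in enumerate(plies):
        if winrate < resign_rate:
            return plynum, stm
    return None


def plies_saved(plies: List[Ply], resign_rate: float) -> int:
    """Plies that would not have been played if the game had ended at the resign point."""
    hit = find_resign_ply(plies, resign_rate)
    if hit is None:
        return 0
    resign_plynum, _ = hit
    return len(plies) - resign_plynum
```

Running this for every game at each candidate resign rate, then checking whether the would-be resigner actually lost, yields the kind of I / C / PS columns reported in the results further down.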

@killerducky (Collaborator, Author)

Note: This code assumes all log files are for self-play games, not matches. I think that's true because client/main.go only adds -l for train, not for playMatch. Maybe later we can add a way to detect what type of game it is without relying on this behavior.

@Tilps (Contributor) commented Apr 24, 2018

I think this approach will give a rather muddled picture of resignation false positives. Since temperature still applies after the resignation detection depth, the false positive rates found this way are probably higher than what temperature = 0 play after the resignation point would produce.
(But maybe that's okay; it might just result in a rather conservative resign rate decision.)
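To make the temperature point concrete: in AlphaZero-style self-play, a move is sampled with probability proportional to N(a)^(1/τ), where N(a) is the move's visit count, and τ = 0 means always playing the most-visited move. The sketch below illustrates that rule under this assumption; it is not the client's actual move-selection code, and `select_move` and the visit counts are hypothetical.

```python
import random
from typing import Dict


def select_move(visits: Dict[str, int], temperature: float, rng: random.Random) -> str:
    """Pick a move from MCTS visit counts.

    temperature == 0: deterministic, always the most-visited move.
    temperature > 0: sample with probability proportional to visits ** (1 / temperature).
    """
    if temperature == 0:
        return max(visits, key=visits.get)
    moves = list(visits)
    weights = [visits[m] ** (1.0 / temperature) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

With visits such as {"best": 780, "alt1": 150, "alt2": 70}, τ = 1 plays something other than the top move 22% of the time (220 of 1000 visits). A side that looked lost can therefore be let off the hook by its opponent's sampled moves, which would inflate the measured false positive rate compared to τ = 0 play.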

resign_plynum = plynum
#print("debug stm, winrate, plynum", stm, winrate, plynum)
#print("debug who_resigned {} resign_playnum {} total_plynum {}".format(who_resigned, resign_plynum, plynum))
if ((score == -1 and who_resigned == "Black") or
Contributor (review comment)

Shouldn't this be score != 1, and score != -1 on the next line?
Turning a resignation into a draw is also an incorrect resignation, isn't it?

Contributor (review comment)

It's only half as bad, so I think it should only have half as much effect on increasing the FP rate.

Collaborator, Author (review comment)

Good catch. I will change it to report more stats, such as resigning a drawn game and resigning a won game.
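One way to express both the split stats and the "half as bad" suggestion, assuming each game where a side would have resigned is reduced to its final result from the would-be resigner's point of view (+1 won, 0 draw, -1 lost). The function name and the weighted rate are a hypothetical illustration of the reviewer's idea, not something the merged script is known to compute.

```python
from typing import Dict, List


def resign_error_stats(results: List[int]) -> Dict[str, float]:
    """Summarize games in which a side would have resigned.

    results: final score from the would-be resigner's point of view
             (+1 = it actually won, 0 = draw, -1 = it actually lost).
    """
    n = len(results)
    if n == 0:
        return {"resigned_draw": 0.0, "resigned_win": 0.0, "weighted_fp": 0.0}
    draws = sum(1 for r in results if r == 0)
    wins = sum(1 for r in results if r == 1)
    return {
        "resigned_draw": draws / n,  # gave up half a point
        "resigned_win": wins / n,    # gave up a full point
        # A drawn game counts half as much as a won game toward the FP rate.
        "weighted_fp": (0.5 * draws + wins) / n,
    }
```

For 20 would-be resignations with 16 losses, 2 draws and 2 wins, this reports 10% resigned draws, 10% resigned wins and a weighted FP rate of 15%.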

Add Very Incorrect resigns for winning a game you would have resigned.
@killerducky (Collaborator, Author) commented Apr 25, 2018

Results:

Analyzing 621 games
RR   - Resign Rate
Draw - number of drawn games
NR   - No Resign
I    - Incorrect Resign (would have resigned, but drew or won)
VI   - Very Incorrect Resign (would have resigned, but won)
C    - Correct Resign (would have resigned, and actually lost)
PS   - Plies Saved
I, VI, C display two values: including / excluding the NR games
RR  0% NR 100%
RR  1% Draw  7.9% NR 30.1% I  1.1%/ 1.6% VI  0.3%/ 0.5% C 68.8%/98.4% PS  8.0%
RR  2% Draw  7.9% NR 16.1% I  3.1%/ 3.6% VI  1.1%/ 1.3% C 80.8%/96.4% PS 22.3%
RR  3% Draw  7.9% NR 12.7% I  4.3%/ 5.0% VI  1.8%/ 2.0% C 82.9%/95.0% PS 31.5%
RR  4% Draw  7.9% NR  9.7% I  5.5%/ 6.1% VI  2.4%/ 2.7% C 84.9%/93.9% PS 37.0%
RR  5% Draw  7.9% NR  8.4% I  6.3%/ 6.9% VI  2.7%/ 3.0% C 85.3%/93.1% PS 40.9%
RR 10% Draw  7.9% NR  5.5% I 10.1%/10.7% VI  6.0%/ 6.3% C 84.4%/89.3% PS 50.7%
RR 20% Draw  7.9% NR  3.2% I 15.6%/16.1% VI  9.8%/10.1% C 81.2%/83.9% PS 61.6%
RR 30% Draw  7.9% NR  1.3% I 22.9%/23.2% VI 15.6%/15.8% C 75.8%/76.8% PS 70.9%
RR 40% Draw  7.9% NR  0.3% I 35.4%/35.5% VI 27.7%/27.8% C 64.3%/64.5% PS 80.4%
RR 50% Draw  7.9% NR  0.0% I 52.3%/52.3% VI 44.4%/44.4% C 47.7%/47.7% PS 98.4%

Biggest reversals:
https://lichess.org/v9GR0EpJ#225
https://lichess.org/4Crixe7L#80

Looks like around 3% stays under DeepMind's 5% guideline (I is 4.3% over all games) and gives a very nice 31.5% time savings in plies, plus elimination of "garbage time" training positions that don't matter.
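The "under 5%" reading can be checked mechanically. A minimal sketch, assuming the value compared against the guideline is the first I figure (including NR games); the rows are copied from the 621-game results above, and `pick_resign_rate` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (resign rate %, incorrect resigns % including NR games, plies saved %),
# copied from the first rows of the 621-game results.
ROWS: List[Tuple[int, float, float]] = [
    (1, 1.1, 8.0),
    (2, 3.1, 22.3),
    (3, 4.3, 31.5),
    (4, 5.5, 37.0),
    (5, 6.3, 40.9),
    (10, 10.1, 50.7),
]


def pick_resign_rate(rows: List[Tuple[int, float, float]],
                     max_incorrect: float = 5.0) -> Optional[Tuple[int, float]]:
    """Return (resign rate, plies saved) for the highest resign rate whose
    incorrect-resign percentage stays strictly below max_incorrect."""
    ok = [(rr, ps) for rr, inc, ps in rows if inc < max_incorrect]
    return max(ok) if ok else None
```

`pick_resign_rate(ROWS)` returns (3, 31.5). If the second I value (excluding NR games) is used instead, 3% sits exactly at 5.0%, so which column the guideline is applied to matters at this threshold.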

@mooskagh (Contributor) commented Apr 25, 2018 via email

@jkiliani (Contributor)

I would be interested in the answer to @mooskagh's question. By the way, I think we can allow more than 5% incorrect resigns without running into problems, due to our use of temperature in the endgame, which AlphaGo Zero didn't have.

@killerducky (Collaborator, Author)

Incorrect + Correct + Never_resigned = 100%. Last one is not printed. Very Incorrect is a subset of Incorrect.

I'll add this to the definitions printed since several people asked about it.

@mooskagh (Contributor) commented Apr 25, 2018 via email

@killerducky (Collaborator, Author)

"Why is incorrect + correct > 100% in the last line?"

Thanks, I see a bug in correct_resigns: it is set in the elif of very_incorrect, but it should be in the elif of incorrect instead. I will fix it tonight and rerun. I'll also add some asserts to the code to catch these things.

Also output more stats.
@killerducky (Collaborator, Author)

@mooskagh I fixed the bug, and added some more output stats.

NR + first I + first C = 100%
second I + second C = 100%

Probably should add that to the output description.
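Both invariants follow from how the two columns are normalized: the first I/C value divides by all games, the second only by games where someone would have resigned. A minimal sketch, assuming the counts of NR, I and C games are available; `percentages` and its key names are hypothetical.

```python
from typing import Dict


def percentages(no_resign: int, incorrect: int, correct: int) -> Dict[str, float]:
    """Compute the two I/C columns: over all games, and over resigned games only."""
    total = no_resign + incorrect + correct
    if total == 0:
        raise ValueError("no games")
    resigned = incorrect + correct
    return {
        "NR": 100.0 * no_resign / total,
        "I_all": 100.0 * incorrect / total,        # first I value
        "C_all": 100.0 * correct / total,          # first C value
        "I_resigned": 100.0 * incorrect / resigned if resigned else 0.0,  # second I
        "C_resigned": 100.0 * correct / resigned if resigned else 0.0,    # second C
    }
```

For example, with 30 NR, 7 I and 63 C games, the first values are 7% / 63% (plus 30% NR) and the second are 10% / 90%.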

@killerducky (Collaborator, Author)

Here is another set of games, my most recent 2369, with no overlap with the previous set.

Overall, the incorrect resign rate went slightly down.

Analyzing 2369 games
RR   - Resign Rate
Draw - number of drawn games
NR   - No Resign
I    - Incorrect Resign (would have resigned, but drew or won)
VI   - Very Incorrect Resign (would have resigned, but won)
C    - Correct Resign (would have resigned, and actually lost)
PS   - Plies Saved
I, VI, C display two values: including / excluding the NR games
RR  0% NR 100%
RR  1% Draw  7.6% NR 29.6% I  1.1%/ 1.6% VI  0.3%/ 0.5% C 69.3%/98.4% PS  8.3%
RR  2% Draw  7.6% NR 16.5% I  2.1%/ 2.5% VI  0.6%/ 0.7% C 81.4%/97.5% PS 22.1%
RR  3% Draw  7.6% NR 11.7% I  3.2%/ 3.6% VI  1.1%/ 1.2% C 85.1%/96.4% PS 32.4%
RR  4% Draw  7.6% NR 10.0% I  4.3%/ 4.8% VI  1.8%/ 2.0% C 85.6%/95.2% PS 38.1%
RR  5% Draw  7.6% NR  8.7% I  5.1%/ 5.6% VI  2.2%/ 2.4% C 86.2%/94.4% PS 41.6%
RR 10% Draw  7.6% NR  5.4% I  8.9%/ 9.4% VI  5.0%/ 5.3% C 85.7%/90.6% PS 51.8%
RR 20% Draw  7.6% NR  2.7% I 16.0%/16.5% VI 10.5%/10.8% C 81.3%/83.5% PS 62.8%
RR 30% Draw  7.6% NR  1.3% I 24.1%/24.4% VI 17.3%/17.5% C 74.6%/75.6% PS 71.2%
RR 40% Draw  7.6% NR  0.3% I 35.8%/35.9% VI 28.4%/28.4% C 63.9%/64.1% PS 81.3%
RR 50% Draw  7.6% NR  0.0% I 53.0%/53.0% VI 45.5%/45.5% C 47.0%/47.0% PS 98.5%

killerducky merged commit 5dab9e4 into glinscott:next on Apr 27, 2018

@killerducky (Collaborator, Author)

Update with recent 15x196 nets. Looks like we're still good.

Analyzing 1045 games
RR   - Resign Rate
Draw - number of drawn games
NR   - No Resign
I    - Incorrect Resign (would have resigned, but drew or won)
VI   - Very Incorrect Resign (would have resigned, but won)
C    - Correct Resign (would have resigned, and actually lost)
PS   - Plies Saved
I, VI, C display two values: including / excluding the NR games
RR  0% NR 100%
RR  1% Draw  7.9% NR 27.3% I  0.6%/ 0.8% VI  0.2%/ 0.3% C 72.2%/99.2% PS  7.1%
RR  2% Draw  7.9% NR 17.3% I  1.3%/ 1.6% VI  0.6%/ 0.7% C 81.3%/98.4% PS 17.0%
RR  3% Draw  7.9% NR 13.2% I  1.7%/ 2.0% VI  0.6%/ 0.7% C 85.1%/98.0% PS 24.9%
RR  4% Draw  7.9% NR 11.5% I  2.6%/ 2.9% VI  1.1%/ 1.3% C 85.9%/97.1% PS 30.1%
RR  5% Draw  7.9% NR 10.9% I  2.8%/ 3.1% VI  1.3%/ 1.5% C 86.3%/96.9% PS 34.0%
RR 10% Draw  7.9% NR  7.5% I  6.5%/ 7.0% VI  3.6%/ 3.9% C 86.0%/93.0% PS 43.8%
RR 20% Draw  7.9% NR  4.7% I 12.9%/13.6% VI  8.5%/ 8.9% C 82.4%/86.4% PS 56.3%
RR 30% Draw  7.9% NR  2.4% I 21.7%/22.3% VI 15.6%/16.0% C 75.9%/77.7% PS 66.0%
RR 40% Draw  7.9% NR  0.4% I 33.8%/33.9% VI 26.0%/26.1% C 65.8%/66.1% PS 78.0%
RR 50% Draw  7.9% NR  0.0% I 55.8%/55.8% VI 47.8%/47.8% C 44.2%/44.2% PS 98.3%
