Contempt 24 #1806
Master 2018-11-01 vs Stockfish 9: Contempt 21, Contempt 24 |
Your comment refers to PawnValueMg, but the dependency is with PawnValueEg, isn't it? |
@Rocky640 PawnValueEg was also decreased, although not by as much. But yes, I was wrong, but the point still stands :) |
Generally I am all in for contempt increases, but I have many reasons to be neutral about this:
Having said all that, I have to say that I am NOT against this change. However, I think the next Leela networks will be really strong, and it's a good idea to prepare already. Hence what I considered a waste of resources some time ago, I now think is valuable knowledge: approximately locating the self-play optimal contempt for future use, which half a year ago was between 10 and 11. Of course there is no hurry at all for these tests, and they should run on a very idle network. Prio -1 is a nice alibi for not wasting resources, but in reality people are interested in those tests and are not as motivated to write patches as they are with a completely empty queue. |
I am in favor of this change. |
My understanding of how contempt in SF brings results: I see any contempt different from zero as resulting in, strictly speaking, suboptimal play. But chess is drawish, and by playing to keep the game more complex, if not optimal, we create more chances for any given opponent to make a mistake, which SF is good at punishing because of its tactical ability (and the chance of a mistake is proportional to the complexity of the position). |
Sure, @DragonMist, this can happen. Contempt 27 didn't pass [-3;1], for example, and c=50 failed [-4;0] pretty fast. |
IMO the problem is with dynamic contempt, and I believe a patch that removes it would easily pass [-3;1]. |
@Vizvezdenec @DragonMist wants a rough estimate of how different contempt values behave vs ct=0. It's not information meant to help SF change something; it's just interesting. SPRTs are useless for this: we can do cheap fixed-length runs of 10K-20K STC games with wide margins to see the Elo fluctuation range. Is it 2, 4 or 6 Elo? Locating the contempt that maximizes self-play Elo is valuable information for equal-rated opponents. Let me remind you that half a year ago ct=10 passed a clear green (0,4) LTC vs ct=12, but we preferred to keep 12. I think it's very useful to have all the previous info and discussions together:
@MichaelB7 I kind of agree: this stuff is vain for progress, but at the same time interesting. It's something to do with a totally idle network. But contempt actually gives a third dimension to the AB search; it has the potential to be used to give more search depth to certain positions and less to others. Currently this focus is applied through the very sensitive (ct, ct/2) gradient (half contempt for the endgame). What happens here is that the search is more reluctant to enter endgame territory, as it loses part of the contempt bonus in the eval. So we actually safeguard our middlegame plan: we protect against simplifying into a high-eval but drawish endgame when we still have other middlegame options. That is exactly where the self-play Elo gains of contempt over ct=0 come from.
One night I could not sleep at all and had hours of crazy, inspired brainstorming, which produced these ideas:
For the next month I will be exploring Iran, probably staying completely off the internet. I will be awaiting pleasant updates when I return! |
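The (ct, ct/2) gradient can be illustrated with a small model of a tapered evaluation. This is a simplified sketch, not Stockfish's code: it assumes a game-phase scale where 128 means a full middlegame and 0 a bare endgame (Stockfish's PHASE_MIDGAME / PHASE_ENDGAME convention), ignores endgame scale factors, and the helper names `contempt_score` and `tapered` are hypothetical.

```python
# Simplified model of a tapered evaluation with a (ct, ct/2) contempt term.
# Assumption: phase runs from PHASE_MIDGAME (128, full middlegame) down to
# PHASE_ENDGAME (0, bare endgame); endgame scale factors are ignored.
PHASE_MIDGAME = 128
PHASE_ENDGAME = 0


def contempt_score(ct: int) -> tuple[int, int]:
    """Hypothetical helper: (middlegame, endgame) contempt = (ct, ct / 2)."""
    return ct, ct // 2


def tapered(mg: int, eg: int, phase: int) -> int:
    """Linearly interpolate mg and eg by game phase (non-negative inputs assumed)."""
    return (mg * phase + eg * (PHASE_MIDGAME - phase)) // PHASE_MIDGAME


if __name__ == "__main__":
    mg, eg = contempt_score(24)
    for phase in (PHASE_MIDGAME, 64, PHASE_ENDGAME):
        # The contempt bonus shrinks as material comes off the board.
        print(f"phase {phase:3d}: contempt bonus {tapered(mg, eg, phase)}")
```

For ct = 24 the bonus falls from 24 at phase 128 to 18 at phase 64 and 12 at phase 0, so every simplification towards the endgame costs the side with contempt part of its bonus, which is what makes the search reluctant to simplify.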
I know my opinion is probably in the minority, but we spend way too much time on contempt. It's the vanity of vanities: here today, gone tomorrow. Just about every single scoring change will impact the optimal value for contempt. It reminds me of the Book of Ecclesiastes. |
Thank you @NKONSTANTAKIS for understanding, supporting and providing so much info in one place. |
Dynamic contempt removal wouldn't pass [-3;1]: it was run for 1 million games with a -2.5 Elo performance and an error bar under 1 Elo. |
You can believe what you want, but 10 passed clear green vs 12 as well, not just vs 8. Those gains were on top of other gains, which were on top of other gains that were gaining +2 Elo vs C0. So yes, when you have chained positive tests on top of each other, confidence and flukes can't hold the castle, sorry. For me it's fairly certain that a contempt value around 10-11 beats both ct=0 and ct=24 head to head by 2-4 Elo. Vs c0 I am sure, and since ct24 barely passed (-3,1) vs ct0, I don't see why not vs ct24 as well. Anyone interested in this could check it; it's just 1-2 tests. I think it's better to be clear about how much Elo we are sacrificing in self-play for better results vs weaker engines (and the other goodies like a lower draw rate, more spectacular play, etc.) than to operate in the shadows to avoid attracting opposition to contempt. Then we can select the contempt we like, knowing exactly what is going on. @mcostalba has expressed a different viewpoint than @snicolet on this in the past, valuing self-play Elo more. I think both worlds have their virtues. @Vizvezdenec For me it's not too important to use resources for this, because I am sure. But since you question me, it automatically becomes important to me, because it's like you are throwing down the gauntlet, and I accept the challenge. @DragonMist requested it too. We don't need 20+ tests like you did vs SF7 etc., nor the 5 that DM asked for, just 1-2. |
The value of contempt in self-play (if any) will never be measured accurately. A fixed-length test with a resolution of 1 Elo needs roughly 170000 games. If you do multiple tests you need even more games (as the uncertainties accumulate). People are unwilling to invest the proper amount of resources to obtain scientifically valid conclusions. I understand this is not the aim of the SF project, but one should be open about it and not keep up the pretence. Note that you have to allocate enough resources before a test, since if the resources are too low and you get a non-significant result, there is no conclusion, neither negative nor positive; the resources are simply wasted. PS. I wrote "if any" above since I could not duplicate the supposed Elo gain in private testing. It may of course be that there is something wrong with my personal setup. |
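The order of magnitude of the 170000 figure can be checked with a normal approximation. A minimal sketch, assuming two near-equal engines (expected score near 0.5), a 95% confidence level, "resolution" read as the half-width of the confidence interval, and a draw ratio that is a free parameter rather than a measured value; `games_needed` is a hypothetical helper.

```python
import math


def elo_per_score_point(score: float = 0.5) -> float:
    """Slope of the logistic Elo curve, d(Elo)/d(score), at the given score."""
    return 400.0 / (math.log(10) * score * (1.0 - score))


def games_needed(half_width_elo: float, draw_ratio: float, z: float = 1.96) -> int:
    """Games needed so the z-sigma confidence half-width is half_width_elo.

    Assumes near-equal engines, so wins and losses are equally likely and
    the per-game score variance is (1 - draw_ratio) / 4.
    """
    per_game_var = (1.0 - draw_ratio) / 4.0
    slope = elo_per_score_point(0.5)
    return math.ceil((z * slope / half_width_elo) ** 2 * per_game_var)


if __name__ == "__main__":
    for draw_ratio in (0.60, 0.63):
        print(draw_ratio, games_needed(1.0, draw_ratio))
```

With assumed draw ratios between 0.6 and 0.63 this gives about 185,000 to 172,000 games, the same ballpark as the figure quoted; a lower draw ratio raises the count.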
I see it quite simply really. SF default contempt should be:
You can rave on about statistics all you want (lies, damn lies, and statistics) and so can I (and do!). However, I'd suggest that ultimately, it's all about doing more "good" than "harm" (applies to perhaps everything in life?), and therefore, the perspective of 1. above seems reasonable to me. |
This already is a flaw. A SPRT(-3,1) that fails does not mean a regression. The probability of a false negative is about 36% which is much too high to allow for any conclusion. |
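How often an SPRT fails for a patch that is not actually a regression depends on the patch's true Elo. A rough sketch of the SPRT operating characteristic using Wald's classical approximation, under the assumptions that per-game log-likelihood increments are approximately normal, that Elo is locally linear in score, and that α = β = 0.05; overshoot past the stopping bounds is ignored, and `sprt_pass_probability` is a hypothetical helper, not Fishtest's code.

```python
import math


def sprt_pass_probability(true_elo: float, elo0: float, elo1: float,
                          alpha: float = 0.05, beta: float = 0.05) -> float:
    """Wald's approximation to the probability that an SPRT accepts H1.

    Assumes normal per-game LLR increments and ignores overshoot.
    """
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this LLR
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this LLR
    # h != 0 solves E[exp(h * Z)] = 1 for the per-game LLR increment Z.
    h = (elo0 + elo1 - 2.0 * true_elo) / (elo1 - elo0)
    if abs(h) < 1e-12:
        # Limit h -> 0: the true Elo sits exactly halfway between the bounds.
        p_accept_h0 = upper / (upper - lower)
    else:
        p_accept_h0 = (math.exp(h * upper) - 1.0) / (
            math.exp(h * upper) - math.exp(h * lower))
    return 1.0 - p_accept_h0


if __name__ == "__main__":
    for true_elo in (-3.0, -2.0, -1.0, 0.0, 1.0):
        p = sprt_pass_probability(true_elo, elo0=-3.0, elo1=1.0)
        print(f"true Elo {true_elo:+.1f}: P(pass [-3;1]) ~ {p:.3f}")
```

By construction the pass probability is 0.95 at elo1, 0.05 at elo0 and exactly 0.5 halfway between them (-1 for [-3;1]); evaluating it at other assumed true Elo values shows how large the false-negative rate can be for a harmless change.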
@vdbergh The proposal was definitely not about accuracy, and certainly not about 170K games as you say; that would be a waste. Why would we need accuracy for something we are not going to use? The point is just to show that a self-play Elo gain exists for medium contempt. That way people can use it for the best possible play or analysis instead of using the default contempt or 0 contempt. 40K games for ±2 Elo is more than enough, and at -1 priority it will run on an idle network. In fact, even at 20K games I estimate that the Elo gain will be bigger than the margins. I also want to make clear that I don't oppose raising the default contempt, and that I find this made-up rule of picking the highest value that passes (-3,1) vs 0 really good for picking a balance point. It is not important if the (-3,1) test flukes in one direction or the other; it's just a way for us not to overdo it and at the same time keep people calm about self-play performance not diverging too much from optimal. |
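The margins of a fixed-length run can be estimated with a normal approximation. A minimal sketch, assuming near-equal engines, 95% confidence and a draw ratio of 0.6 (an assumed value, not a measurement); `elo_half_width` is a hypothetical helper.

```python
import math


def elo_half_width(games: int, draw_ratio: float, z: float = 1.96) -> float:
    """Approximate z-sigma Elo confidence half-width after a fixed number of games.

    Assumes near-equal engines (expected score ~0.5), so the per-game
    score variance is (1 - draw_ratio) / 4.
    """
    slope = 400.0 / (math.log(10) * 0.25)  # d(Elo)/d(score) at score 0.5
    score_stderr = math.sqrt((1.0 - draw_ratio) / 4.0 / games)
    return z * slope * score_stderr


if __name__ == "__main__":
    for games in (20_000, 40_000):
        print(games, round(elo_half_width(games, draw_ratio=0.6), 2))
```

With the assumed draw ratio of 0.6, 40,000 games give about ±2.15 Elo and 20,000 games about ±3.05 Elo; a higher draw ratio narrows both.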
Merged via 2a7213f, thanks! |
I haven't seen any news on this, so I decided to make a PR...
passed non-regression STC vs c=0
http://tests.stockfishchess.org/tests/view/5bd6d7f80ebc595e0ae21e14
passed non-regression LTC vs c=0
http://tests.stockfishchess.org/tests/view/5bd6e0980ebc595e0ae21f07
Usually it's enough to set it to the new (higher) value. Also, because we recently decreased PawnValueMg, it's logical that higher contempt values don't regress now, since they depend on this value.
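The dependency on the pawn value can be illustrated with a minimal sketch. It assumes, as pointed out in the thread, that the UCI Contempt option (in centipawns) is scaled by PawnValueEg / 100 before use; the function name and both pawn values below are hypothetical, chosen only to show the effect.

```python
def internal_contempt(uci_contempt_cp: int, pawn_value_eg: int) -> int:
    """Scale a UCI Contempt value (centipawns) into internal evaluation units.

    Assumption: internal = Contempt * PawnValueEg / 100, with truncating
    integer division (identical to // for the non-negative inputs used here).
    """
    return uci_contempt_cp * pawn_value_eg // 100


if __name__ == "__main__":
    OLD_PAWN_EG, NEW_PAWN_EG = 250, 220  # hypothetical values, not Stockfish's
    print(internal_contempt(21, OLD_PAWN_EG))  # Contempt 21 before the decrease
    print(internal_contempt(24, NEW_PAWN_EG))  # Contempt 24 after the decrease
```

With these made-up numbers, Contempt 24 after the decrease maps to the same internal value (52) as Contempt 21 before it: a smaller pawn value shrinks the internal contempt, so a higher UCI setting is needed for the same effect.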