Use independent game trees for white and black during training #528
It was spinning on resign. If the 450th ply was a win/loss/draw, it wasn't recognized, and it then tried to play a move from that position.
This is a compromise between disabling tree reuse and insufficient application of Dirichlet noise. Memory usage will increase, but the node count in training is pretty small, so it's not much.
Some very rough performance testing suggests the old code gets less than 10% more moves per hour than the new code. (Obviously game duration varies significantly based on ply, so in the interest of time I just did a time-per-move estimate based on 3 games with the change and 3 games at head, each starting with a fresh instance of the client to ensure the NNCache didn't carry over from one game to the next.)
Dirichlet noise is only applied at the root, so why should tree reuse affect anything other than performance?
As @Akababa noted, this would not do any good, because noise is only applied at the root. Only the first effect would actually apply, but giving up ~10% performance in self-play does not seem like a good idea to me at this point, especially if the effect of Dirichlet noise could also be fine-tuned with the mixing constant. The code looks very useful, however, if we decide at a later point to try some adversarial learning techniques and actually use different networks for the two sides of self-play games.
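For readers unfamiliar with the mixing constant mentioned above, here is a minimal sketch of AlphaZero-style root noise, where the root priors become `(1 - epsilon) * p + epsilon * noise` with `noise` drawn from a symmetric Dirichlet distribution. The function names are hypothetical and this is not the project's code; the values `epsilon = 0.25` and `alpha = 0.3` are the ones reported for chess in the AlphaZero paper and are assumed here purely for illustration.

```python
import random

def sample_dirichlet(alpha, n, rng):
    # Symmetric Dirichlet sample: normalize n independent Gamma(alpha, 1) draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def mix_root_noise(priors, epsilon, alpha, rng):
    # Blend the network's root priors with Dirichlet noise.
    # epsilon is the mixing constant: 0 means no noise, 1 means pure noise.
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]

# Hypothetical root priors for three legal moves.
rng = random.Random(0)
priors = [0.7, 0.2, 0.1]
noised = mix_root_noise(priors, epsilon=0.25, alpha=0.3, rng=rng)
```

Raising `epsilon` strengthens the noise without touching tree reuse, which is the alternative knob this comment points to.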
If we want to do something to increase the effectiveness of Dirichlet noise, I would prefer to turn off tree reuse entirely. Most of the performance should be recovered due to NNCache hits.
The following is why tree reuse affects Dirichlet noise, since apparently it isn't obvious. Noise is applied to root, 800 visits are played.

To explain my second point, about how having independent trees makes more of a difference than just the reduced tree reuse: currently, if noise causes a discovery, the next player starts from a reused tree that justified that discovery. With an independent tree, they probably (or are at least more likely to) have a reused tree with almost no visits down the discovery path, so they get almost the full effect of noise after the opponent has made a noise-induced discovery. (This is in addition to the normal possibility of lower tree reuse due to temperature.)

I think it's likely to be well worth the <10% loss in game rate to reduce the amount by which reinforcement learning can overfit the policy. I can experiment with disabling tree reuse entirely to see what the performance loss is there. But I think one of these kinds of options will be useful.
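One way to see how inherited visits can blunt root noise is the toy model below. It is only a sketch under strong assumptions that are not taken from the project: every child's Q is held at zero, selection uses only the PUCT exploration term, the 800-visit budget counts visits inherited from the reused subtree, and the priors, the 600 inherited visits, and the name `run_puct` are all hypothetical.

```python
def run_puct(priors, visits, extra):
    # Add `extra` visits, each to the child maximizing P / (1 + N).
    # With Q fixed at zero for every child, the shared c * sqrt(total)
    # factor does not change the argmax, so it is omitted.
    visits = list(visits)
    for _ in range(extra):
        best = max(range(len(priors)), key=lambda i: priors[i] / (1 + visits[i]))
        visits[best] += 1
    return visits

clean = [0.90, 0.05, 0.05]   # hypothetical priors before noise
noised = [0.60, 0.05, 0.35]  # hypothetical priors after noise boosts move 2
budget = 800                 # assumed total root visit budget

# Fresh tree: the whole budget is spent under the noised priors.
fresh = run_puct(noised, [0, 0, 0], budget)

# Reused tree: 600 visits were already placed under the clean priors,
# so only the remaining 200 are steered by the noise.
inherited = run_puct(clean, [0, 0, 0], 600)
reused = run_puct(noised, inherited, budget - sum(inherited))
```

In this toy setup the noise-boosted move ends with fewer visits in `reused` than in `fresh`, because the visits inherited on the previously favoured move cannot be redistributed. A real search also updates Q, so the size of the effect there may differ.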
So, in what is probably an illustration of the size of my error bars: disabling tree reuse entirely comes out at ~10% more moves per hour than the original single tree. (So it certainly seems plausible that NNCache dominates the performance characteristics.)
#536 is a better candidate.
Using two trees improves the power of Dirichlet noise in two ways.
It has some downsides obviously.
This concept came up during a discussion about ELF yesterday - I've not actually worked out whether ELF actually does this, but I suspect it does...