
Use independent game trees for white and black during training #528

Closed
wants to merge 5 commits into from


Tilps
Contributor

@Tilps Tilps commented May 4, 2018

Using two trees improves the power of Dirichlet noise in two ways.

  1. By having an intervening move, the amount of tree reuse is often smaller.
  2. The noise for one player is completely independent of the other's, so if it results in a discovery, the other player won't already have a tree planned around the same noise. That should give the other player's noise a better chance of finding a counter-discovery.

It has some downsides obviously.

  1. Memory usage increases. (I don't think this is significant, since we use fairly low node counts for training.)
  2. A reduction in performance. I have not quantified it, but I do not expect it to be significant, partly because some tree reuse still happens and partly because the NNcache is shared between both trees.
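
The two-tree idea and the shared cache can be sketched roughly as follows. This is an illustrative toy, not the code in this PR: `Node`, `advance`, `TwoTreeSelfPlay`, and the `net` callable are hypothetical names, and `visits` stands in for real search statistics.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node; `visits` stands in for real MCTS statistics."""
    visits: int = 0
    children: dict = field(default_factory=dict)


def advance(root, move):
    """Re-root a tree at `move`, reusing the subtree if one was searched."""
    return root.children.get(move, Node())


class TwoTreeSelfPlay:
    """Keeps one tree per side plus a single evaluation cache shared by both."""

    def __init__(self):
        self.roots = {"white": Node(), "black": Node()}
        self.nn_cache = {}  # position key -> evaluation, shared by both trees

    def evaluate(self, position_key, net):
        # Either tree may hit an entry that the other tree's search stored.
        if position_key not in self.nn_cache:
            self.nn_cache[position_key] = net(position_key)
        return self.nn_cache[position_key]

    def play(self, move):
        # Both trees follow the game, but each keeps only its own subtree.
        for side in self.roots:
            self.roots[side] = advance(self.roots[side], move)
```

In this toy, a move that only one side's search explored is reused by that side's tree alone; the other side starts from a fresh node but can still benefit from cached evaluations.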

This concept came up during a discussion about ELF yesterday - I have not worked out whether ELF actually does this, but I suspect it does...

It was spinning on resign, if the 450th ply was a win/loss/draw it
wasn't recognized, and if the 450th ply was a win/loss/draw it was
trying to play a move from that position.
This is a compromise between disabling tree reuse and insufficient
application of Dirichlet noise.  Memory usage will increase, but
the training node count is pretty small, so it's not much.
@Tilps
Contributor Author

Tilps commented May 4, 2018

Some very rough performance testing suggests old code gets less than 10% more moves per hour than new code. (Obviously game duration varies significantly based on ply, so in the interest of time I just did a time per move estimation based on 3 games with the change, and 3 games at head - each starting with a fresh instance of the client to ensure NNcache didn't follow from one game to the next.)

@Akababa
Contributor

Akababa commented May 4, 2018

Dirichlet noise is only applied at the root, so why should tree re-use affect anything other than performance?

@jkiliani
Contributor

jkiliani commented May 4, 2018

As @Akababa noted, this would not do any good, because noise is only applied at the root. Only the first effect would actually apply, but giving up ~10% performance in self-play does not seem like a good idea to me at this point, especially if the Dirichlet noise effect could also be fine-tuned with the mixing constant.
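
For reference, root noise in AlphaZero-style search is usually mixed into the policy priors with a constant, which is the knob referred to above. A minimal sketch, assuming the common formulation; the function name and the default values (`alpha=0.3` and `epsilon=0.25`, the values reported for chess in the AlphaZero paper) are assumptions and are not taken from this PR:

```python
import random


def add_dirichlet_noise(priors, alpha=0.3, epsilon=0.25, rng=random):
    """Mix Dirichlet(alpha) noise into root priors with mixing constant epsilon."""
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalised.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total <= 0.0:
        # Degenerate draw (all samples underflowed): leave priors unchanged.
        return list(priors)
    noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]
```

Raising `epsilon` strengthens the noise on every move by the same amount, regardless of how many visits the root already carries.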

The code looks very useful however if we decided at a later point to try some adversarial learning techniques and actually use different networks for both sides of self-play games.

@killerducky
Collaborator

If we want to do something to increase the effectiveness of Dirichlet noise, I would prefer to turn off tree-reuse entirely. Most of the performance should be recovered due to NNCache hits.

@Tilps
Contributor Author

Tilps commented May 4, 2018

The following is why tree reuse affects Dirichlet noise, since apparently it isn't obvious.

Noise is applied to root, 800 visits are played.
Move is decided.
New root calculated.
Noise is applied to the new root. But now only '800 - tree reuse' visits are played.
Therefore, for the new root's noise to have any effect, it has to overcome all the reused visits, which were played out with a non-noised (and additionally FPU-reduced!) uct_select child. This is a significant hurdle to noise-induced exploration happening.
Tuning the amount of noise to try and overcome this deficit is non-obvious as the hurdle size varies every turn.
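
The arithmetic behind the steps above can be sketched as follows. The helper name is hypothetical, and it assumes, as described above, that reused visits count toward the per-move visit budget:

```python
def noised_visit_share(visit_budget, reused_visits):
    """Return (new_visits, share) for one move.

    new_visits: visits actually played after noise is applied to the new root.
    share: fraction of the visit budget that was played with the noise in place.
    """
    new_visits = max(0, visit_budget - reused_visits)
    share = new_visits / visit_budget if visit_budget else 0.0
    return new_visits, share


# The hurdle changes every turn because the amount of reuse changes every turn.
for reused in (0, 200, 600):
    new_visits, share = noised_visit_share(800, reused)
    print(f"reused={reused:3d}  new={new_visits:3d}  noised share={share:.0%}")
```

With 600 of 800 visits reused, only a quarter of the budget is searched under the new noise, which is why a single fixed noise setting is hard to tune.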

To explain my second point, that independent trees make more of a difference than the reduced tree reuse alone: currently, if noise causes a discovery, the next player starts from a reused tree that already justified that discovery. With an independent tree, they probably (or are at least more likely to) have a reused tree with almost no visits down the discovery path, so they get almost the full effect of noise after the opponent has made a noise-induced discovery. (This is in addition to the normal possibility of lower tree reuse due to temperature.)

I think it's likely to be well worth the <10% loss in game rate to reduce the amount by which the reinforcement learning can overfit the policy.

I can experiment with disabling tree reuse entirely to see what the performance loss is there. But I think one of these kinds of options will be useful.

@Tilps
Contributor Author

Tilps commented May 5, 2018

So, in what is probably an example of the size of my error bars, disabling tree reuse entirely comes up as ~10% more moves per hour than the original single tree. (So it certainly seems plausible that NN cache dominates the performance characteristics.)

@killerducky
Collaborator

#536 is a better candidate.

@killerducky killerducky closed this May 6, 2018

4 participants