Use independent game trees for white and black during training #528

Tilps · 2018-05-04T11:58:08Z

Using two trees improves the power of Dirichlet noise in two ways.

1. Because there is an intervening move, the amount of tree reuse is often smaller.
2. The noise for one player is completely independent of the other's, so if it results in a discovery, the other player won't already have a tree planned around the same noise. That should give the other player's noise more room to find a counter-discovery.

It has some downsides, obviously.

1. Memory usage increases. (I don't think this is significant, since we use a fairly low node count for training.)
2. Performance drops. I have not quantified it, but I do not expect it to be significant: partly because some tree reuse still happens, and partly because the NNcache is shared between both trees.
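As a rough illustration of the first point (less reuse because of the intervening move), here is a minimal sketch, not the client's actual code. `Node`, `SearchTree`, `toy_search` and `reuse_per_ply` are hypothetical names, the "search" just spreads visits evenly over three made-up moves, and the `cache` dict only stands in for the shared NNcache.

```python
class Node:
    """Search-tree node holding only a visit count and its children."""
    def __init__(self):
        self.visits = 0
        self.children = {}  # move -> Node


class SearchTree:
    def __init__(self):
        self.root = Node()

    def advance(self, move):
        """Re-root at `move`; keep that subtree if it exists, else start fresh."""
        self.root = self.root.children.pop(move, None) or Node()


def toy_search(tree, position, cache, moves, budget):
    """Hypothetical stand-in for MCTS: top the root up to `budget` visits,
    spreading the new visits evenly over each move and each reply."""
    for i in range(budget - tree.root.visits):
        move = moves[i % len(moves)]
        reply = moves[(i // len(moves)) % len(moves)]
        child = tree.root.children.setdefault(move, Node())
        grandchild = child.children.setdefault(reply, Node())
        # Placeholder for a network evaluation; the cache is shared by both trees.
        cache.setdefault(position + (move, reply), 0.0)
        tree.root.visits += 1
        child.visits += 1
        grandchild.visits += 1


def reuse_per_ply(plies, dual, moves=("a", "b", "c"), budget=90):
    """Return how many visits the side to move inherits at each ply."""
    white = SearchTree()
    black = SearchTree() if dual else white  # one shared tree when not dual
    cache = {}
    position = ()
    inherited = []
    for ply in range(plies):
        tree = white if ply % 2 == 0 else black
        inherited.append(tree.root.visits)
        toy_search(tree, position, cache, moves, budget)
        move = moves[ply % len(moves)]  # arbitrary stand-in for move selection
        for t in {white, black}:  # the set collapses to one tree when shared
            t.advance(move)
        position += (move,)
    return inherited


single_tree = reuse_per_ply(3, dual=False)
dual_trees = reuse_per_ply(3, dual=True)
print(single_tree, dual_trees)
```

With one tree, the side to move inherits the subtree one ply down; with two trees, each side's tree has to survive two plies, so the inherited visit count is smaller. The exact numbers here are an artifact of the uniform toy search.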

This concept came up during a discussion about ELF yesterday. I haven't worked out whether ELF actually does this, but I suspect it does...

It was spinning on resign. Also, if the 450th ply was a win/loss/draw, it wasn't recognized, and the code tried to play a move from that position.

This is a compromise between disabling tree reuse and insufficient application of Dirichlet noise. Memory usage will increase, but the node count used for training is pretty small, so it's not much.

Tilps · 2018-05-04T13:13:55Z

Some very rough performance testing suggests old code gets less than 10% more moves per hour than new code. (Obviously game duration varies significantly based on ply, so in the interest of time I just did a time per move estimation based on 3 games with the change, and 3 games at head - each starting with a fresh instance of the client to ensure NNcache didn't follow from one game to the next.)

Akababa · 2018-05-04T14:27:12Z

Dirichlet noise is only applied at the root, so why should tree re-use affect anything other than performance?

jkiliani · 2018-05-04T19:23:12Z

As @Akababa noted, this would not do any good due to noise only being applied at the root. Only the first effect would actually apply, but giving up ~10% performance in self-play does not seem like a good idea to me at this point, especially if the effect of Dirichlet noise could also be fine-tuned with the mixing constant.

The code looks very useful, however, if we decide at a later point to try some adversarial learning techniques and actually use different networks for the two sides of self-play games.
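For reference, the mixing constant mentioned above is the $\varepsilon$ in the root-prior formula published in the AlphaGo Zero and AlphaZero papers. This is the published form, not something stated in this thread, and the values this project's training client uses are not given here.

$$
P'(s_{\text{root}}, a) = (1 - \varepsilon)\, p_a + \varepsilon\, \eta_a, \qquad \eta \sim \mathrm{Dir}(\alpha)
$$

Here $p_a$ is the network's prior for move $a$ at the root and $\eta$ is the noise draw, so a larger $\varepsilon$ gives the noise more weight relative to the policy.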

killerducky · 2018-05-04T22:14:37Z

If we want to do something to increase the effectiveness of Dirichlet noise, I would prefer to turn off tree-reuse entirely. Most of the performance should be recovered due to NNCache hits.

Tilps · 2018-05-04T22:23:11Z

The following is why tree reuse affects Dirichlet noise, since apparently it isn't obvious.

1. Noise is applied to the root, and 800 visits are played.
2. A move is decided.
3. The new root is calculated.
4. Noise is applied to the new root, but now only '800 - tree reuse' visits are played.

Therefore, for the new root's noise to have any effect, it has to overcome all the tree-reuse visits, which were played out with a non-noised (and additionally FPU-reduced!) uct_select child. This is a significant hurdle to noise-induced exploration.
Tuning the amount of noise to overcome this deficit is non-obvious, as the size of the hurdle varies every turn.
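The steps above can be sketched as a root-only toy model. This is a simplification under stated assumptions, not the engine's search: every child has a fixed value of 0, there is no FPU reduction, the exploration formula is a generic PUCT form, and `dirichlet`, `mix_noise` and `root_search` are hypothetical helper names. The inherited counts `[490, 175, 35]` and the noise draw are hand-picked for the example.

```python
import math
import random


def dirichlet(alpha, n, rng):
    """Sample a symmetric Dirichlet(alpha) vector of length n via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def mix_noise(priors, noise, eps=0.25):
    """Blend root priors with noise: (1 - eps) * p + eps * noise."""
    return [(1 - eps) * p + eps * n for p, n in zip(priors, noise)]


def root_search(priors, values, visits, budget, c_puct=1.5):
    """Flat PUCT at the root only; each child keeps a fixed value estimate.

    `visits` holds counts inherited from tree reuse. Search continues until
    the total reaches `budget`, so only budget - sum(visits) visits are new.
    """
    visits = list(visits)
    while sum(visits) < budget:
        total = sum(visits)
        scores = [
            q + c_puct * p * math.sqrt(total + 1) / (1 + n)
            for p, q, n in zip(priors, values, visits)
        ]
        visits[scores.index(max(scores))] += 1
    return visits


priors = [0.70, 0.25, 0.05]
values = [0.0, 0.0, 0.0]
noise = [0.0, 0.0, 1.0]  # hand-picked draw that lands on the low-prior move
noised = mix_noise(priors, noise)

fresh = root_search(noised, values, [0, 0, 0], budget=800)
# 700 visits inherited in proportion to the un-noised priors, 100 left to play.
reused = root_search(noised, values, [490, 175, 35], budget=800)
print(fresh, reused)
```

In the reused case every one of the 100 new visits goes to the noised move and it still ends far below what it reaches from a fresh root, which is the hurdle described above. To use a sampled draw instead of the hand-picked one, replace `noise` with something like `dirichlet(0.3, 3, random.Random(0))`; the alpha of 0.3 is an assumption, not a value taken from this thread.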

To explain my second point, about how independent trees make more of a difference than just the reduced tree reuse: currently, if noise causes a discovery, the next player starts from a reused tree that justified that discovery. With an independent tree, they probably (or are at least more likely to) have a reused tree with almost no visits down the discovery path, so they get almost the full effect of noise after the opponent has made a noise-induced discovery. (This is in addition to the normal possibility of lower tree reuse due to temperature.)

I think it's likely to be well worth the <10% loss in game rate to reduce the amount by which reinforcement learning can overfit the policy.

I can experiment with disabling tree reuse entirely to see what the performance loss is there. But I think one of these kinds of options will be useful.

Tilps · 2018-05-05T00:00:53Z

So, in what is probably an illustration of the size of my error bars, disabling tree reuse entirely comes out at ~10% more moves per hour than the original single tree. (So it certainly seems plausible that the NN cache dominates the performance characteristics.)

killerducky · 2018-05-06T02:17:32Z

#536 is a better candidate.

Tilps added 5 commits April 29, 2018 09:57

Merge remote-tracking branch 'refs/remotes/glinscott/next' into next

ae332a4

Merge remote-tracking branch 'refs/remotes/glinscott/next' into next

99e1bd7

Merge remote-tracking branch 'refs/remotes/glinscott/next' into next

2f5ad3f

Fix play_one_game self-play logic.

c811dab

It was spinning on resign. Also, if the 450th ply was a win/loss/draw, it wasn't recognized, and the code tried to play a move from that position.

Switch training to dual trees.

2337b4c

This is a compromise between disabling tree reuse and insufficient application of Dirichlet noise. Memory usage will increase, but the node count used for training is pretty small, so it's not much.

Tilps mentioned this pull request May 5, 2018

Disable tree reuse in training. #536

Merged

killerducky closed this May 6, 2018
