Is it better to use mixed value approximation? #69
Comments
This is because I didn't understand the calculation of the v_mix value, so TamaGo should switch to the mixed value approximation. Although I want to change from the simple value to the mixed value approximation, I'm too busy to do it right now; I'll change it when I have enough time. By the way, in my experiment on Ray, reinforcement learning using the simple value worked well on 19x19 (16 visits/move), so I'm curious why your experiment on 19x19 failed.
Oh... maybe your implementation is different from mine. Does Ray rescale the Q value in the Gumbel process?
I forgot to explain the
Thanks for the snippet! Certainly, it is easy to implement. I don't rescale the Q value because the value network's output range is from 0.0 to 1.0.
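For reference, a minimal sketch of what such a rescaling might look like, assuming it refers to min-max normalization of the visited children's Q values over the search tree (as in MuZero); the function name and arguments below are illustrative and are not taken from TamaGo's or Ray's code:

```python
import numpy as np

def rescale_q(q_values: np.ndarray, visit_counts: np.ndarray) -> np.ndarray:
    """Min-max normalize the Q values of visited children into [0, 1].

    Assumption: this is the kind of rescaling discussed above; unvisited
    children are ignored when computing the bounds.
    """
    visited = visit_counts > 0
    if not np.any(visited):
        return q_values
    q_min = q_values[visited].min()
    q_max = q_values[visited].max()
    if q_max - q_min < 1e-8:
        return q_values
    return (q_values - q_min) / (q_max - q_min)
```

If the value network already outputs values in [0.0, 1.0], this step only stretches an already bounded range, which is why it can be skipped.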
It seems that the rescaling is not necessary for AlphaZero. What's worse is that it may make the policy too sharp. I fixed this issue in my main run, and the result shows the new weights are better than before.
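To see why a wider Q range can sharpen the improved policy, here is a hedged sketch of the monotone transform from the Gumbel paper, sigma(q) = (c_visit + max_b N(b)) * c_scale * q, and the resulting policy softmax(logits + sigma(q)); the constants shown are illustrative defaults, not values from either engine:

```python
import numpy as np

def sigma(q: np.ndarray, visit_counts: np.ndarray,
          c_visit: float = 50.0, c_scale: float = 1.0) -> np.ndarray:
    """Monotone transform of (completed) Q values; constants are illustrative."""
    return (c_visit + visit_counts.max()) * c_scale * q

def improved_policy(logits: np.ndarray, completed_q: np.ndarray,
                    visit_counts: np.ndarray) -> np.ndarray:
    """pi' = softmax(logits + sigma(completed Q)).

    Rescaling Q onto a wider range stretches sigma(Q), which sharpens
    this softmax -- the effect described above.
    """
    x = logits + sigma(completed_q, visit_counts)
    x -= x.max()              # numerical stability
    p = np.exp(x)
    return p / p.sum()
```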
In the paper (Appendix D), DeepMind used the mixed value approximation instead of the simple one. It seems that your implementation uses the simple one. In my experience, the simple one can work on 9x9, but it crashes on 19x19. So maybe it is a better choice to use the mixed value approximation?
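For concreteness, a hedged sketch of the mixed value approximation from Appendix D and of how it completes the Q values of unvisited actions; the variable names are mine and this is not code from either engine:

```python
import numpy as np

def mixed_value(raw_value: float, prior: np.ndarray,
                q_values: np.ndarray, visit_counts: np.ndarray) -> float:
    """Mixed value approximation (Appendix D of the Gumbel paper).

    v_mix = (v_pi + sum_b N(b) * weighted_q) / (1 + sum_b N(b)),
    where weighted_q averages the Q values of visited children,
    weighted by the prior policy restricted to those children.
    """
    visited = visit_counts > 0
    total_visits = visit_counts.sum()
    if not np.any(visited):
        return raw_value  # no visited children: fall back to the network value
    weighted_q = (prior[visited] * q_values[visited]).sum() / prior[visited].sum()
    return (raw_value + total_visits * weighted_q) / (1.0 + total_visits)

def completed_q(raw_value: float, prior: np.ndarray,
                q_values: np.ndarray, visit_counts: np.ndarray) -> np.ndarray:
    """Replace the Q value of every unvisited child with v_mix."""
    v_mix = mixed_value(raw_value, prior, q_values, visit_counts)
    return np.where(visit_counts > 0, q_values, v_mix)
```

The simple approximation replaces the unvisited children's Q values with the raw network value alone, which is what the two approaches differ on.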