Is it better to use mixed value approximation? #69

Closed · CGLemon opened this issue Jun 17, 2023 · 5 comments · Fixed by #78
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

CGLemon (Contributor) commented Jun 17, 2023

In the paper (Appendix D), DeepMind uses the mixed value approximation instead of the simple one. It seems that your implementation uses the simple one. In my experience, the simple one works on 9x9, but it fails on 19x19. So maybe the mixed value approximation would be the better choice?

    def calculate_completed_q_value(self) -> np.ndarray:

        # ... (omitted) ...

        sum_prob = np.sum(policy)
        v_pi = np.sum(policy * q_value)

        # Visited children keep their search Q; unvisited children fall back
        # to the policy-weighted average of the visited Q-values.
        return np.where(self.children_visits[:self.num_children] > 0, q_value, v_pi / sum_prob)

kobanium added the enhancement and question labels on Jun 20, 2023

kobanium (Owner) commented Jun 20, 2023

This is because I didn't understand how to calculate the v_mix value, so yes, TamaGo should use the mixed value approximation. Although I want to switch from the simple value to the mixed value approximation, I'm too busy to change it right now. I'll change it when I have enough time.

By the way, in my experiment with Ray, reinforcement learning using the simple value worked well on 19x19 (16 visits/move), so I'm curious why your 19x19 experiment failed.

CGLemon (Contributor, Author) commented Jun 21, 2023

Oh... maybe your implementation is different from mine. Does Ray rescale the Q-value in the Gumbel process?

CGLemon (Contributor, Author) commented Jun 21, 2023

I forgot to explain the v_mix value. The formula is very simple:

    # Policy-weighted average of the children's Q-values.
    sum_prob = np.sum(policy)
    v_pi = np.sum(policy * q_value)
    rhs = v_pi / sum_prob

    # Raw value network output of the parent node.
    lhs = parent_nn_value
    factor = np.sum(self.children_visits)

    # Interpolate: the parent value has weight 1, the children's weighted
    # average has weight equal to the total number of child visits.
    v_mix = (1 * lhs + factor * rhs) / (1 + factor)
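
For completeness, here is a self-contained sketch of how v_mix could plug into the completed-Q calculation. The function and argument names are illustrative rather than TamaGo's actual API, and the weighted average is restricted to visited children as described in the paper:

    import numpy as np

    def completed_q_values(policy: np.ndarray,
                           q_value: np.ndarray,
                           children_visits: np.ndarray,
                           parent_raw_value: float) -> np.ndarray:
        """Completed Q-values using the mixed value approximation.

        policy           : prior probability of each child
        q_value          : search Q-value of each child (valid where visits > 0)
        children_visits  : visit count of each child
        parent_raw_value : raw value network output of the parent node
        """
        visited = children_visits > 0
        total_visits = np.sum(children_visits)

        # With no visited children, fall back to the parent's raw value.
        if total_visits == 0:
            return np.full_like(q_value, parent_raw_value, dtype=float)

        # Policy-weighted average of Q over visited children only.
        sum_prob = np.sum(policy[visited])
        v_pi = np.sum(policy[visited] * q_value[visited])

        # v_mix: parent raw value (weight 1) mixed with the children's
        # weighted average (weight = total child visits).
        v_mix = (parent_raw_value + total_visits * (v_pi / sum_prob)) / (1.0 + total_visits)

        # Visited children keep their search Q; unvisited ones get v_mix.
        return np.where(visited, q_value, v_mix)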

kobanium (Owner) commented

Thanks for the snippet! Certainly, it is easy to implement.

I don't rescale the Q-value because the value network's output range is from 0.0 to 1.0. I think I shouldn't rescale the Q-value; the targets of the reinforcement learning process are very sensitive to it.
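
For reference, the "rescaling" under discussion appears to be the MuZero-style min-max normalization of Q-values over the visited children; a minimal sketch under that assumption (function name and edge-case handling are illustrative):

    import numpy as np

    def rescale_q(q_value: np.ndarray, children_visits: np.ndarray) -> np.ndarray:
        """Min-max rescale Q-values to [0, 1] over the visited children, as
        MuZero-style implementations do when the value head is unbounded.
        With a value head that already outputs [0, 1], this step is optional."""
        visited_q = q_value[children_visits > 0]
        if visited_q.size == 0:
            return q_value
        q_min, q_max = visited_q.min(), visited_q.max()
        if q_max - q_min < 1e-8:
            return q_value - q_min
        return (q_value - q_min) / (q_max - q_min)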

CGLemon (Contributor, Author) commented Jun 27, 2023

It seems that the rescaling is not necessary for AlphaZero. What's worse, it may make the policy too sharp. I fixed this issue in my main run, and the result shows the new weights are better than before.
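
A toy illustration of the "too sharp" effect (not TamaGo or Ray code; the sigma transform follows the Gumbel paper, while c_visit and c_scale here are assumed typical defaults): the improved policy is softmax(logits + sigma(completed_q)), and min-max rescaling stretches small Q differences across the full [0, 1] range, which inflates the logit gaps.

    import numpy as np

    def improved_policy(logits, completed_q, max_visit_count,
                        c_visit=50.0, c_scale=0.1):
        # sigma(q) = (c_visit + max_b N(b)) * c_scale * q, then softmax over
        # logits + sigma(q) gives the improved policy.
        sigma = (c_visit + max_visit_count) * c_scale * completed_q
        z = logits + sigma
        z = z - np.max(z)
        return np.exp(z) / np.sum(np.exp(z))

    logits = np.zeros(3)
    q = np.array([0.52, 0.50, 0.48])                  # value head already in [0, 1]
    q_rescaled = (q - q.min()) / (q.max() - q.min())  # stretched to span [0, 1]

    print(improved_policy(logits, q, 16))           # close to uniform
    print(improved_policy(logits, q_rescaled, 16))  # much sharper

With the raw Q-values the resulting policy stays close to uniform, while the rescaled ones push most of the probability mass onto a single move.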
