Question about the transform between true reward and value prefix #33

timothijoe · 2022-10-25T07:41:44Z

Hi,
I am a little confused about how the true reward is recovered from the value prefix in core/ctree/cnode.cpp.

In the function update_tree_q() (Line 256), the true reward is calculated as
float true_reward = node->value_prefix - parent_value_prefix

Suppose we have a root node_1 with two children, node_2 and node_3.

Before the while loop, we push node_1 onto the node_stack.
In the first iteration of the while loop, we pop node_1, push node_2 and node_3 onto the node_stack, and finally set parent_value_prefix = node_1.value_prefix.

In the second iteration, we pop node_3 (suppose no child of node_3 has been expanded) and set parent_value_prefix = node_3.value_prefix (Line 281).

In the third iteration, we pop node_2. When we calculate the true reward of node_2 in Line 266, we get
true_reward = node_2.value_prefix - parent_value_prefix = node_2.value_prefix - node_3.value_prefix

However, the parent of node_2 is node_1, so the true reward should be node_2.value_prefix - node_1.value_prefix.
So I wonder whether there is a problem with how the variable "parent_value_prefix" is updated, or whether I have misunderstood the code.
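
The trace above can be reproduced with a minimal sketch. This is not the code from cnode.cpp: the Node struct, the Visit struct, and the function traverse_shared_prefix are hypothetical stand-ins that keep only the stack traversal and the single shared parent_value_prefix described in this issue.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the tree node; only the fields needed here.
struct Node {
    int id;
    float value_prefix;
    std::vector<Node*> children;
};

// Records which node was popped and the true reward computed for it.
struct Visit {
    int id;
    float true_reward;
};

// Assumed shape of the traversal described above: one shared
// parent_value_prefix that is overwritten after every pop.
std::vector<Visit> traverse_shared_prefix(Node* root) {
    assert(root != nullptr);
    std::vector<Visit> visits;
    std::vector<Node*> node_stack{root};
    float parent_value_prefix = 0.0f;
    bool is_root = true;
    while (!node_stack.empty()) {
        Node* node = node_stack.back();
        node_stack.pop_back();
        if (!is_root) {
            // Uses the prefix of the previously popped node, which is
            // not necessarily this node's parent.
            visits.push_back({node->id, node->value_prefix - parent_value_prefix});
        }
        is_root = false;
        for (Node* child : node->children) {
            node_stack.push_back(child);
        }
        parent_value_prefix = node->value_prefix;
    }
    return visits;
}
```

With node_1.value_prefix = 0.5, node_2.value_prefix = 1.5 and node_3.value_prefix = 3.5, node_3 is popped first and gets 3.5 - 0.5 = 3.0, but node_2 then gets 1.5 - 3.5 = -2.0 instead of 1.5 - 0.5 = 1.0.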

Although update_tree_q() only updates the min/max values, so this may not affect convergence, I wonder whether training would converge faster if this operation were fixed.
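
One possible fix, sketched here under assumptions rather than taken from the repository, is to store each node on the stack together with the value prefix of its own parent, so that popping a sibling cannot overwrite it. Node, Visit, and traverse_per_node_prefix are hypothetical names, and the min/max update itself is left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the tree node; only the fields needed here.
struct Node {
    int id;
    float value_prefix;
    std::vector<Node*> children;
};

// Records which node was popped and the true reward computed for it.
struct Visit {
    int id;
    float true_reward;
};

// Each stack entry carries the value_prefix of that node's own parent,
// so the subtraction always uses the correct parent.
std::vector<Visit> traverse_per_node_prefix(Node* root) {
    assert(root != nullptr);
    std::vector<Visit> visits;
    std::vector<std::pair<Node*, float>> node_stack;
    for (Node* child : root->children) {
        node_stack.push_back(std::make_pair(child, root->value_prefix));
    }
    while (!node_stack.empty()) {
        Node* node = node_stack.back().first;
        float parent_value_prefix = node_stack.back().second;
        node_stack.pop_back();
        visits.push_back({node->id, node->value_prefix - parent_value_prefix});
        for (Node* child : node->children) {
            node_stack.push_back(std::make_pair(child, node->value_prefix));
        }
    }
    return visits;
}
```

In this sketch the true reward of node_2 is computed against node_1.value_prefix regardless of whether node_3 was popped first. The cost is one extra float per stack entry.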

YeWR · 2022-10-31T03:07:28Z

Thank you for your correction!

You are right. This is a bug that results in wrong min/max values on the tree side. Thank you very much for reading the code so carefully. I think it will affect convergence, stability, or something else.

We will fix this in the next few days and check the performance :)
