You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
I was a little confused about how to get true reward from value prefix in core/ctree/cnode.cpp
For the function update_tree_q() in Line 256, the true reward is calculated by
float true_reward = node->value_prefix - parent_value_prefix
Suppose we have a root node_1, with its two child (node_2 and node_3),
Before the while loop, we push node_1 into the node_stack;
For the first time of the while loop, we pop node_1, and push node_2, node_3 into the node_stack, finally we set parent_value_prefix = node_1.value_prefix;
For the second time of the while loop, we pop node_3, (suppose there is no child of node_3 expanded), and we set parent_value_prefix=node3.value_prefix (Line281);
In the third time of the while loop, we pop node_2, when we calc the true reward of node_2 in Line 266,
true_reward = node_2.value_prefix - parent_value_prefix = node_2.value_prefix - node_3.value_prefix,
However, the parent of node_2 is node_1, so the true_reward should be node_2.value_prefix - node_1.value_prefix
So I wonder if there is some problem for the operation for the variable "parent_value_prefix", or I misunderstood the code.
Alhough, in function update_tree_q, we only update the min_max value, so it may not affect the convergence. I wonder if it will convergence faster if there the operation is fixed.
The text was updated successfully, but these errors were encountered:
You are right. It is a bug that results in wrong min/max values on the tree side. Really thank you for your detailed reading. And I think it will affect the convergence or stability or something else.
We will fix this these days and check out the performance :)
Hi,
I was a little confused about how to get true reward from value prefix in core/ctree/cnode.cpp
For the function update_tree_q() in Line 256, the true reward is calculated by
float true_reward = node->value_prefix - parent_value_prefix
Suppose we have a root node_1, with its two child (node_2 and node_3),
Before the while loop, we push node_1 into the node_stack;
For the first time of the while loop, we pop node_1, and push node_2, node_3 into the node_stack, finally we set parent_value_prefix = node_1.value_prefix;
For the second time of the while loop, we pop node_3, (suppose there is no child of node_3 expanded), and we set parent_value_prefix=node3.value_prefix (Line281);
In the third time of the while loop, we pop node_2, when we calc the true reward of node_2 in Line 266,
true_reward = node_2.value_prefix - parent_value_prefix = node_2.value_prefix - node_3.value_prefix,
However, the parent of node_2 is node_1, so the true_reward should be node_2.value_prefix - node_1.value_prefix
So I wonder if there is some problem for the operation for the variable "parent_value_prefix", or I misunderstood the code.
Alhough, in function update_tree_q, we only update the min_max value, so it may not affect the convergence. I wonder if it will convergence faster if there the operation is fixed.
The text was updated successfully, but these errors were encountered: