Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about the transform between true reward and value prefix #33

Open
timothijoe opened this issue Oct 25, 2022 · 1 comment
Open

Comments

@timothijoe
Copy link

Hi,
I was a little confused about how to get true reward from value prefix in core/ctree/cnode.cpp

For the function update_tree_q() in Line 256, the true reward is calculated by
float true_reward = node->value_prefix - parent_value_prefix

Suppose we have a root node_1, with its two child (node_2 and node_3),

Before the while loop, we push node_1 into the node_stack;
For the first time of the while loop, we pop node_1, and push node_2, node_3 into the node_stack, finally we set parent_value_prefix = node_1.value_prefix;

For the second time of the while loop, we pop node_3, (suppose there is no child of node_3 expanded), and we set parent_value_prefix=node3.value_prefix (Line281);

In the third time of the while loop, we pop node_2, when we calc the true reward of node_2 in Line 266,
true_reward = node_2.value_prefix - parent_value_prefix = node_2.value_prefix - node_3.value_prefix,

However, the parent of node_2 is node_1, so the true_reward should be node_2.value_prefix - node_1.value_prefix
So I wonder if there is some problem for the operation for the variable "parent_value_prefix", or I misunderstood the code.

Alhough, in function update_tree_q, we only update the min_max value, so it may not affect the convergence. I wonder if it will convergence faster if there the operation is fixed.

@YeWR
Copy link
Owner

YeWR commented Oct 31, 2022

Thank you for your correction!

You are right. It is a bug that results in wrong min/max values on the tree side. Really thank you for your detailed reading. And I think it will affect the convergence or stability or something else.

We will fix this these days and check out the performance :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants