-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate TreeNode
transform and rewrite APIs
#8891
Conversation
cc @alamb, @andygrove, @yahoNanJing, @berkaysynnada this PR is another step to cleaner |
These changes seem reasonable to me and, since they propose a similar approach to other recursive flows, they make understanding easier. However, I am not sure about this change:
|
Thanks for the feedback @berkaysynnada.
I feel that a simple (simpler than than Also, just to add a bit more details to this part of the PR: While I was local testing this PR I accidentally changed all
I'm not sure I get this part. |
The docstring you have updated for
I meant what if we intend to do such:
If we want to offer such a use, I think we may need to consider this way as well. |
This is correct. This is because the proposed new If someone needs |
I checked again. Yes, the order is correct. What I really meant is that when we use Since the nodes access their children, first top-down and then bottom-up traversals seem to be a single pass, doing it in reverse order is like 2 passes, but as a behavior, it may be desirable to have reverse order and should we provide a way for this? |
That's correct. I don't think that botom-up and then top-down can't be achieved in one run.
API users can explicitly call |
Since we set out to simplify and provide easy to use TreeNode API and its related implementations as much as possible, IMO we need to reach a simple state at the first step. As far as I observed, there is no use for this version of |
Although I used the new |
Hi @peter-toth -- thank you for this PR and the various others that are trying to improve the quality of code in DataFusion I think one thing that is challenging for me is to understand how to evaluate the benefits of this and related PRs in an objective manner. This PR, for example, may improve the consistency of APIs (which I happen to think it does) but making this change will also come at a cost for DataFusion users:
For other breaking API changes, there are often other benefits that come with the cost, such as new features or improved performance. I wonder if you can help us articulate how this (and the related refactoring PRs) help DataFusion users -- for example do they
If we can connect this work to a benefit I suspect we would be able to drive it forward more quickly |
Thanks for the question @alamb! I'm new to DataFusion so this discussion could help me a lot to understand what ways are viable in terms of improving the existing code. I came from a different open-source query engine but I think Here are a few things I found weird with the current APIs. These can cause a newcommer (like me) some confusion:
What features I miss from the current API:
So to recap my ideal /// Transforms the tree using `f_down` and `f_up` closures. `f_down` is applied on a
/// node while traversing the tree top-down (pre-order, before the node's children are
/// visited) while `f_up` is applied on a node while traversing the tree bottom-up
/// (post-order, after the the node's children are visited).
///
/// The `f_down` closure takes
/// - a `PD` type payload from its parent
/// and returns a tuple made of:
/// - a possibly modified node,
/// - a `PC` type payload to pass to `f_up`,
/// - a `Vec<FD>` type payload to propagate down to its children
/// (one `FD` element is propagated down to each child),
/// - a `TreeNodeRecursion` enum element to control recursion flow.
///
/// The `f_up` closure takes
/// - a `PC` type payload from `f_down` and
/// - a `Vec<PU>` type payload collected from its children
/// and returns a tuple made of:
/// - a possibly modified node,
/// - a `FU` type payload to propagate up to its parent,
/// - a `TreeNodeRecursion` enum element to control recursion flow.
fn transform_with_payload<FD, PD, PC, FU, PU>(
self,
f_down: &mut FD,
payload_down: PD,
f_up: &mut FU,
) -> Result<(Self, PU)>
where
FD: FnMut(Self, PD) -> Result<(Self, Vec<PD>, PC, TreeNodeRecursion)>,
FU: FnMut(Self, PC, Vec<PU>) -> Result<(Self, PU, TreeNodeRecursion)>, Obviously the closure return types can be extracted to a type alias or pub struct TransformDownResult<N, PD, PC> {
pub transformed_node: N,
pub payload_to_children: Vec<PD>,
pub payload_to_f_up: PC,
pub recursion_control: TreeNodeRecursion,
} All the other functions should be just a specialization of the above:
I believe all these PRs would take us forward to a simple but still versarile APIs that have/are:
Now, as you mentioned some of these proposed changes have conflicting semantics to current APIs. Honestly I can't decide how much turbulance they would cause and so I'm seeking feedback from you and the community. But please note that I don't insist on any of the above functionaly or namings. These are just ideas and I can imagine implementations where:
|
I see -- thank you @peter-toth. Your explanation makes sense. As a historical note, the current TreeNode API came out of unifying different in inconsistent APIs across the Exprs, LogicalPlan , ExectionPlan and PhysicalExprs. It was a large step forward at the time, but also just getting a single API was a major step forward.
I think this is a good insight -- basically that the rationale for this change is to make it easier for new contributors to contribute to DataFusion as they APIs are more uniform.
I think implementing the "one true API" as you have proposed above and then migrating the rest of the codebase to use it (possibly deprecating the old APIs or changing them one at at time) would be the least disruptive. To that end, what I will do is to create a new ticket describing this work so it is not lost in this ticket, and then we can use that ticket to make the overall plan visible |
I filed #8913 -- let me know what you think. What do you think about creating a PR with |
Thanks @alamb! I will share my thoughts regarding The API inconsitency of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR I think this PR is a significant step forward in getting a consistent API and the unification of pre_visit / post_visit -> up/down and a single TreeNodeRecursion
is a significant improvement and I think we could merge this PR
Thank you for all the work on this PR @peter-toth
However I do not fully understand how the code in this PR and the code in #8817 from @ozankabak interact / conflict (anyone else following along can find details on #8913 (comment))
So basically I trust you two to come up with something good. I would be happy to merge this PR, or wait for #8817 and evaluate first.
Again, trying to improve things is very much apprecaited
Yes -- let's first get #8817 in and then I and @berkaysynnada will review and go through this in detail and the others in detail |
Sounds good to me. I left a comment #8817 (comment) on #8817. It can help with further steps of the epic: #8913 |
Thanks everyone. I will carefully review this PR after #8817 is merged and the related updates are done. Thank you again @peter-toth for your efforts! |
f94e193
to
729c9d2
Compare
I've updated the PR description and added One thing came into my mind that we could keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great and is ready to go. Are there any other comments prior to us merging this?
I haven't gone through it yet - I will this weekend. I will let you guys know if I have any concerns. Given that I have been following, I don't expect anything major but I'd like to be prudent on this one as it is a somewhat foundational change. |
I am 80% done reviewing, should be finished tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @peter-toth for the great work. I went through all changes carefully and see nothing to hold this up.
I created a simplification PR, can you please merge it to this PR and then I will go ahead and merge it to main
.
Two work items for follow-on PRs:
- The code in
expr/src/tree_node/expr.rs
became a little too complex for n-ary (binary and above) expressions. Can you think of a way to simplify that code? - I agree with @alamb that
rewrite().data().rewrite().data()....rewrite().data()
chain looks a little weird. It would be great if we can find a neat way to simplify that flow. But this matters less than (1).
Just give me mention when you finalize by incorporating the simplification PR and I will merge quickly to avoid attracting merge conflicts (which I see happens somewhat frequently as this PR touches many files).
@alamb, this is now good to go from my perspective as well. Any reason to wait before merging? |
@ozankabak, thanks for your PR, I've just merged it. I will try to think about those follow-ups. |
I don't think so -- this one has been opened for a long time and has had many chances to review. 🚀 Thanks again @peter-toth @berkaysynnada @ozankabak and everyone else who participated in the discussion |
Epic work |
Thanks for the review and feedback @berkaysynnada, @alamb, @ozankabak! |
* refactor `TreeNode::rewrite()` * use handle_tree_recursion in `Expr` * use macro for transform recursions * fix api * minor fixes * fix * don't trust `t.transformed` coming from transformation closures, keep the old way of detecting if changes were made * rephrase todo comment, always propagate up `t.transformed` from the transformation closure, fix projection pushdown closure * Fix `TreeNodeRecursion` docs * extend Skip (Prune) functionality to Jump as it is defined in https://synnada.notion.site/synnada/TreeNode-Design-Proposal-bceac27d18504a2085145550e267c4c1 * fix Jump and add tests * jump test fixes * fix clippy * unify "transform" traversals using macros, fix "visit" traversal jumps, add visit jump tests, ensure consistent naming `f` instead of `op`, `f_down` instead of `pre_visit` and `f_up` instead of `post_visit` * fix macro rewrite * minor fixes * minor fix * refactor tests * add transform tests * add apply, transform_down and transform_up tests * refactor tests * test jump on both a and e nodes in both top-down and bottom-up traversals * better transform/rewrite tests * minor fix * simplify tests * add stop tests, reorganize tests * fix previous merges and remove leftover file * Review TreeNode Refactor (#1) * Minor changes * Jump doesn't ignore f_up * update test * Update rewriter * LogicalPlan visit update and propagate from children flags * Update tree_node.rs * Update map_children's --------- Co-authored-by: Mustafa Akur <[email protected]> * fix * minor fixes * fix f_up call when f_down returns jump * simplify code * minor fix * revert unnecessary changes * fix `DynTreeNode` and `ConcreteTreeNode` `transformed` and `tnr` propagation * introduce TransformedResult helper * fix docs * restore transform as alias to trassform_up * restore transform as alias to trassform_up 2 * Simplifications and comment improvements (#2) --------- Co-authored-by: Berkay Şahin <[email protected]> Co-authored-by: Mustafa Akur <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Which issue does this PR close?
Part of #8913
Rationale for this change
The
TreeNode.rewrite()
/TreeNodeRewriter
is very inconsistent with otherTreeNode
apply / visit functions:TreeNodeRewriter.pre_visit()
currently returnsRewriteRecursion
so as to control the recursion but that enum is very inconsistent with theVisitRecursion
enum that visit functions use.RewriteRecursion::Stop
functionality is equal toVisitRecursion::Skip
, that is to skip (prune) the subtree (don't recurse into children and don't run post-order functions).RewriteRecursion::Skip
doesn't prevent recursing into children. This is different toVisitRecursion::Skip
where recursion into children is not executed.VisitRecursion::Stop
fully stops recursion which functionality isn't available withRewriteRecursion
.TreeNodeRewriter.mutate()
is sometimes a top-down (pre-order) but other times a bottom-up (post-order) function depending onTreeNodeRewriter.pre_visit()
result.This PR removes
RewriteRecursion
and renamesVisitRecursion
to a commonTreeNodeRecursion
to control all (apply / visit / rewrite / transform) APIs. PreviousVisitRecursion::Skip
got renamed toJump
and its functionality is now extended to bottom-up and combined traversals as follows:and changes
TreeNodeRewriter
to incorporate anf_down()
and a nf_up()
methods that both can change the node and both can use the commonTransformed
struct (containing aTreeNodeRecursion
enum) to control how to proceed with recursion:This solution is capable to replace all
RewriteRecursion
functionalities and make theTreeNodeRewriter
s not just cleaner but more versatile.This PR adds a new
transform_down_up()
method as a short form ofrewrite()
/TreeNodeRewriter
to provide a combined traversal:TreeNodeRewriter
but defining 2 closures for top-down and bottom-up transformations is just enough.This PR changes the
Transformed
enum the following struct:This PR modifies the
transform_down()
andtransform_up()
methods to use the newTransformed
struct.This PR modifies the
TreeNodeVisitor
trait to have itsf_down()
andf_up()
alligned withTreeNodeRewriter
trait.Please note that this PR implements 2. from #7942 and fixes the
Transformed
enum issue: #8835What changes are included in this PR?
This PR:
RewriteRecursion
,VisitRecursion
toTreeNodeRecursion
. The previousSkip
element is renamed toJump
, its top-down functionality is adjusted and its bottom-up functionality is added.Transformed
enum to a new struct.TreeNode
transform methods to use the newTransformed
struct.TreeNodeRewriter
to incorporate anf_down()
andf_up()
methods and useTransformed
/TreeNodeRecursion
,TreeNode.transform_down_up()
method.TreeNodeVisitor
to havef_down()
andf_up()
methods aligned withTreeNodeRewriter
.apply
/visit
/transform
/rewrite
APIs to usef
,f_down
,f_up
closure parameter names.Are these changes tested?
Existng tests.
Are there any user-facing changes?
Yes.