-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract CSE logic to datafusion_common
#13002
Conversation
cc @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @peter-toth -- this looks really nice to me. I have a few comment suggestions, and then I would like to run benchmarks to make sure this refactor doesn't accidentally slow down planning
But otherwise this looks great to me
datafusion/common/src/hash_node.rs
Outdated
use std::hash::Hasher; | ||
use std::sync::Arc; | ||
|
||
pub trait HashNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please add some documentation to this trait about its purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I've added it in 1cae61c.
|
||
impl HashNode for Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add some documentation explaining what is in the module and what the main entry points are/
Something like
//! Common Subexpression Elimination logic: [`SubTreeNodeEliminator`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, good idea. Added i documentation in 1cae61c.
FYI @andygrove -- CSE on physical plans is happening |
I ran the benchmarks and there is no change in performance 🚀
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @peter-toth
I do think some of the suggestions I had for comments would help this code, but I also think we could do it as a follow on PR
Thanks @alamb for the benchmark, I will address your comments tomorrow in this PR. |
… shorter names for eliminator and controller, change `CSE::extract_common_nodes()` to return `Result<FoundCommonNodes<N>>` (instead of `Result<Transformed<FoundCommonNodes<N>>>`)
# Conflicts: # datafusion-cli/Cargo.lock
I've fixed a few things in 1cae61c and addressed your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @peter-toth
@@ -108,7 +131,8 @@ type CommonNodes<'n, N> = IndexMap<Identifier<'n, N>, (N, String)>; | |||
|
|||
type ChildrenList<N> = (Vec<N>, Vec<N>); | |||
|
|||
pub trait SubTreeNodeEliminatorController { | |||
/// The [`TreeNode`] specific definition of elimination. | |||
pub trait CSEController { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the new name
Thanks for the review @alamb! |
Which issue does this PR close?
Part of #12599.
Rationale for this change
This PR moves common subexpression elimination logic to
datafusion_common
to be able to share it with physical expressions.What changes are included in this PR?
This PR:
datafusion_common::cse
. The logic is splited toCSE
struct, that does the elimination of common subtrees andCSEController
trait, that can be implemented for differentTreeNode
types.The PR contains 2 implementations.
ExprCSEController
can be used to eliminateExpr
common subtrees, andTestTreeNodeCSEController
is used withTestTreeNode
s in tests. A newPhysicalExprCSEController
will be added in a follow-up PR to eliminate commondyn PhysicalExpr
subtrees.HashNode
trait to be able to implement it fordyn PhysicalExpr
in a folllow-up PR.TestTreeNode
crate public for CSE tests.Are these changes tested?
Yes, with existing but refactored tests.
Are there any user-facing changes?
No.