Support ordering analysis with expressions (not just columns) by Replace OrderedColumn with PhysicalSortExpr #6501
I think this PR looks great and adds a neat feature -- thank you @mustafasrepo. cc @mingmwang in case you have any interest in reviewing this.
> However, because `PhysicalSortExpr` doesn't implement the `Hash` trait (and there is no trivial way to support it, if any), we changed the `EquivalentClass` implementation so that it no longer requires `Hash`.
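To make the `Hash`-free idea concrete, here is a minimal sketch of an equivalence class whose members live in a `Vec` and are compared only with `PartialEq`. The type `VecEquivalentClass` and its methods are hypothetical illustrations, not DataFusion's actual `EquivalentClass`.

```rust
/// Hypothetical, simplified equivalence class that needs only `PartialEq`.
/// Members live in a `Vec` instead of a `HashSet`, so element types that
/// cannot implement `Hash` still work; membership checks become linear scans.
#[derive(Debug, Clone)]
pub struct VecEquivalentClass<T> {
    head: T,
    others: Vec<T>,
}

impl<T: PartialEq> VecEquivalentClass<T> {
    pub fn new(head: T, others: Vec<T>) -> Self {
        let mut class = VecEquivalentClass { head, others: Vec::new() };
        // A `Vec` does not deduplicate, so route every member through `insert`
        for item in others {
            class.insert(item);
        }
        class
    }

    /// Linear scan over the head and the other members.
    pub fn contains(&self, item: &T) -> bool {
        &self.head == item || self.others.contains(item)
    }

    /// Returns `false` if `item` was already a member, like `HashSet::insert`.
    pub fn insert(&mut self, item: T) -> bool {
        if self.contains(&item) {
            return false;
        }
        self.others.push(item);
        true
    }

    /// Number of members, including the head.
    pub fn len(&self) -> usize {
        self.others.len() + 1
    }
}
```

Even `f64`, which implements `PartialEq` but not `Hash`, can be stored. The trade-off is O(n) lookups, which is usually acceptable when equivalence classes are small.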
We hit something similar when trying to make `LogicalPlan` implement `Hash` (because of the `LogicalPlan::Extension` variant, which holds an `Arc<dyn UserDefinedLogicalNode>`).
The solution we came up with was
And then implemented it like this: https://docs.rs/datafusion-expr/25.0.0/src/datafusion_expr/logical_plan/extension.rs.html#235-285
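A common way to make a trait object hashable, and, as far as we can tell, the general shape used in the linked `extension.rs`, is to give the trait object-safe helpers (`dyn_hash`, `dyn_eq`) and then implement `Hash`/`PartialEq`/`Eq` for the `dyn` type by forwarding to them. The sketch below is self-contained and uses hypothetical names (`Node`, `Scan`, `Plan`, `hash_of`); the linked source remains the authority on DataFusion's actual code.

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical object-safe trait standing in for something like `UserDefinedLogicalNode`.
/// `Hash::hash` is generic over `H: Hasher`, so it cannot be called through a trait
/// object; the trait exposes type-erased helpers instead.
pub trait Node: Debug {
    fn as_any(&self) -> &dyn Any;
    /// Feed this node's state into a type-erased hasher.
    fn dyn_hash(&self, state: &mut dyn Hasher);
    /// Compare with another node; `false` if the concrete types differ.
    fn dyn_eq(&self, other: &dyn Node) -> bool;
}

// Forward the std traits on the trait object to the helpers above.
impl Hash for dyn Node {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.dyn_hash(state)
    }
}

impl PartialEq for dyn Node {
    fn eq(&self, other: &Self) -> bool {
        self.dyn_eq(other)
    }
}

impl Eq for dyn Node {}

/// A hypothetical concrete node that derives the std traits normally.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct Scan {
    pub table: String,
}

impl Node for Scan {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn dyn_hash(&self, state: &mut dyn Hasher) {
        // `&mut dyn Hasher` itself implements `Hasher`, so it can be passed to `Hash::hash`
        let mut s = state;
        self.hash(&mut s)
    }
    fn dyn_eq(&self, other: &dyn Node) -> bool {
        other
            .as_any()
            .downcast_ref::<Self>()
            .map_or(false, |o| self == o)
    }
}

/// A hypothetical wrapper holding a trait object; it can now be a `HashSet` key.
#[derive(Debug, Clone)]
pub struct Plan {
    pub node: Arc<dyn Node>,
}

impl PartialEq for Plan {
    fn eq(&self, other: &Self) -> bool {
        // Compare the pointees via `impl PartialEq for dyn Node`
        *self.node == *other.node
    }
}

impl Eq for Plan {}

impl Hash for Plan {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // `Arc<T>` hashes its pointee, which dispatches to `impl Hash for dyn Node`
        self.node.hash(state)
    }
}

/// Hash any value with the std default hasher.
pub fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

`Plan` implements `PartialEq` and `Hash` by hand and compares the pointees (`*a == *b`) rather than deriving, which sidesteps known compiler friction when comparing smart pointers to trait objects with `==`.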
```rust
/// Remove `entry` from `in_data`; returns `true` if removal is successful (i.e. `entry` is indeed in `in_data`).
/// Otherwise returns `false`.
fn remove_from_vec<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> bool {
    // ... (body not shown in this review excerpt)
}
```
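The excerpt shows only the signature of `remove_from_vec`. One plausible body is sketched below; it is an assumption that matches the doc comment's contract, not necessarily the PR's exact code.

```rust
/// Remove the first element equal to `entry` from `in_data`.
/// Returns `true` if an element was removed, `false` if `entry` was not present.
/// Only `PartialEq` is required, so this works for types without `Hash`.
fn remove_from_vec<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> bool {
    match in_data.iter().position(|item| item == entry) {
        Some(idx) => {
            // `Vec::remove` keeps the remaining elements in order (O(n) shift)
            in_data.remove(idx);
            true
        }
        None => false,
    }
}
```

If element order did not matter, `Vec::swap_remove` would make the removal itself O(1), at the cost of reordering the vector.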
Perhaps a more idiomatic way would be for this function to return `Option<T>` (which is what `Some(in_data.remove(..))` returns, since `Vec::remove` hands back the removed element). That might allow you to avoid some of the other changes to `remove` later.
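For comparison, the suggested `Option<T>` shape could look like the sketch below. The name `remove_from_vec_opt` is hypothetical and used only to keep it distinct from the PR's function.

```rust
/// Hypothetical variant of `remove_from_vec` along the lines suggested in review:
/// hand back the removed element (`None` if nothing matched) instead of a `bool`.
fn remove_from_vec_opt<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> Option<T> {
    // `?` returns `None` early when no element equals `entry`
    let idx = in_data.iter().position(|item| item == entry)?;
    // `Vec::remove` returns the element it took out
    Some(in_data.remove(idx))
}
```

Callers that only care whether something was removed would then write `remove_from_vec_opt(..).is_some()`.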
Since we remove by passing an element that is inside the vector, we already have the removed element. If we returned `Option<T>`, the value inside the `Option` would just be equal to the `entry` argument of the function. Hence this function is more akin to `HashSet::remove`. Also, inside the `remove` function we are only interested in whether the removal was successful; with an `Option` return we would need to introduce `is_some` checks there.
Hence I think the current API is clearer. However, if it is misleading or counterintuitive, I can implement your suggestion.
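The argument about the caller can be illustrated with a hypothetical, simplified class whose `remove` consumes the `bool` directly. `Class` and its head-promotion policy are assumptions made for this sketch (not DataFusion's `EquivalentClass`), and `remove_from_vec` is repeated so the block is self-contained.

```rust
/// Same contract as `HashSet::remove`: take `&T`, report success as a `bool`.
fn remove_from_vec<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> bool {
    match in_data.iter().position(|item| item == entry) {
        Some(idx) => {
            in_data.remove(idx);
            true
        }
        None => false,
    }
}

/// Hypothetical, simplified equivalence class: a `head` plus `others`.
pub struct Class<T> {
    pub head: T,
    pub others: Vec<T>,
}

impl<T: PartialEq> Class<T> {
    /// Removes `item`; returns `true` on success. Removing the head promotes the
    /// first remaining member; a lone head cannot be removed (assumed policy).
    pub fn remove(&mut self, item: &T) -> bool {
        // The `bool` is used directly; an `Option<T>` return would need `.is_some()` here
        if remove_from_vec(&mut self.others, item) {
            return true;
        }
        if &self.head == item && !self.others.is_empty() {
            self.head = self.others.remove(0);
            return true;
        }
        false
    }
}
```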
makes sense -- thank you for the response
I will experiment with using
Which issue does this PR close?
Closes #.
Rationale for this change
The `OrderedColumn` struct keeps columns that have an ordering, together with their ordering information. This struct is used during `OrderingEquivalence` calculations. However, the existing `PhysicalSortExpr` can already keep track of this information, and `PhysicalSortExpr` supports not just columns but also complex expressions. Hence we can use `PhysicalSortExpr` instead of `OrderedColumn`.
What changes are included in this PR?
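As a rough illustration of why a sort expression subsumes an ordered column, here is a sketch with simplified, hypothetical stand-ins. DataFusion's real `PhysicalSortExpr` holds an `Arc<dyn PhysicalExpr>` plus arrow's `SortOptions`; the small `Expr` enum and the local `SortOptions`, `OrderedColumn`, and `SortExpr` types below are assumptions chosen to keep the example short.

```rust
/// Hypothetical stand-in for arrow's `SortOptions` (direction and null placement).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SortOptions {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Hypothetical, heavily simplified expression tree (the real one is a trait object).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Old shape (simplified): only a bare column can carry an ordering.
#[derive(Debug, Clone, PartialEq)]
pub struct OrderedColumn {
    pub col: String,
    pub options: SortOptions,
}

/// New shape (simplified): any expression can carry an ordering.
#[derive(Debug, Clone, PartialEq)]
pub struct SortExpr {
    pub expr: Expr,
    pub options: SortOptions,
}

impl From<OrderedColumn> for SortExpr {
    // Every `OrderedColumn` has an equivalent `SortExpr`, so nothing is lost
    fn from(c: OrderedColumn) -> Self {
        SortExpr { expr: Expr::Column(c.col), options: c.options }
    }
}

impl SortExpr {
    /// The reverse direction only succeeds for bare columns,
    /// the one case `OrderedColumn` could express.
    pub fn as_ordered_column(&self) -> Option<OrderedColumn> {
        match &self.expr {
            Expr::Column(name) => Some(OrderedColumn { col: name.clone(), options: self.options }),
            _ => None,
        }
    }
}
```

A key such as `a + b DESC` has a `SortExpr` but no `OrderedColumn` representation, which is the gap this change closes.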
This PR removes the `OrderedColumn` struct and uses `PhysicalSortExpr` in its place. However, `PhysicalSortExpr` doesn't implement the `Hash` trait (and there is no trivial way to support it, if any), so we changed the `EquivalentClass` implementation so that it no longer requires `Hash`. To that end, we replaced the places in `EquivalentClass` where `HashSet` was used with `Vec`.
Are these changes tested?
Yes. Existing tests should pass, and a new test has been added (in the `window.slt` file) to show that complex expressions (not just columns) can be used during ordering equivalence calculations.
Are there any user-facing changes?