Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ordering analysis with expressions (not just columns) by Replace OrderedColumn with PhysicalSortExpr #6501

Merged
merged 13 commits into from
Jun 5, 2023
Merged

Support ordering analysis with expressions (not just columns) by Replace OrderedColumn with PhysicalSortExpr #6501

merged 13 commits into from
Jun 5, 2023

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

OrderedColumn struct keeps columns that have ordering, with ordering information. This struct is used during OrderingEquivalence calculations. However, existing PhysicalSortExpr can keep track of this information. Also PhysicalSortExpr supports not just, columns but complex expressions also.

We can use PhysicalSortExpr instead of OrderedColumn.

What changes are included in this PR?

This PR removes OrderedColumn struct and uses PhysicalSortExpr in its place.

However, because PhysicalSortExpr doesn't implement Hash trait (there is no trivial way to support this trait if any). We changed the EquivalentClass implementation so that it doesn't require Hash trait anymore.
For this reason, we have replaced places in EquivalentClass where HashSet is used with Vector.

Are these changes tested?

Yes existing tests should work, also new test is added (under window.slt file) to show that we can use complex expressions (not just Columns) during ordering equivalence calculations.

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels May 31, 2023
@alamb alamb changed the title Replace OrderedColumn with PhysicalSortExpr Support ordering analysis with expressions (not just columns) by Replace OrderedColumn with PhysicalSortExpr Jun 2, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR looks great -- thank you @mustafasrepo and adds a neat feature. cc @mingmwang in case you have any interest in reviewing this

However, because PhysicalSortExpr doesn't implement Hash trait (there is no trivial way to support this trait if any). We changed the EquivalentClass implementation so that it doesn't require Hash trait anymore.

We hit something similar when trying to make LogicalPlan implement hash (because of the LogicalPlan::Extension variant that has a Arc<dyn UserDefinedLogicalNode>

The solution we came up with was

https://docs.rs/datafusion-expr/25.0.0/datafusion_expr/logical_plan/trait.UserDefinedLogicalNode.html#tymethod.dyn_hash

And then implemented it like this: https://docs.rs/datafusion-expr/25.0.0/src/datafusion_expr/logical_plan/extension.rs.html#235-285

datafusion/physical-expr/src/equivalence.rs Show resolved Hide resolved
datafusion/physical-expr/src/equivalence.rs Outdated Show resolved Hide resolved

/// Remove `entry` for the `in_data`, returns `true` if removal is successful (e.g `entry` is indeed in the `in_data`)
/// Otherwise return `false`
fn remove_from_vec<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a more idiomatic way would be for this function to return Option<T> (which is what Some(in_data.remove()) returns )

That might allow you to avoid some of the other changes to remove later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since, we remove by giving element inside the vector. We already have removed element. If we return Option<T> the value inside Option will be entry argument to the function. Hence this function is more akin to HashSet remove. Also inside remove function we are interested in whether removal was successful, in this case we need to introduce is_some checks inside remove function.
Hence I think, current API is more clear, However, if it is misleading, or counter intuitive I can implement as your suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense -- thank you for the response

datafusion/physical-expr/src/equivalence.rs Show resolved Hide resolved
datafusion/physical-expr/src/equivalence.rs Show resolved Hide resolved
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
@mustafasrepo
Copy link
Contributor Author

I think this PR looks great -- thank you @mustafasrepo and adds a neat feature. cc @mingmwang in case you have any interest in reviewing this

However, because PhysicalSortExpr doesn't implement Hash trait (there is no trivial way to support this trait if any). We changed the EquivalentClass implementation so that it doesn't require Hash trait anymore.

We hit something similar when trying to make LogicalPlan implement hash (because of the LogicalPlan::Extension variant that has a Arc<dyn UserDefinedLogicalNode>

The solution we came up with was

https://docs.rs/datafusion-expr/25.0.0/datafusion_expr/logical_plan/trait.UserDefinedLogicalNode.html#tymethod.dyn_hash

And then implemented it like this: https://docs.rs/datafusion-expr/25.0.0/src/datafusion_expr/logical_plan/extension.rs.html#235-285

I will experiment with using dyn hash, I think this will simplify the structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants