-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement PartialOrd for Expr and sub fields/structs without using hash values #12481
Conversation
…larUDF based on the equals fn.
…ma. Otherwise, implemented PartialOrd, ignoring the schema field.
It might be worth it to include the derivative crate for datafusion/datafusion/expr, to allow for a derived PartialOrd, excluding the incomparable fields. Currently, some of the comparisons are unwieldy. Another alternative would be using a series of comparison if statements instead of matching, to prevent such deep nesting. I am open to ideas! |
…or structs implementing UserDefinedLogicalNodeCore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ngli-me -- I think this PR is quite close.
I think primarily it needs some comments explaining why the manual implementations of PartialOrd
are needed and resolve some conflicts, but otherwise this is ready from my perspective.
Thanks again 🙏
@@ -1071,18 +1071,6 @@ impl PlannedReplaceSelectItem { | |||
} | |||
} | |||
|
|||
/// Fixed seed for the hashing so that Ords are consistent across runs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
@@ -232,8 +232,61 @@ impl Hash for CreateExternalTable { | |||
} | |||
} | |||
|
|||
impl PartialOrd for CreateExternalTable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What requires this manual derivation? Is it some other extension trait would need to also require PartialOrd
?
I am thinking it would be really nice to file a ticket tracking what it would take to use a derived PartialOrd
to make this code more maintainable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not super happy with the way the syntax on this turned out, but the first field, schema
prevents us from deriving it. I was looking at some alternatives, but it looks like Rust doesn't have a way to ignore a field for a derived trait by default. One option is to use something like the derivative crate, but that means another attribute for the struct, and additional attributes for the field.
In general, DFSchemaRef
and Schema
are the main blockers for deriving PartialOrd for structs, with the other cases (also) having HashMaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the ticket tracking thing, now that you mention it, it might be possible to give DFSchemaRef
a manual implementation of PartialOrd
that just gives back None
, then everything else (almost) can derive it! But yeah, I'll create another ticket for this (#12537).
@@ -284,6 +346,15 @@ pub struct CreateCatalogSchema { | |||
pub schema: DFSchemaRef, | |||
} | |||
|
|||
impl PartialOrd for CreateCatalogSchema { | |||
fn partial_cmp(&self, other: &Self) -> Option<Ordering> { | |||
match self.schema_name.partial_cmp(&other.schema_name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we derive PartialOrd
for this structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you can add a comment explaining it is due to not comparing schema ref
Another way you could do this might be with some inner struct like:
impl PartialOrd for CreateCatalogSchema {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
#[derive(PartialOrd)]
struct ComparableCreateCatalogSchema<'a> {
/// The table schema
pub schema_name: &'astr,
}
let comparable_self = ComparableCreateCatalogSchema { schema_name: &self.schema_name}
let comparable_other= ComparableCreateCatalogSchema { schema_name: &other.schema_name}
comparable_self.partial_cmp(comparable_other)
}
}
For this particular impl it probably doesn't matter but for CreateExternalTable
it might be easier to follow and verify it is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, I'll make some changes for the bigger manual implementations to follow this format to begin with!
@@ -215,7 +223,7 @@ impl Eq for dyn UserDefinedLogicalNode {} | |||
/// [user_defined_plan.rs](https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs) | |||
/// file for an example of how to use this extension API. | |||
pub trait UserDefinedLogicalNodeCore: | |||
fmt::Debug + Eq + Hash + Sized + Send + Sync + 'static | |||
fmt::Debug + Eq + PartialOrd + Hash + Sized + Send + Sync + 'static |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is technically a API change (but I think it is fine) -- I will mark this PR as an API change
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look |
…ility on large PartialOrd implementations.
Thanks for the feedback, really appreciate it! I'm happy to help out, I'll start digging for some more issues following this (I had some personal stuff come up, so sorry about leaving it for a bit. I'll keep in mind marking it as draft next time). |
…plementation of `PartialOrd` to be necessary.
No worries at all -- I totally understand -- there is always more fun stuff to do than time to do. We appreciate the effort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ngli-me -- this is pretty epic work.
Also thank you for filing the follow on
@@ -193,6 +194,7 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { | |||
/// Note: [`UserDefinedLogicalNode`] is not constrained by [`Eq`] | |||
/// directly because it must remain object safe. | |||
fn dyn_eq(&self, other: &dyn UserDefinedLogicalNode) -> bool; | |||
fn dyn_ord(&self, other: &dyn UserDefinedLogicalNode) -> Option<Ordering>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
THanks again @ngli-me |
…sh values (apache#12481) * Derive PartialOrd for WindowFrameBound, and added partial_cmp for ScalarUDF based on the equals fn. * Derived PartialOrd where possible for structs not using DFSchema/Schema. Otherwise, implemented PartialOrd, ignoring the schema field. * Added additional tests to verify partial ord for Expr, DdlStatement, and LogicalPlan. * Formatting. * Added PartialOrd implementations where necessary, otherwise derived for structs implementing UserDefinedLogicalNodeCore. * Added Comparable versions for structs with more than 3 comparable fields. * Added additional comments, specifying which fields caused a manual implementation of `PartialOrd` to be necessary. --------- Co-authored-by: nglime <[email protected]>
Which issue does this PR close?
Closes #8932.
Rationale for this change
Allows for ordering of Expr without using hashing, which is not necessarily consistent between runs.
What changes are included in this PR?
Added PartialOrd via derive where possible.
For fields of type DFSchema/Schema and/or HashMap are ignored for comparison.
DescribeTable has PartialOrd via LogicalPlan, but does not have a comparison versus itself.
Are these changes tested?
Yes, but not exhaustively.
Are there any user-facing changes?
All of the subtypes of Expr will have an available PartialOrd.
Structs implementing
UserDefinedLogicalNodeCore
will also need to have an implementation ofPartialOrd
.