Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PartialOrd for Expr and sub fields/structs without using hash values #12481

Merged
merged 8 commits into from
Sep 21, 2024

Conversation

ngli-me
Copy link
Contributor

@ngli-me ngli-me commented Sep 16, 2024

Which issue does this PR close?

Closes #8932.

Rationale for this change

Allows for ordering of Expr without using hashing, which is not necessarily consistent between runs.

What changes are included in this PR?

Added PartialOrd via derive where possible.
For fields of type DFSchema/Schema and/or HashMap are ignored for comparison.
DescribeTable has PartialOrd via LogicalPlan, but does not have a comparison versus itself.

Are these changes tested?

Yes, but not exhaustively.

Are there any user-facing changes?

All of the subtypes of Expr will have an available PartialOrd.
Structs implementing UserDefinedLogicalNodeCore will also need to have an implementation of PartialOrd.

@github-actions github-actions bot added logical-expr Logical plan and expressions substrait common Related to common crate labels Sep 16, 2024
@ngli-me
Copy link
Contributor Author

ngli-me commented Sep 16, 2024

It might be worth it to include the derivative crate for datafusion/datafusion/expr, to allow for a derived PartialOrd, excluding the incomparable fields. Currently, some of the comparisons are unwieldy. Another alternative would be using a series of comparison if statements instead of matching, to prevent such deep nesting. I am open to ideas!

@ngli-me ngli-me marked this pull request as draft September 16, 2024 04:06
…or structs implementing UserDefinedLogicalNodeCore.
@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate proto Related to proto crate labels Sep 16, 2024
@ngli-me ngli-me marked this pull request as ready for review September 16, 2024 15:38
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ngli-me -- I think this PR is quite close.

I think primarily it needs some comments explaining why the manual implementations of PartialOrd are needed and resolve some conflicts, but otherwise this is ready from my perspective.

Thanks again 🙏

@@ -1071,18 +1071,6 @@ impl PlannedReplaceSelectItem {
}
}

/// Fixed seed for the hashing so that Ords are consistent across runs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@@ -232,8 +232,61 @@ impl Hash for CreateExternalTable {
}
}

impl PartialOrd for CreateExternalTable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What requires this manual derivation? Is it some other extension trait would need to also require PartialOrd?

I am thinking it would be really nice to file a ticket tracking what it would take to use a derived PartialOrd to make this code more maintainable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not super happy with the way the syntax on this turned out, but the first field, schema prevents us from deriving it. I was looking at some alternatives, but it looks like Rust doesn't have a way to ignore a field for a derived trait by default. One option is to use something like the derivative crate, but that means another attribute for the struct, and additional attributes for the field.

In general, DFSchemaRef and Schema are the main blockers for deriving PartialOrd for structs, with the other cases (also) having HashMaps.

Copy link
Contributor Author

@ngli-me ngli-me Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the ticket tracking thing, now that you mention it, it might be possible to give DFSchemaRef a manual implementation of PartialOrd that just gives back None, then everything else (almost) can derive it! But yeah, I'll create another ticket for this (#12537).

@@ -284,6 +346,15 @@ pub struct CreateCatalogSchema {
pub schema: DFSchemaRef,
}

impl PartialOrd for CreateCatalogSchema {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
match self.schema_name.partial_cmp(&other.schema_name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we derive PartialOrd for this structure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can add a comment explaining it is due to not comparing schema ref

Another way you could do this might be with some inner struct like:

impl PartialOrd for CreateCatalogSchema {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { 
      #[derive(PartialOrd)]
      struct ComparableCreateCatalogSchema<'a> {
       /// The table schema
       pub schema_name: &'astr,
      }
      let comparable_self = ComparableCreateCatalogSchema { schema_name: &self.schema_name}
      let comparable_other= ComparableCreateCatalogSchema { schema_name: &other.schema_name}
      comparable_self.partial_cmp(comparable_other)
   }
}

For this particular impl it probably doesn't matter but for CreateExternalTable it might be easier to follow and verify it is correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, I'll make some changes for the bigger manual implementations to follow this format to begin with!

@@ -215,7 +223,7 @@ impl Eq for dyn UserDefinedLogicalNode {}
/// [user_defined_plan.rs](https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs)
/// file for an example of how to use this extension API.
pub trait UserDefinedLogicalNodeCore:
fmt::Debug + Eq + Hash + Sized + Send + Sync + 'static
fmt::Debug + Eq + PartialOrd + Hash + Sized + Send + Sync + 'static
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is technically a API change (but I think it is fine) -- I will mark this PR as an API change

@alamb alamb marked this pull request as draft September 18, 2024 19:45
@alamb
Copy link
Contributor

alamb commented Sep 18, 2024

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@ngli-me
Copy link
Contributor Author

ngli-me commented Sep 19, 2024

Thanks for the feedback, really appreciate it! I'm happy to help out, I'll start digging for some more issues following this (I had some personal stuff come up, so sorry about leaving it for a bit. I'll keep in mind marking it as draft next time).
I made some revisions based on your feedback, with putting that inner struct in place for comparing types with >3 comparable fields.

@ngli-me ngli-me marked this pull request as ready for review September 19, 2024 23:24
…plementation of `PartialOrd` to be necessary.
@alamb
Copy link
Contributor

alamb commented Sep 20, 2024

Thanks for the feedback, really appreciate it! I'm happy to help out, I'll start digging for some more issues following this (I had some personal stuff come up, so sorry about leaving it for a bit.

No worries at all -- I totally understand -- there is always more fun stuff to do than time to do. We appreciate the effort

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ngli-me -- this is pretty epic work.

Also thank you for filing the follow on

@@ -193,6 +194,7 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync {
/// Note: [`UserDefinedLogicalNode`] is not constrained by [`Eq`]
/// directly because it must remain object safe.
fn dyn_eq(&self, other: &dyn UserDefinedLogicalNode) -> bool;
fn dyn_ord(&self, other: &dyn UserDefinedLogicalNode) -> Option<Ordering>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb added the api change Changes the API exposed to users of the crate label Sep 20, 2024
@alamb alamb changed the title PartialOrd for Expr and sub fields/structs Implement PartialOrd for Expr and sub fields/structs without using hash values Sep 20, 2024
@alamb alamb merged commit d9cb6e6 into apache:main Sep 21, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 21, 2024

THanks again @ngli-me

bgjackma pushed a commit to bgjackma/datafusion that referenced this pull request Sep 25, 2024
…sh values (apache#12481)

* Derive PartialOrd for WindowFrameBound, and added partial_cmp for ScalarUDF based on the equals fn.

* Derived PartialOrd where possible for structs not using DFSchema/Schema. Otherwise, implemented PartialOrd, ignoring the schema field.

* Added additional tests to verify partial ord for Expr, DdlStatement, and LogicalPlan.

* Formatting.

* Added PartialOrd implementations where necessary, otherwise derived for structs implementing UserDefinedLogicalNodeCore.

* Added Comparable versions for structs with more than 3 comparable fields.

* Added additional comments, specifying which fields caused a manual implementation of `PartialOrd` to be necessary.

---------

Co-authored-by: nglime <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate common Related to common crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules proto Related to proto crate substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change Expr PartialOrd to not rely on comparing hash values
2 participants