Change Expr `PartialOrd` to not rely on comparing hash values #8932
Comments
@tustvold notes on #8908 (comment):
So my conclusion is that this is a latent bug waiting to happen (that will be very hard to track down if/when it does strike)
Hi, I'm new to this project and was poking through the issues. Can I try working this one out? It looks like, to remove the hash, we would first need to compare by enum variant (based on the test, the declaration order appears to determine the ordering). I see two ways to do this: either implement a "discriminant" function that determines the variant ordering, or write a lot of match arms. When both sides are the same variant, we can then compare field by field, which looks like it should work for the majority of them. Is that the right track?
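To make the proposal above concrete, here is a minimal sketch of "compare by variant first, then by fields". `MiniExpr` and `variant_rank` are hypothetical names invented for illustration; DataFusion's real `Expr` has many more variants and this is not its actual code.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for `Expr`, reduced to three variants.
#[derive(Debug, Clone, PartialEq)]
enum MiniExpr {
    Column(String),
    Literal(i64),
    Not(Box<MiniExpr>),
}

impl MiniExpr {
    // The "discriminant" function: ranks follow declaration order.
    fn variant_rank(&self) -> u8 {
        match self {
            MiniExpr::Column(_) => 0,
            MiniExpr::Literal(_) => 1,
            MiniExpr::Not(_) => 2,
        }
    }
}

impl PartialOrd for MiniExpr {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            // Same variant: compare the fields.
            (MiniExpr::Column(a), MiniExpr::Column(b)) => a.partial_cmp(b),
            (MiniExpr::Literal(a), MiniExpr::Literal(b)) => a.partial_cmp(b),
            (MiniExpr::Not(a), MiniExpr::Not(b)) => a.partial_cmp(b),
            // Different variants: fall back to the variant rank.
            _ => self.variant_rank().partial_cmp(&other.variant_rank()),
        }
    }
}
```

Note that `#[derive(PartialOrd)]` on an enum already orders by variant declaration order and then by fields, so a hand-written version like this is mainly needed when some field type cannot derive `PartialOrd`.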
I think so. I don't think the actual relative order of
It might be worth figuring out how to break this work into some smaller PRs (rather than one large one) -- perhaps by implementing partial ord for sub fields / structs that are used in Expr before trying to do so for the whole thing.
Good idea! I went back through all the DataFusion sub fields/structs to check them, and will start the work and break it down.
PartialOrd Status
I was going through the remaining subtypes/structs and saw an issue that I wasn't sure how to resolve. While most of the remaining ones can derive PartialOrd, it looks like
I would personally suggest simply ignoring the contents of the schema for partial ord -- I think in general the other fields in the structs should be enough to compute equality and ordering
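A minimal sketch of that suggestion, assuming a struct whose schema field has no natural ordering. `FakeSchema` and `FakeNode` are hypothetical types made up for this example, not DataFusion types.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

// Hypothetical schema-like type; `HashMap` does not implement `PartialOrd`,
// so this struct cannot derive it.
#[derive(Debug, Clone, PartialEq)]
struct FakeSchema {
    metadata: HashMap<String, String>,
}

// Hypothetical node that carries a schema alongside orderable fields.
#[derive(Debug, Clone, PartialEq)]
struct FakeNode {
    name: String,
    limit: usize,
    schema: FakeSchema,
}

impl PartialOrd for FakeNode {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        // Compare `name`, then `limit`; `schema` is deliberately skipped.
        match self.name.partial_cmp(&other.name) {
            Some(Ordering::Equal) => self.limit.partial_cmp(&other.limit),
            non_eq => non_eq,
        }
    }
}
```

One trade-off to keep in mind: if `PartialEq` still compares the schema, two values can be unequal under `==` while `partial_cmp` returns `Some(Ordering::Equal)`, which the standard library documentation for `PartialOrd` asks implementations to avoid. Whether that matters depends on whether the schema can actually differ when all other fields match.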
Describe the bug
As described on #8908, the `PartialOrd` implementation of `Expr` is based on comparing the hash values of the two nodes.
This is problematic because there can be different orderings for the same expression:
ahash
Here is the current implementation
https://github.com/apache/arrow-datafusion/blob/b7e13a0af711477ad41450566c14430089edd3f2/datafusion/expr/src/expr.rs#L850-L862
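For readers who do not follow the link, the general hazard of ordering by hash can be sketched as below. This is not the linked code: it uses the standard library's `RandomState` instead of `ahash`, and the helper names `hash_with`, `cmp_by_hash`, and `orders_under_two_seeds` are invented for illustration.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Hash a value with a hasher built from the given state.
fn hash_with<T: Hash, S: BuildHasher>(state: &S, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

// Order two values by their hashes rather than by their contents.
fn cmp_by_hash<T: Hash, S: BuildHasher>(state: &S, a: &T, b: &T) -> Ordering {
    hash_with(state, a).cmp(&hash_with(state, b))
}

// Compare the same pair under two independently seeded states.
// Nothing guarantees that the two results agree.
fn orders_under_two_seeds(a: &str, b: &str) -> (Ordering, Ordering) {
    let s1 = RandomState::new();
    let s2 = RandomState::new();
    (cmp_by_hash(&s1, &a, &b), cmp_by_hash(&s2, &a, &b))
}
```

Within a single hasher state the order is self-consistent, but it is unrelated to the values themselves, so anything that changes the hash (a different seed, a different hash function, or a change to a `Hash` implementation) can reorder the same pair of expressions.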
To Reproduce
No response
Expected behavior
I expect that PartialOrd of `Expr` will not change -- we probably have to implement a real `PartialOrd` implementation that compares each field (or maybe derive)
Additional context
No response