-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't preserve functional dependency when generating UNION logical plan #12979
Conversation
…an (#44) * Don't preserve functional dependency when generating UNION logical plan * Remove extra lines
LGTM 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix @Sevenannn. I have just one comment to consider. Otherwise, LGTM
let schema = (**left_plan.schema()).clone(); | ||
let schema = | ||
Arc::new(schema.with_functional_dependencies(FunctionalDependencies::empty())?); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is clearing out all dependencies the right fix? Could we retain some if they do not harm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll wait to merge this PR until tomorrow to give @Sevenannn a chance to respond
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @berkaysynnada, thanks for the review! I don’t think any FD persists when performing UNION on 2 tables.
A simple example would be UNION table t1 with another table t2 which only has 1 row, there always exists such data in t2 which could break the FDs in t1 / t2 after the UNION.
In this case, clearing FDs would be the right fix since we don’t want FDs to get wrongly retained and affect later plans, e.g. aggregation.
Please let me know if you have any further questions regarding this PR, thanks!
Thanks again @Sevenannn and @sgrebnov and @berkaysynnada |
…an (#44) (apache#12979) * Don't preserve functional dependency when generating UNION logical plan * Remove extra lines
…an (#44) (apache#12979) * Don't preserve functional dependency when generating UNION logical plan * Remove extra lines
…an (#44) (apache#12979) * Don't preserve functional dependency when generating UNION logical plan * Remove extra lines
…an (#44) (apache#12979) * Don't preserve functional dependency when generating UNION logical plan * Remove extra lines
Which issue does this PR close?
Closes #12980
Rationale for this change
When the datafusion logical planner build the AGGREGATE plan, it adds additional columns in the group_expr based on the functional dependencies. However, for queries that are aggregating upon table obatined through UNION operation, the functional dependency is still preserved in the schema of UNION plan, while the functional dependency no longer retains after the UNION. This causes wrong column being added as group_by column in aggregation plan
What changes are included in this PR?
UNION
logical planAre these changes tested?
Yes, unit test
test_aggregate_with_union
is added to verify the changeAre there any user-facing changes?
Aggregation based upon UNION results will not produce wrong results with duplicated groups