-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise a plan error on union if column count is not the same between plans #13117
Conversation
if left_plan.schema().fields().len() != right_plan.schema().fields().len() { | ||
return plan_err!( | ||
"UNION queries have different number of columns: \ | ||
left has {} columns whereas right has {} columns", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good check to have. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I agree -- it is an easy check to add that can catch clearly incorrect plans (though as @findepi notes it won't catch all issues)
This is good check to have. 👍 Obviously having same column count doesn't guarantee compatibility. The types need to be coercible, so normally the column count check should go together with the type check.
The consequence of this is that
I would want to see these addressed, that's the idea of #12723. Maybe we should have a form of validator of logical plan well-formedness (union branches match, the types match exactly) and the check would go there. The downside of such approach (and of current state as well) is that, when you have code taking Logical Plan UNION, you can't really assume much (things may or may not have been checked), so the code needs to be very defensively written. |
I think this is what the the type coercion analyzer pass already does. If the goal is to provide users errors sooner (when DataFrame is created) rather than later (when DataFrame is run) some other ideas:
|
I agree with this analysis and conclusion
I agree it is not an optimizer -- it an I think the analyzer docs explain this: https://docs.rs/datafusion/latest/datafusion/optimizer/trait.AnalyzerRule.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if left_plan.schema().fields().len() != right_plan.schema().fields().len() { | ||
return plan_err!( | ||
"UNION queries have different number of columns: \ | ||
left has {} columns whereas right has {} columns", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I agree -- it is an easy check to add that can catch clearly incorrect plans (though as @findepi notes it won't catch all issues)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Omega359 . Raising error earlier makes sense to me
Which issue does this PR close?
Closes #13092
Rationale for this change
Raise the error earlier, restore the behaviour of previous versions of DF
What changes are included in this PR?
code, slt test.
Are these changes tested?
yes
Are there any user-facing changes?
no