Add type coercion for UDFs in logical plan #3254
Codecov Report
@@ Coverage Diff @@
## master #3254 +/- ##
==========================================
+ Coverage 85.58% 85.59% +0.01%
==========================================
Files 296 296
Lines 54179 54231 +52
==========================================
+ Hits 46367 46418 +51
- Misses 7812 7813 +1
e6b4df3 to 141d2b0
@alamb Here is another type coercion rule that we can now add.
👍 looks great @andygrove
This will likely conflict with #3379
For what it is worth, I would love to move all type coercion rules out of physical planning and into this phase (aka consolidate all coercion)
/// `signature`, if possible.
///
/// See the module level documentation for more detail on coercion.
pub fn coerce_arguments_for_signature(
I feel like this code should already exist somewhere and it would be great to consolidate into a single implementation rather than have multiple implementations around
Yes, this is adapted from the `coerce` method in the `physical-expr` crate, which operates on physical expressions rather than logical expressions.
Do we need to make this method public?
I think it would be better to keep it private.
let mut config = OptimizerConfig::default();
let plan = rule.optimize(&plan, &mut config)?;
assert_eq!(
    "Projection: TestScalarUDF(CAST(Int32(123) AS Float32))\n EmptyRelation",
👍
@andygrove @alamb Can we put together a plan for migrating type coercion?
We can do this after migrating type coercion from the physical phase to the logical phase.
I will review it tomorrow; it's too late for me today.
@liukun4515 there is #2355 (which I will also add to the description of this PR). Is that what you had in mind?
I think this looks good -- let's wait for @liukun4515 to review and then merge it in
LGTM
My only comments are about the public/private API.
After the type coercion refactor, do we need to forbid creating physical exprs directly?
I would personally prefer to allow creating PhysicalExprs but not provide automatic coercion (leaving it up to the user to use the proper types).
A+B will get the common data type
I think I found a bug in the type coercion for arithmetic ops: https://github.com/apache/arrow-datafusion/blob/c359018baa8bbb0a227e83df948c903cde4d701f/datafusion/expr/src/binary_rule.rs#L293. If we move type coercion to the logical phase, coercion will be applied to the binary op twice.
Thanks @liukun4515. I have filed #3388 to track this. I will look at this next.
Benchmark runs are scheduled for baseline = e6d1364 and contender = 7c04964. 7c04964 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Related to #2355
Follows on from #3222 and #3250
Rationale for this change
We currently perform type coercion for UDFs in the physical planner/optimizer. I would like to have this logic in the logical plan so that other projects (such as Dask SQL) can benefit from this.
What changes are included in this PR?
`TypeCoercion` rule to support Scalar UDFs
Are there any user-facing changes?
Yes, logical plans will be different in some cases.
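For illustration only, here is a minimal, self-contained sketch of the kind of rewrite this produces in a logical plan. It is a toy model and not the code in this PR: `DataType`, `Expr`, `can_widen`, and `coerce_arguments` are hypothetical stand-ins, and the only widening rule assumed is Int32 to Float32.

```rust
use std::fmt;

// Hypothetical, simplified stand-ins for the real logical types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int32,
    Float32,
    Utf8,
}

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Literal(DataType, String),
    Cast(Box<Expr>, DataType),
    ScalarUdf { name: String, args: Vec<Expr>, return_type: DataType },
}

impl Expr {
    // Type the expression evaluates to.
    fn data_type(&self) -> DataType {
        match self {
            Expr::Literal(t, _) => *t,
            Expr::Cast(_, t) => *t,
            Expr::ScalarUdf { return_type, .. } => *return_type,
        }
    }
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Literal(t, v) => write!(f, "{:?}({})", t, v),
            Expr::Cast(e, t) => write!(f, "CAST({} AS {:?})", e, t),
            Expr::ScalarUdf { name, args, .. } => {
                let args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                write!(f, "{}({})", name, args.join(", "))
            }
        }
    }
}

// Assumption: only Int32 -> Float32 is treated as a safe implicit cast here.
fn can_widen(from: DataType, to: DataType) -> bool {
    matches!((from, to), (DataType::Int32, DataType::Float32))
}

// Wrap each argument in a Cast when its type differs from the signature
// but can be widened; leave matching arguments untouched.
fn coerce_arguments(args: &[Expr], signature: &[DataType]) -> Result<Vec<Expr>, String> {
    if args.len() != signature.len() {
        return Err(format!(
            "expected {} arguments, got {}",
            signature.len(),
            args.len()
        ));
    }
    args.iter()
        .zip(signature)
        .map(|(arg, &want)| {
            let have = arg.data_type();
            if have == want {
                Ok(arg.clone())
            } else if can_widen(have, want) {
                Ok(Expr::Cast(Box::new(arg.clone()), want))
            } else {
                Err(format!("cannot coerce {:?} to {:?}", have, want))
            }
        })
        .collect()
}
```

In this toy model, coercing an `Int32` literal `123` against a `[Float32]` signature displays as `TestScalarUDF(CAST(Int32(123) AS Float32))`, matching the shape of the plan string in the test above. Because already-matching arguments are left alone, running the pass a second time adds no further casts in this sketch.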