-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Coalesce function #1969
Add Coalesce function #1969
Conversation
@andygrove @xudong963 Please take a look & review when you find time. |
for column_value in args { | ||
match column_value { | ||
ColumnarValue::Array(array_ref) => { | ||
if array_ref.is_valid(i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation can likely be optimized (I think it should be >10x faster) by not converting values to ScalarValue
and using is_valid
, matching per value, but operating directly on Arrays. That would also allow optimizations like skipping running the coalesce althogether if all items in the array already are non null, or skipping handling the array if every value is null, etc.
This will involve some more work / code though, but maybe a comment that there are some big optimizations possible could be useful.
The current implementation also could likely already be optimized a bit by moving the for loop to the innermost place instead of the outermost place and/or implementing one of the rules like mentioned.
@yjshen Yup. Thanks for pointing out. I'll add it to Ballista with the optimization @Dandandan suggested |
@msathis do you plan to do this as a follow on PR or this PR? In other words is this PR ready for review again or are you still working on it? |
@@ -111,6 +112,11 @@ pub fn return_type( | |||
utf8_to_int_type(&input_expr_types[0], "character_length") | |||
} | |||
BuiltinScalarFunction::Chr => Ok(DataType::Utf8), | |||
BuiltinScalarFunction::Coalesce => { | |||
// COALESCE has multiple args and they might get coerced, get a preview of this | |||
let coerced_types = data_types(input_expr_types, &signature(fun)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any doc or reference to guide for the common result type or the coercion type?
datafusion/tests/sql/functions.rs
Outdated
} | ||
|
||
#[tokio::test] | ||
async fn coalesce_sum_with_default_value() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to add other case with agg function from #2067 (comment)
select COALESCE(sum(c3),0) from table
All the value of c3 is null.
@msathis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. i'll update the test case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can try this case
select COALESCE(sum(c3),0) from table group by xxx
Hi @alamb , Sorry, i got busy with some office work. I'll try to revise the PR in the next couple of days. I was planning to address all the performance optimisations @Dandandan suggested. |
Thank you @msathis -- makes total sense. Let us know if you need any help! |
} | ||
ColumnarValue::Scalar(scalar) => { | ||
if !scalar.is_null() { | ||
// TODO: Figure out how to set value at index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alamb, I need your help here. I couldn't figure out how to set value to particular index of arrow::Array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may want to call scalar. to_array_of_size(size)
here to create the appropriate array:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb I think that worked.
@msathis If the pr is ready to review, please at me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @msathis -- this is a great PR. I think it could be merged as is, though I have some small comments / suggestions.
I also updated this PR description as it said "everything was treated as a string" which I do not think applies any longer
"| coalesce(test.c1,test.c2) |", | ||
"+---------------------------+", | ||
"| 0 |", | ||
"| 1 |", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
#[tokio::test] | ||
async fn coalesce_static_empty_value() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test seems to be the same as coalesce_plan
-- is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its an overlook, you're right. i'll remove coalesce_plan
} | ||
|
||
#[tokio::test] | ||
async fn coalesce_static_value_with_null() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
#[tokio::test] | ||
async fn coalesce_result_with_default_value() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stylistically this might be better if it were right next to coalesce_result
given its similarity.
For that matter, since it uses the same data setup, it might be cool to combine the two tests (and run two different queries in them)
"+---------------------------------------------+", | ||
]; | ||
assert_batches_eq!(expected, &actual); | ||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice testing 👍
I will wait for @liukun4515 to review as well prior to merging |
@alamb Updated the PR to address the review comments |
@liukun4515 Your review will be great! I think the PR is ready. My knowledge is limited, i feel more optimisation is possible. |
Thanks @msathis -- I think this is looking great |
@liukun4515 would you still like to review this prior to merging? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks @msathis!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* feat: coalesce function. WIP * fix: add license header * fix: support all primitive data types * fix: add test case and remove commented code * fix: refactor code and support coalesce from ballista * fix: support scalar values in coalesce * fix: clippy * fix: clippy by removing unneeded mut * fix: clippy by removing unneeded mut * fix: address review comments Co-authored-by: Sathis Kumar <[email protected]>
* feat: coalesce function. WIP * fix: add license header * fix: support all primitive data types * fix: add test case and remove commented code * fix: refactor code and support coalesce from ballista * fix: support scalar values in coalesce * fix: clippy * fix: clippy by removing unneeded mut * fix: clippy by removing unneeded mut * fix: address review comments Co-authored-by: Sathis Kumar <[email protected]>
Which issue does this PR close?
Closes #1890.
Rationale for this change
To support
coalesce
function in SQLWhat changes are included in this PR?
Basic
coalesce
support.Now everything is treated as string, will have to work to support other data types properly.Are there any user-facing changes?
Users can use
coalesce
function in SQL now