Support nulls and empty for array functions #7338
I'll put this on my list -- thanks @jayzhan211
Thanks @jayzhan211 -- this is looking like a good start. I think the code for array literals is pretty close, but I am not sure about the code for array append. Maybe we can split them into separate PRs.
Signed-off-by: jayzhan211 <[email protected]>
Hello, @jayzhan211! I have some questions about the nature of functions in Arrow DataFusion. I was recently thinking about solving issue #6559, and I noticed that user-defined functions in DataFusion do not parameterize their null-handling behaviour (unlike DuckDB with …). I would like to hear @alamb's opinion about this opportunity.
@@ -553,6 +553,198 @@ fn coerce_arguments_for_signature(
```rust
        .collect::<Result<Vec<_>>>()
}

// TODO: Move this function to arrow-rs or common array utils module
// base type is the non-list type
fn base_type(data_type: &DataType) -> Result<DataType> {
```
The function definitely has a practical use, but it should be expanded to cover all nested data types (list, fixed_list, map, union, ...).
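As a rough illustration of this suggestion, the sketch below recurses through every list-like wrapper to reach the innermost type. It uses a simplified, hypothetical `DataType` enum rather than arrow's real `DataType`, and it assumes that types with more than one child (here `Map`) have no single base type, so it returns `None` for them. The PR's own `base_type` returns `Result<DataType>`; how maps and unions should be treated is still open in this discussion.

```rust
/// Simplified, hypothetical stand-in for arrow's `DataType`; only the
/// variants needed to show the recursion are modelled.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
    Utf8,
    List(Box<DataType>),
    LargeList(Box<DataType>),
    FixedSizeList(Box<DataType>, i32),
    /// Key and value types: two children, so no single base type.
    Map(Box<DataType>, Box<DataType>),
}

/// Peels off every list-like wrapper and returns the innermost type.
/// Assumption: types with several children (e.g. `Map`) yield `None`.
fn base_type(data_type: &DataType) -> Option<DataType> {
    match data_type {
        DataType::List(inner)
        | DataType::LargeList(inner)
        | DataType::FixedSizeList(inner, _) => base_type(inner),
        DataType::Map(_, _) => None,
        other => Some(other.clone()),
    }
}
```

Treating all list-like variants in one arm keeps the recursion uniform; a union or struct variant would need its own policy, just like `Map`.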
```rust
            .zip(current_types)
            .map(|(expr, from_type)| cast_array_expr(expr, &from_type, &new_type, schema))
            .collect();
        match fun {
```
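The hunk above casts every element expression to a common `new_type`. A minimal sketch of how such a common type might be computed for an array literal, assuming a hypothetical `ScalarType` enum and made-up coercion rules (`Null` coerces to any type, `Int64` widens to `Float64`, anything else is incompatible); DataFusion's real coercion rules are more extensive.

```rust
/// Hypothetical, simplified element type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ScalarType {
    Null,
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical pairwise rule: identical types stay as they are, `Null`
/// coerces to the other type, and `Int64` widens to `Float64`.
fn coerce_pair(a: ScalarType, b: ScalarType) -> Option<ScalarType> {
    use ScalarType::*;
    match (a, b) {
        (x, y) if x == y => Some(x),
        (Null, t) | (t, Null) => Some(t),
        (Int64, Float64) | (Float64, Int64) => Some(Float64),
        _ => None,
    }
}

/// Folds the pairwise rule over all element types. An empty or all-NULL
/// literal yields `Null`; incompatible elements yield `None`.
fn common_element_type(types: &[ScalarType]) -> Option<ScalarType> {
    types
        .iter()
        .try_fold(ScalarType::Null, |acc, &t| coerce_pair(acc, t))
}
```

Starting the fold from `Null` makes empty and all-NULL literals fall out naturally, which is exactly the case this PR is concerned with.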
I am inclined to solve this problem by expanding the structure of the function signature, because there is one difficulty with user-defined functions. For example, suppose I am an Arrow DataFusion user and I want to define my own `ArrayAppend` implementation (the function `new_array_append`). How would this function handle nulls?
What do you think about it, @alamb and @jayzhan211?
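One way to read the proposal is to let each function declare a null-handling policy that the engine applies before calling it. The sketch below is hypothetical throughout: `NullHandling`, `UdfSignature`, `Value`, and `invoke` are illustrative names, not DataFusion APIs, and treating a NULL list as an empty list inside `new_array_append` is an assumed behaviour.

```rust
/// Hypothetical null-handling policy a UDF signature could declare.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NullHandling {
    /// Return NULL if any argument is NULL, without calling the function.
    Propagate,
    /// Pass NULL arguments through and let the function decide.
    Special,
}

/// Hypothetical, simplified signature (not DataFusion's `Signature`).
struct UdfSignature {
    arity: usize,
    null_handling: NullHandling,
}

/// Hypothetical runtime value: NULL, an integer, or a list of nullable integers.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    List(Vec<Option<i64>>),
}

/// Checks arity, applies the signature's null-handling policy, then calls `f`.
fn invoke<F>(sig: &UdfSignature, args: &[Value], f: F) -> Result<Value, String>
where
    F: Fn(&[Value]) -> Result<Value, String>,
{
    if args.len() != sig.arity {
        return Err(format!("expected {} arguments, got {}", sig.arity, args.len()));
    }
    if sig.null_handling == NullHandling::Propagate && args.contains(&Value::Null) {
        return Ok(Value::Null);
    }
    f(args)
}

/// A user-defined append. Assumption: a NULL list behaves like an empty
/// list, and a NULL element is appended as a NULL entry.
fn new_array_append(args: &[Value]) -> Result<Value, String> {
    let mut items = match &args[0] {
        Value::List(v) => v.clone(),
        Value::Null => Vec::new(),
        other => return Err(format!("expected a list, got {other:?}")),
    };
    let elem = match &args[1] {
        Value::Int(i) => Some(*i),
        Value::Null => None,
        other => return Err(format!("expected an integer, got {other:?}")),
    };
    items.push(elem);
    Ok(Value::List(items))
}
```

With `Special`, `new_array_append(NULL, 1)` is left to the function and yields a one-element list; with `Propagate`, the same call short-circuits to NULL before the function runs. Keeping the policy in the signature means the user writing the UDF chooses between the two.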
I'm not familiar with UDFs. Is it possible to customize how nulls are handled in a UDF? Why do we need additional parameters to deal with nulls? In this example, …
I plan to review this PR later -- I am on vacation this week, so my response will likely be delayed.
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment, or this will be closed in 7 days.
Which issue does this PR close?
Closes #7142.
Re-open #7142 if functions other than append/prepend need to support nulls and empty arrays.
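For empty arrays, the return type matters as much as the values, because an empty array literal has no elements to infer an element type from. A sketch of one possible return-type rule for `array_append`, assuming an empty literal is typed as a list of `Null` and a bare NULL list is treated like an empty one. The enum and `append_return_type` are hypothetical names, and `array_prepend` would apply the same rule with its arguments swapped.

```rust
/// Simplified, hypothetical stand-in for arrow's `DataType`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
    Float64,
    List(Box<DataType>),
}

/// Hypothetical return-type rule for appending `elem` to `list`.
fn append_return_type(list: &DataType, elem: &DataType) -> Result<DataType, String> {
    match list {
        DataType::List(inner) => match (inner.as_ref(), elem) {
            // Empty literal (element type Null): adopt the element's type.
            (DataType::Null, e) => Ok(DataType::List(Box::new(e.clone()))),
            // NULL element: keep the list's existing element type.
            (i, DataType::Null) => Ok(DataType::List(Box::new(i.clone()))),
            (i, e) if i == e => Ok(list.clone()),
            (i, e) => Err(format!("cannot append {e:?} to a list of {i:?}")),
        },
        // Assumption: a bare NULL list behaves like an empty list.
        DataType::Null => Ok(DataType::List(Box::new(elem.clone()))),
        other => Err(format!("expected a list, got {other:?}")),
    }
}
```

Mismatched element types are rejected here rather than coerced; a fuller rule would first compute a common type for the list element and the appended value.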
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?