Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nulls and empty for array functions #7338

Closed
wants to merge 14 commits into from

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Aug 19, 2023

Which issue does this PR close?

Closes #7142.
Re-open 7142 if others than append/prepend need to support nulls and empty.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner optimizer Optimizer rules core Core DataFusion crate labels Aug 19, 2023
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 19, 2023
@jayzhan211 jayzhan211 changed the title Support nulls and empty in array functions Support nulls and empty for array append and prepend Aug 19, 2023
@jayzhan211 jayzhan211 marked this pull request as ready for review August 19, 2023 07:35
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Aug 19, 2023

@izveigor @alamb Ready for review, thanks!

@alamb
Copy link
Contributor

alamb commented Aug 21, 2023

I'll put this on my list -- thanks @jayzhan211

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 -- this is looking like a good start. I think the code for array literals is pretty close, I am not sure about the code for array append. Maybe we can split them into separate PRs

datafusion/sql/src/expr/value.rs Outdated Show resolved Hide resolved
datafusion/sql/src/expr/value.rs Outdated Show resolved Hide resolved
datafusion/sql/src/expr/value.rs Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/array.slt Show resolved Hide resolved
datafusion/optimizer/src/analyzer/type_coercion.rs Outdated Show resolved Hide resolved
@jayzhan211 jayzhan211 marked this pull request as draft August 23, 2023 14:49
@jayzhan211 jayzhan211 changed the title Support nulls and empty for array append and prepend Support nulls and empty for array functions Aug 25, 2023
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions and removed core Core DataFusion crate labels Aug 25, 2023
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Aug 27, 2023

I have extensively restructured the code in order to enhance its readability and extensibility. Also, array_concat is considered.
I'm sure this PR can be further improved but it is time to seek feedback to prevent me from working in the wrong direction.

cc @alamb @izveigor

@jayzhan211 jayzhan211 marked this pull request as ready for review August 27, 2023 07:46
@jayzhan211 jayzhan211 requested a review from alamb August 27, 2023 07:46
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as draft August 27, 2023 08:04
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as ready for review August 27, 2023 08:35
@izveigor
Copy link
Contributor

Hello, @jayzhan211!
Sorry for the long delay. I will carefully review these changes tomorrow.

However, I have some questions about the nature of functions in Arrow Datafusion. I recently thought about solving the issue: #6559 and I noticed a fact that User Defined Functions in DataFusion does not parameterize the null handling behaviour (Unlike DuckDB with special link) So If we want to implement new array or other nested data types functions (like struct_insert) maybe it is better to remake signature's structure (I mean adding new boolean argument special to Signature).

I want to hear @alamb's opinion about this opportunity.

@@ -553,6 +553,198 @@ fn coerce_arguments_for_signature(
.collect::<Result<Vec<_>>>()
}

// TODO: Move this function to arrow-rs or common array utils module
// base type is the non-list type
fn base_type(data_type: &DataType) -> Result<DataType> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function definitely has a practical use but it should be expanded for all nested data types (list, fixed_list, map, union ...).

.zip(current_types)
.map(|(expr, from_type)| cast_array_expr(expr, &from_type, &new_type, schema))
.collect();
match fun {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am inclined to solve this problem by expanding signature's structure because there is one difficulty with User Defined Function. For example, I am Arrow DataFusion's user and I want to define my own ArrayAppend implementation (the function new_array_append). And how this function would handle nulls?

What do you think about it, @alamb and @jayzhan211?

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Aug 29, 2023

Hello, @jayzhan211!
Sorry for the long delay. I will carefully review these changes tomorrow.

However, I have some questions about the nature of functions in Arrow Datafusion. I recently thought about solving the issue: #6559 and I noticed a fact that User Defined Functions in DataFusion does not parameterize the null handling behaviour (Unlike DuckDB with special link) So If we want to implement new array or other nested data types functions (like struct_insert) maybe it is better to remake signature's structure (I mean adding new boolean argument special to Signature).

I want to hear @alamb's opinion about this opportunity.

I'm not familiar with udf. Is it possible to customize the behavior of null in udf? Why do we need additional parameters to deal with null?

In this example,
why not pass Option<Fn>: dont_intercept_null to indicate whether customized null handling? None for default, Some for UDF.

duckdb.create_function('dont_intercept', dont_intercept_null, [BIGINT], BIGINT)
res = duckdb.sql("""
	select dont_intercept(NULL)
""").fetchall()
print(res)
# [(None,)]

duckdb.remove_function('dont_intercept')
duckdb.create_function('dont_intercept', dont_intercept_null, [BIGINT], BIGINT, null_handling='special')
res = duckdb.sql("""
	select dont_intercept(NULL)
""").fetchall()
print(res)

@alamb
Copy link
Contributor

alamb commented Aug 29, 2023

I plan to review this PR later -- I am on vacation this week so my response will likely be delayed

@jayzhan211 jayzhan211 marked this pull request as draft October 28, 2023 23:52
Copy link

github-actions bot commented May 3, 2024

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label May 3, 2024
@github-actions github-actions bot closed this May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt) Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

All array functions should represent NULL as an element
3 participants