Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make array_union/array_except/array_intersect handle empty/null arrays rightly #8269

Merged
merged 5 commits into from
Nov 21, 2023

Conversation

Veeupup
Copy link
Contributor

@Veeupup Veeupup commented Nov 19, 2023

Which issue does this PR close?

Closes #8181.

for array_union:

array_union([], []) = []
array_union([], null) = []
array_union(null, []) = []
array_union(null, null) = null

for array_except:

array_except([], []) = []
array_except([], null) = []
array_except(null, []) = null
array_except(null, null) = null

for array_intersect:

array_intersect([], []) = []
array_intersect([], null) = []
array_intersect(null, []) = []
array_intersect(null, null) = null

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Nov 19, 2023
@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 19, 2023

@jayzhan211 maybe you can check the behavior for these functions now cc @alamb

@Veeupup Veeupup changed the title make array_union handle empty/null arrays rightly make array_union/array_except/array_intersect handle empty/null arrays rightly Nov 19, 2023
@Veeupup Veeupup marked this pull request as ready for review November 19, 2023 15:06
@Veeupup Veeupup force-pushed the fix_empty_for_array_set branch from 9ed1fca to c08d6cb Compare November 19, 2023 15:10
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to fix here.

https://github.com/apache/arrow-datafusion/blob/393e48f98872c696a90fce033fa584533d2326fa/datafusion/sql/src/expr/value.rs#L156-L158

        if data_types.is_empty() {
            // ListArray[NullArray(0)], the same as `make_array()`.
            let array = new_null_array(&DataType::Null, 0);
            let array = Arc::new(array_into_list_array(array));
            Ok(lit(ScalarValue::List(array)))
        } else if data_types.len() > 1 {

datafusion/physical-expr/src/array_expressions.rs Outdated Show resolved Hide resolved
datafusion/expr/src/built_in_function.rs Outdated Show resolved Hide resolved
@jayzhan211
Copy link
Contributor

I think we need to fix here.

https://github.com/apache/arrow-datafusion/blob/393e48f98872c696a90fce033fa584533d2326fa/datafusion/sql/src/expr/value.rs#L156-L158

        if data_types.is_empty() {
            // ListArray[NullArray(0)], the same as `make_array()`.
            let array = new_null_array(&DataType::Null, 0);
            let array = Arc::new(array_into_list_array(array));
            Ok(lit(ScalarValue::List(array)))
        } else if data_types.len() > 1 {

Update: I found we can easily reuse MakeArray for sql_array_literal

        Ok(Expr::ScalarFunction(ScalarFunction::new(
            BuiltinScalarFunction::MakeArray,
            values,
        )))

@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 20, 2023

@jayzhan211 Thanks for your constructive comments!I suppose we make it in a right way : )

Signed-off-by: veeupup <[email protected]>
@Veeupup Veeupup force-pushed the fix_empty_for_array_set branch from a76beaa to be32301 Compare November 20, 2023 15:07
@Veeupup Veeupup requested a review from jayzhan211 November 20, 2023 15:09
@jayzhan211 jayzhan211 mentioned this pull request Nov 21, 2023
@jayzhan211
Copy link
Contributor

jayzhan211 commented Nov 21, 2023

We can move select_array_no_common_type and select_array_non_literal_type to slt

@Veeupup Veeupup requested a review from jayzhan211 November 21, 2023 13:26
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @Veeupup and @jayzhan211 - this looks really nice and exactly correct 🙏

BuiltinScalarFunction::ArrayIntersect => Ok(input_expr_types[0].clone()),
BuiltinScalarFunction::ArrayUnion => Ok(input_expr_types[0].clone()),
BuiltinScalarFunction::ArrayUnion | BuiltinScalarFunction::ArrayIntersect => {
match (input_expr_types[0].clone(), input_expr_types[1].clone()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

@@ -1396,7 +1396,7 @@ SELECT COUNT(DISTINCT c1) FROM test
query ?
SELECT ARRAY_AGG([])
----
[]
[[]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 54a0247 into apache:main Nov 21, 2023
22 checks passed
{
if !args.iter().all(|arg| {
arg.data_type().equals_datatype(data_type)
|| arg.data_type().equals_datatype(&DataType::Null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Veeupup Why do we need Null checking here, which function failed without this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when it comes to handle multi arrays functions,their input types maybe like null, List(Int32).... Then, but we should handle null type in function body rather than return error here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support empty array for array_union, array_intersect, and array_except
3 participants