Int64 as default type for make_array function empty or null case #10790

Merged: jayzhan211 merged 2 commits into apache:main from array-i64-default on Jun 6, 2024

Contributor

@jayzhan211 jayzhan211 commented Jun 4, 2024

Which issue does this PR close?

Closes #10789.

Rationale for this change

What changes are included in this PR?

  1. Convert List(Null) and LargeList(Null) to List(Int64) and LargeList(Int64), respectively
  2. Fix the related array functions
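
The first change can be sketched as a small type-rewriting rule. This is only an illustrative model in Python, not DataFusion's implementation: `DataType`, `NULL`, `INT64`, and `default_null_element` are hypothetical names, and recursing into nested lists is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataType:
    """Toy stand-in for an Arrow data type (hypothetical, not the arrow API)."""
    name: str                            # "Null", "Int64", "List", "LargeList"
    child: Optional["DataType"] = None   # element type for list-like types


NULL = DataType("Null")
INT64 = DataType("Int64")


def default_null_element(dt: DataType) -> DataType:
    """Replace a Null element type inside a list-like type with Int64."""
    if dt.name in ("List", "LargeList") and dt.child is not None:
        if dt.child == NULL:
            return DataType(dt.name, INT64)
        # Assumption of this sketch: nested lists are rewritten recursively.
        return DataType(dt.name, default_null_element(dt.child))
    # Non-list types, including a bare Null, are left unchanged.
    return dt


list_of_null = DataType("List", NULL)
large_list_of_null = DataType("LargeList", NULL)
```

In this model a bare `Null` keeps its type; only the element type of a list changes.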

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jun 4, 2024
Signed-off-by: jayzhan211 <[email protected]>
@@ -346,8 +346,8 @@ AS VALUES
(arrow_cast(make_array([[1,2]], [[3, 4]]), 'FixedSizeList(2, List(List(Int64)))'), arrow_cast(make_array([1], [2]), 'FixedSizeList(2, List(Int64))')),
(arrow_cast(make_array([[1,2]], [[4, 4]]), 'FixedSizeList(2, List(List(Int64)))'), arrow_cast(make_array([1,2], [3, 4]), 'FixedSizeList(2, List(Int64))')),
(arrow_cast(make_array([[1,2]], [[4, 4]]), 'FixedSizeList(2, List(List(Int64)))'), arrow_cast(make_array([1,2,3], [1]), 'FixedSizeList(2, List(Int64))')),
(arrow_cast(make_array([[1], [2]], []), 'FixedSizeList(2, List(List(Int64)))'), arrow_cast(make_array([2], [3]), 'FixedSizeList(2, List(Int64))')),
(arrow_cast(make_array([[1], [2]], []), 'FixedSizeList(2, List(List(Int64)))'), arrow_cast(make_array([1], [2]), 'FixedSizeList(2, List(Int64))')),
(arrow_cast(make_array([[1], [2]], [[]]), 'FixedSizeList(2, List(List(Int64)))'), arrow_cast(make_array([2], [3]), 'FixedSizeList(2, List(Int64))')),
Contributor Author

Now they should have the same dimension, unlike with the null type.
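
One way to read "dimension" here is nesting depth. The sketch below uses Python lists as stand-ins for the SQL array literals in the test; `list_depth` is a hypothetical helper, and treating an empty list as one level of nesting is an assumption of the model.

```python
def list_depth(value) -> int:
    """Nesting depth of a Python list standing in for a SQL array literal."""
    if not isinstance(value, list):
        return 0
    # An empty list still counts as one level of nesting.
    return 1 + max((list_depth(v) for v in value), default=0)


first_arg = [[1], [2]]   # make_array argument with depth 2
old_second_arg = []      # depth 1, does not match the first argument
new_second_arg = [[]]    # depth 2, matches the first argument
```

Under this reading, replacing `[]` with `[[]]` in the test rows gives both arguments of `make_array` the same depth.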

@@ -2666,19 +2679,19 @@ select array_concat(make_array(), make_array(2, 3));
query ?
select array_concat(make_array(make_array(1, 2), make_array(3, 4)), make_array(make_array()));
----
-[[1, 2], [3, 4]]
+[[1, 2], [3, 4], []]
Contributor Author

Consistent with DuckDB.
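
The new expected result matches plain concatenation of the outer elements. A minimal Python model follows, assuming both arguments have the same nesting depth; `array_concat` here is a toy function, not the DataFusion one.

```python
def array_concat(*arrays):
    """Concatenate the outer elements of each array, keeping empty inner arrays."""
    out = []
    for arr in arrays:
        out.extend(arr)
    return out


# Mirrors the query in the test: [[1, 2], [3, 4]] concatenated with [[]].
result = array_concat([[1, 2], [3, 4]], [[]])
```

The empty inner array is kept as a third element rather than dropped.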

query ?
select array_sort([]);
----
[]
Contributor Author

A free fix that comes from changing the default type to i64.

query ?
select array_concat([]);
----
[]
Contributor Author

A free fix that comes from changing the default type to i64.

Member

Fixed #10200 👍

@jayzhan211 jayzhan211 marked this pull request as ready for review June 4, 2024 23:06
Contributor

@alamb alamb left a comment

Looks like a reasonable change to me. Thanks @jayzhan211

BTW I tested in duckdb and it seems like the default type is actually int32 but I think int64 is close enough

D select [];
┌───────────────────┐
│ main.list_value() │
│      int32[]      │
├───────────────────┤
│ []                │
└───────────────────┘

@jayzhan211
Contributor Author

> Looks like a reasonable change to me. Thanks @jayzhan211
>
> BTW I tested in duckdb and it seems like the default type is actually int32 but I think int64 is close enough
>
> D select [];
> ┌───────────────────┐
> │ main.list_value() │
> │      int32[]      │
> ├───────────────────┤
> │ []                │
> └───────────────────┘

Yes, they use i32. I chose i64 because the default integer type in DataFusion is mostly i64, so we can avoid a cast in most cases. We can easily switch to i32 later if the need arises.
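
The cast argument can be made concrete with a toy count. The workload below is hypothetical, and `casts_needed` is an illustrative helper rather than anything in DataFusion; it only counts how many operands would have an element type different from the empty array's default.

```python
def casts_needed(empty_default: str, operand_types: list) -> int:
    """Count operands whose element type differs from the empty array's default."""
    return sum(1 for t in operand_types if t != empty_default)


# Hypothetical workload: most arrays hold Int64 values, one holds Int32.
workload = ["Int64", "Int64", "Int64", "Int32"]

int64_casts = casts_needed("Int64", workload)
int32_casts = casts_needed("Int32", workload)
```

With this workload an Int64 default mismatches one operand while an Int32 default mismatches three, which is the trade-off the comment describes.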

@jayzhan211
Contributor Author

Thanks @alamb

@jayzhan211 jayzhan211 merged commit 053b53e into apache:main Jun 6, 2024
23 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…che#10790)

* set default type i64

Signed-off-by: jayzhan211 <[email protected]>

* fmt

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
@findepi
Member

findepi commented Sep 2, 2024

It seems counter-intuitive to me to infer Int64 when it was provided neither by the user nor by the schema of the tables the query operates on. It might be transparent to the user, in which case it's probably fine, but it's also likely to surface in some error message when an array made from make_array is used in later processing.

> SELECT arrow_typeof(NULL), arrow_typeof([NULL][0]);
+--------------------+------------------------------------------+
| arrow_typeof(NULL) | arrow_typeof(make_array(NULL)[Int64(0)]) |
+--------------------+------------------------------------------+
| Null               | Int64                                    |
+--------------------+------------------------------------------+

Also, it's likely to affect coercion rules in the future.
NULL type is coercible to any other type, but Int64 probably won't be.

@jayzhan211
Copy link
Contributor Author

> Also, it's likely to affect coercion rules in the future.
> NULL type is coercible to any other type, but Int64 probably won't be.

I think a Null of any type is coercible. Int64(Null) can become AnyType(Null) without a problem, because the value is Null.
In ScalarValue or Arrow, a Null is just None wrapped in a particular type.
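
The "None wrapped in a type" idea can be modelled as follows. `Scalar` and `cast_null` are hypothetical names for this sketch and are not the DataFusion `ScalarValue` API; the model only covers individual values and says nothing about how a planner's type-coercion rules treat list types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scalar:
    """Toy typed scalar: a type tag plus an optional payload."""
    data_type: str
    value: Optional[int]   # None models SQL NULL


def cast_null(scalar: Scalar, target_type: str) -> Scalar:
    """Re-type a NULL scalar; no payload needs converting because there is none."""
    if scalar.value is not None:
        raise ValueError("only a NULL value can be re-typed for free")
    return Scalar(target_type, None)


int64_null = Scalar("Int64", None)
utf8_null = cast_null(int64_null, "Utf8")
```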

@jayzhan211 jayzhan211 deleted the array-i64-default branch September 2, 2024 13:15