-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support FixedSizeList
type coercion
#8902
Conversation
FixedSizeList
type coercion
|
f169af2
to
ac81670
Compare
} | ||
BuiltinScalarFunction::ArrayPrepend => { | ||
Signature::element_and_array(self.volatility()) | ||
} | ||
BuiltinScalarFunction::ArrayRepeat => Signature::any(2, self.volatility()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the functions including array_repeat
, array_union
, array_intersect
, and array_except
are also two-argument, Signature::array_and_element
could not handle NULL
well for them. The following PR could continue to work on this issue.
BuiltinScalarFunction::ArrayExcept => Signature::any(2, self.volatility()), | ||
BuiltinScalarFunction::Flatten => Signature::any(1, self.volatility()), | ||
BuiltinScalarFunction::ArrayHasAll | ||
| BuiltinScalarFunction::ArrayHasAny | ||
| BuiltinScalarFunction::ArrayHas => Signature::any(2, self.volatility()), | ||
| BuiltinScalarFunction::ArrayHas => { | ||
Signature::array_and_element(self.volatility()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the example that works after this change.
I did not find one in array.slt
One of the example is array_has(list, null)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array_has_any
and array_has_any
cannot use array_and_element
, because their signature is array and array.
@@ -311,6 +329,8 @@ fn coerced_from<'a>( | |||
Utf8 | LargeUtf8 => Some(type_into.clone()), | |||
Null if can_cast_types(type_from, type_into) => Some(type_into.clone()), | |||
|
|||
List(_) if matches!(type_from, FixedSizeList(_, _)) => Some(type_into.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be easier if we run Fixedsizelist to List like coerce_arguments_for_fun
before this signature type coercion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unable to decide between coerce_arguments_for_fun
and coerce_arguments_for_signature
. Are there specific benefits for each?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One is coerced based on function, another is based on signature. If we target to convert from fixedsizelist to list, it seems coerce_arguments_for_fun
is more suitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we can have fixed size list to list conversion in one place.
Currently, it converts both in here and coerce_arguments_for_fun
, right? Is it possible to have it once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move coerce_arguments_for_fun before signature? Convert FixedSizeList to List before signature coercion. Then we don't need to deal with FixedSizeList
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can.
In ExprSchemale
(https://github.com/apache/arrow-datafusion/blob/a7cdf605a1e23608e51889988c239613c80fb671/datafusion/expr/src/expr_schema.rs#L91-L105)
It forces the function return type consistent, but I am not sure of its purpose. Maybe @2010YOUY01 could explain it more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we dont need this anymore, input types should be handled later on via coerce_arguments_for_signature
. They are doing the same things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e1d8809
to
2d16386
Compare
/// Specifies Signatures for array functions | ||
ArraySignature(ArrayFunctionSignature), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future extension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a good change, though a change of the API
// Cast Fixedsizelist to List for array functions | ||
if *fun == BuiltinScalarFunction::MakeArray { | ||
// coerce the fixed size list to list for all array fucntions | ||
if fun.name().contains("array") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two problems that I think we should stick to the previous checking.
- name() takes the first aliases of the function, so if there we forgot to have first aliases starts with array, it fails.
- We just need to care about
make_array
only, then other functions should get non-fixedsizelsit as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we only need to care about make_array
? If we put coerce_arguments_for_fun
in front of coerce_arguments_for_signature
, it should cast all FixedSizeList
to List
regarding array functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because any kind of array functions includes make_array(). Like array_append(cast(fixedsizelist, list), element)
. we dont need to check again with array_append
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it is difficult for FixedSizeList
type coercion. My idea is to do these staff in coerce_arguments_for_signature
based on different ArraySignature(ArrayFunctionSignature)
, which will benefit future maintenance and development. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I know the reason, it seems we still need to handle fixedsizelist for other array functions because we can pass Fixedsizelist directly to them with arrow_cast
.
In this case, I agree we should handle them in coerce_arguments_for_signature
. And, coerce_arguments_for_fun
can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep coerce_arguments_for_fun
until all FixedSizeList
type coercion work is done.
24bc9ea
to
b1d79ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@jayzhan211 Thank you for your review |
Thank you @Weijun-H and @jayzhan211 -- I plan to review this tomorrow if possible. I know it has been outstanding for quite a while |
FixedSizeList
type coercionFixedSizeList
type coercion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Weijun-H -- really nice
Also, thank you @jayzhan211 for the review. It was wonderful to review this PR after you and @Weijun-H have worked on it and find the structure in such good shape
/// Specifies Signatures for array functions | ||
ArraySignature(ArrayFunctionSignature), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a good change, though a change of the API
@@ -263,6 +283,33 @@ impl Signature { | |||
volatility, | |||
} | |||
} | |||
/// Specialized Signature for ArrayAppend and similar functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup
@@ -77,6 +77,19 @@ AS | |||
FROM arrays | |||
; | |||
|
|||
#TODO: create FixedSizeList with NULL column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we maybe track this TODO with a separate ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked by #9094
Co-authored-by: Andrew Lamb <[email protected]>
Thanks again @Weijun-H ! |
Which issue does this PR close?
Parts #6560
Parts #8249
Rationale for this change
When the argument has a FixedSizeList, type coercion is applied to the array function. This pull request only affects the function that uses two arguments, which are
array_element
,array_has
,array_positions
,array_prepend
,array_remove
,array_remove_all
, andarray_has
.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?