-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add casting of count
to Int64
in array_repeat
function to ensure consistent integer type handling
#14236
Add casting of count
to Int64
in array_repeat
function to ensure consistent integer type handling
#14236
Conversation
…e consistent integer type handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jatin510, LGTM.
I've left some comments on more possible enhancements here.
_ => return exec_err!("count must be an integer type"), | ||
}; | ||
|
||
let count_array = as_int64_array(&count_array)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe uint
should be used as a target type here? Right now DF panics when negative value passed as an argument, which also could be fixed along with type casting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want the target type of array_repeat
count function to be UInt64
, we have to change the behaviour of array_repeat
. As of now, by default count
is of type Int64
.
It will be a breaking change, are we okay with that ?
with current implementation of array_repeat
and default
data type of Int64
.
It is behaving like this , if we are converting all Signed Integer
type to Unsigned Integer
, but letting the input of count
to be of Int64
> select array_repeat(1,-20000);
+--------------------------------------+
| array_repeat(Int64(1),Int64(-20000)) |
+--------------------------------------+
| [] |
+--------------------------------------+
In Spark, it gives empty arrow for negative count
argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By target type I've meant conversion to UInt internallly instead of Int, so it wouldn't affect any input arguments, only prevent this function from panicking on negative values (but I thought it would return cast error 🤔 ), so I don't think it's a breaking change.
let count_array = &args[1]; | ||
|
||
let count_array = match count_array.data_type() { | ||
DataType::Int8 | DataType::Int16 | DataType::Int32 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely, UInt should also be allowed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thansk for working on this @jatin510 and @korowa -- another way to implement this same functionality would using the signature
(in which case DataFusion will handle the casting for you)
It seems like the current method simply accepts any argument types
fn signature(&self) -> &Signature { |
I think you could potentially use a signature
https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Signature.html
Or maybe you could implement https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.coerce_types directly
(and simply say the second argument must be an Int64 or UInt64)
🤔
It makes sense. |
|
||
// Coerce the second argument to Int64/UInt64 if it's a numeric type | ||
let second = match count_type { | ||
DataType::Int8 | DataType::Int16 | DataType::Int32 | DataType::Int64 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-necessary suggestion: due to safe cast converting negative values to nulls, perhaps we should also add a simple test to verify that count array with multiple values with nulls (after casting to uint) will be processed as expected, like
select array_repeat('x', column1) from (values (-1), (2), (-3));
Which issue does this PR close?
Closes #14228.
Rationale for this change
The count in
array_repeat
function only works withInt64
data type. Now, we have upcasted the count variable if the dataype isInt8
,Int16
andInt32
.What changes are included in this PR?
Updated the implementation of
array_repeat
to upcast datatype ofcount
argument.Are these changes tested?
Yes, Added test in
array.slt
file.Are there any user-facing changes?
No