-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate make_scalar_function #8878
Conversation
pub fn make_scalar_function<F>(inner: F) -> ScalarFunctionImplementation | ||
/// Note that this function makes a scalar function with no arguments or all scalar inputs return a scalar. | ||
/// That's said its output will be same for all input rows in a batch. | ||
pub(crate) fn make_scalar_function<F>(inner: F) -> ScalarFunctionImplementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we should deprecate it and ask user to use ScalarUDFImpl
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a behavior change
, so I add a label for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me, thanks @viirya
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @viirya -- I now understand the issue in #8866 better.
I think make_scalar_function
is quite a nice function that lowered the cognative barrier to making udfs and it would be great if we could keep it. I think @jorgecarleitao or @andygrove originally added it when we introduced ColumnarValue
so that new users didn't have to understand ColumnarValue to begin working on it.
That being said, perhaps encouraging people to use the trait ScalarUDFImpl
is ok,
let ColumnarValue::Array(array) = &args[0] else { | ||
panic!() | ||
}; | ||
Ok(ColumnarValue::Array(Arc::new(array.clone()) as ArrayRef)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could make this code easier to work with using From
impls, like
Ok(ColumnarValue::Array(Arc::new(array.clone()) as ArrayRef)) | |
Ok(ColumnarValue::from(Arc::new(array.clone()) as ArrayRef)) |
Maybe it doesn't matter 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I added two From
impls for ColumnarValue
.
// in DataFusion, all `args` and output are dynamically-typed arrays, which means that we need to: | ||
// 1. cast the values to the type we want | ||
// 2. perform the computation for every element in the array (using a loop or SIMD) and construct the result | ||
|
||
// this is guaranteed by DataFusion based on the function's signature. | ||
assert_eq!(args.len(), 2); | ||
|
||
let ColumnarValue::Array(arg0) = &args[0] else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the more correct pattern is to create an array with into_array
let ColumnarValue::Array(arg0) = &args[0] else { | |
let arg0 = args[0].into_array(num_rows) |
Though now I wrote that it is not super clear to me where the num_rows should come from 🤔
Also, maybe this is more of the fix for #8866 -- that for a volatile function, it shouldn't be passed a scalar but unstead the array should be expanded prior to invocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what situations would arg
of pow
be a Scalar? If we can provide an example to reproduce it, our case/test can cover more condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like pow(1.0, 2.0)
(pass in literal values)
Co-authored-by: Andrew Lamb <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @viirya --- I think it would be nice to avoid some of the repetition of converting [ColumnarValue]
into [ArrayRef]
by encapsulating it into a function but I also think we can do so as a follow on PR.
I also think we might want to consieer wait for DataFusion 35.0.0 to be released #8863 before merging this PR
let arg0 = as_int32_array(&args[0])?; | ||
let arg1 = as_int32_array(&args[1])?; | ||
let fun = Arc::new(|args: &[ColumnarValue]| { | ||
let len = args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as a follow on PR we can make a function that does this length inference and conversion to array (mostly so it can be documented) to make the ideas easier to find 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, have the same thought too. It will be still useful for ScalarUDFImpl
as invoke
takes args: &[ColumnarValue]
and users might need length inference too.
Thanks @alamb. This currently marks |
I am going to merge this PR on the assumption that the Thanks again @viirya |
I took the liberty of merging up from |
Thank you @alamb @jackwener |
The documentation here https://arrow.apache.org/datafusion/library-user-guide/adding-udfs.html probably needs updating too, as it still suggest to use |
Thanks. I will update the document together in a follow up with other stuffs (e.g., #8878 (comment)). |
`make_scalar_function` has been deprecated since v36 [0]. It is being removed from the public api in v43 [1]. [0]: apache/datafusion#8878 [1]: apache/datafusion#12505
* fix: remove use of deprecated `make_scalar_function` `make_scalar_function` has been deprecated since v36 [0]. It is being removed from the public api in v43 [1]. [0]: apache/datafusion#8878 [1]: apache/datafusion#12505 * remove use of `.unwrap()` from pyarrow_function_to_rust
Which issue does this PR close?
Closes #8866.
Rationale for this change
make_scalar_function
is easily to be misused and causing unexpected behavior (e.g., #8866) if users don't understand its underlying logic. Actually this function looks more like a crate private utility function we should not expose.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?