Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle one-element array return value in ScalarFunctionExpr #12965

Merged
merged 1 commit into from
Oct 17, 2024

Conversation

joroKr21
Copy link
Contributor

This was done in #12922 only for math functions.
We now generalize this fallback to all scalar UDFs.

Which issue does this PR close?

Closes #12959

Rationale for this change

See #12922

What changes are included in this PR?

Are these changes tested?

Relying on existing tests.

Are there any user-facing changes?

ColumnarValue::from_args_and_result is removed again, it's not released yet.

// If the function is not volatile and all arguments are scalars,
// we can assume that returning a one-element array is equivalent to returning a scalar.
let preserve_scalar = array.len() == 1
&& self.fun.signature().volatility != Volatility::Volatile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we not do this for functions that are Volatile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A volatile function is supposed to return a different result on every invocation so I don't think it's safe to do that. The example we were discussing in the earlier PR is a hypothetical rand(upper_bound: Int64) -> Int64 function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the discussion and I see where you coming from. I guess a bit confusing since that function is currently not implementable like you said in that thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a bit confusing since that function is currently not implementable like you said in that thread.

I think so. It's currently not implementable.

I think under the current UDF invoke model, we should preserve scalar even if it is volatile. For example, we have a UDF fn random_string(len: int) -> String, which returns a random string of length len. This function is volatile, but it should return a scalar if the len parameter is a scalar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make that change - was just leaning more on the safe side.

&& self.fun.signature().volatility != Volatility::Volatile
&& inputs
.iter()
.all(|arg| matches!(arg, ColumnarValue::Scalar(_)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider the case where the inputs are empty. For UDFs without args, either return a scalar directly or return an array with num_rows. Trying to convert the output array back to scalar for them seems unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch 👍

This was done in apache#12922 only for math functions.
We now generalize this fallback to all scalar UDFs.
Copy link
Member

@jonahgao jonahgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM👍, thanks @joroKr21

@alamb alamb merged commit 0ed369e into apache:main Oct 17, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 17, 2024

Thanks @joroKr21 @eejbyfeldt and @jonahgao

@joroKr21 joroKr21 deleted the scalar-udfs branch October 17, 2024 17:29
joroKr21 added a commit to coralogix/arrow-datafusion that referenced this pull request Oct 17, 2024
…2965)

This was done in apache#12922 only for math functions.
We now generalize this fallback to all scalar UDFs.
joroKr21 added a commit to coralogix/arrow-datafusion that referenced this pull request Oct 19, 2024
…2965) (#276)

This was done in apache#12922 only for math functions.
We now generalize this fallback to all scalar UDFs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure that scalar functions fulfil the ColumnarValue contract
4 participants