-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UDAF: Add more fields to state fields #10391
Conversation
Signed-off-by: jayzhan211 <[email protected]>
@@ -152,16 +157,21 @@ pub trait AggregateExpr: Send + Sync + Debug + PartialEq<dyn Any> { | |||
pub struct AggregateFunctionExpr { | |||
fun: AggregateUDF, | |||
args: Vec<Arc<dyn PhysicalExpr>>, | |||
/// input type | |||
input_type: DataType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found most of the builtin function name is input_data_type
so I rename it.
/// | ||
/// [`format_state_name`]: crate::utils::format_state_name | ||
fn state_fields(&self, _args: StateFieldsArgs) -> Result<Vec<Field>> { | ||
not_impl_err!("state_fields hasn't been implemented for {self:?} yet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found most of the state_fields
have their own version, so return err if not defined now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be an API change -- I think if we want to change it we should also update the documentation to reflect the change
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211 -- I almost feel like this PR is a solution in search of a problem. Specifically, is there a practical problem that the current API can't handle that this could ?
If this doesn't add any value to users but costs them (in terms of API churn) I am not sure it makes sense to make this change
@@ -287,18 +286,23 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { | |||
/// aggregate function was called. | |||
fn accumulator(&self, acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>>; | |||
|
|||
/// Return the fields for the function | |||
fn field(&self, _args: FieldArgs) -> Result<Field> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same information as return_type
? If so, perhaps we should deprecate the return_type
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return_type
is part of the information in field
and state_field
not replaceable to field
/// | ||
/// [`format_state_name`]: crate::utils::format_state_name | ||
fn state_fields(&self, _args: StateFieldsArgs) -> Result<Vec<Field>> { | ||
not_impl_err!("state_fields hasn't been implemented for {self:?} yet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be an API change -- I think if we want to change it we should also update the documentation to reflect the change
Field::new( | ||
format_state_name(args.name, "mean1"), | ||
DataType::Float64, | ||
args.nullable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't some of these fields nullable even if their input is not nullable (e.g. mean with no inputs is null)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this should be true
pub nullable: bool, | ||
} | ||
|
||
impl<'a> FieldArgs<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given all the fields are pub, I wonder how much benefit this API adds over simply creating FieldArgs directly 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But like name
, it includes arguments, and not every time is the same, how do they get the correct field without args.name
?
@alamb This change is based on what the built-in function does, and what we need for UDAF to have the same behavior. If the change is not clear if it is necessary, we can also change it when moving the function to UDAF gradually. This PR exists because 1. breaking API changes at once, so we don't need to keep breaking API after that. 2. Split the change out can also reduce the review size of other functions moving (but less straightforward if the change is worth it) if we found out there is a better way to rewrite what the current builtin function does, the change here might be useless, so I think both way has pros and cons |
StateFieldsArgs FieldArgs Upd: |
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Ah, interesting, I didn't realize this. Sorry I think it would be useful to port UDFs over one by one, and as we find gaps in the APIs we can adjust them Or if we want to discover all the potential issues we could do a WIP PR (as you have done already) to try to move the aggregates and when we hit a block fix the underlying issues in a different PR |
Which issue does this PR close?
Closes #.
I check the
field
andstate_fields
that are used for most of the aggregate functions. I think it is nice to split them out as a PRRationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?