-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce FunctionRegistry dependency to optimize and rewrite rule #10714
Conversation
Signed-off-by: jayzhan211 <[email protected]>
datafusion/optimizer/src/lib.rs
Outdated
@@ -56,7 +56,7 @@ pub mod single_distinct_to_groupby; | |||
pub mod unwrap_cast_in_comparison; | |||
pub mod utils; | |||
|
|||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only integration test but also datafusion-example
@@ -163,53 +173,3 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { | |||
Some(BottomUp) | |||
} | |||
} | |||
|
|||
#[cfg(test)] | |||
mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to slt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211 -- this looks very cool. THank you for breaking the dependency
Rather than adding a new argument to all optimizer sites, which will cause API churn, what do you think about adding a new (backwards compatible) API to OptimizerConfig
?
Perhaps something like
trait OptimizerConfig {
...
/// return the function registry for looking up functions
/// returns None by default.
/// Used for optimizations like `distinct ON` --> first_value
fn function_registry(&self) -> Option<&dyn FunctionRegistry> {
None
}
}
The only downside I can see to providing the entire registry is that it then adds an implicit dependency on whatever is registered as "first_value" that should probably be documented
A way to make that more explicit might be something like
trait OptimizerConfig {
...
/// return the function for first_value, if known. This function will be
/// used to rewrite DISTINCT ON if present
fn first_value(&self) -> Option<AggregateUDF> {
None
}
}
🤔
let aggr_expr = select_expr | ||
.into_iter() | ||
.map(|e| first_value(vec![e], false, None, sort_expr.clone(), None)); | ||
let aggr_expr = select_expr.into_iter().map(|e| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change is the reason for this PR I think -- to avoid the hard coded dependency on first_value
...
@@ -143,3 +143,39 @@ LIMIT 3; | |||
-25 15295 | |||
45 15673 | |||
-72 -11122 | |||
|
|||
# test distinct on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I would prefer the former to avoid adding too many function in optimize config trait |
Signed-off-by: jayzhan211 <[email protected]>
1414b15
to
62d59a2
Compare
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Looks very nice 🆗 I think we can also remove the dependency from cargo now as well:
|
I totally forgot my main purpose of this PR :( |
Signed-off-by: jayzhan211 <[email protected]>
@@ -45,7 +45,6 @@ async-trait = { workspace = true } | |||
chrono = { workspace = true } | |||
datafusion-common = { workspace = true, default-features = true } | |||
datafusion-expr = { workspace = true } | |||
datafusion-functions-aggregate = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211
Thanks @alamb |
…pache#10714) * mv function registry to expr Signed-off-by: jayzhan211 <[email protected]> * registry move to config trait Signed-off-by: jayzhan211 <[email protected]> * fix test Signed-off-by: jayzhan211 <[email protected]> * fix test Signed-off-by: jayzhan211 <[email protected]> * rm dependency Signed-off-by: jayzhan211 <[email protected]> * fix cli cargo lock Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]>
Which issue does this PR close?
Closes #10703.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?