Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce FunctionRegistry dependency to optimize and rewrite rule #10714

Merged
merged 6 commits into from
Jun 1, 2024

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Closes #10703.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels May 29, 2024
@@ -56,7 +56,7 @@ pub mod single_distinct_to_groupby;
pub mod unwrap_cast_in_comparison;
pub mod utils;

#[cfg(test)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only integration test but also datafusion-example

@@ -163,53 +173,3 @@ impl OptimizerRule for ReplaceDistinctWithAggregate {
Some(BottomUp)
}
}

#[cfg(test)]
mod tests {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to slt

@jayzhan211 jayzhan211 marked this pull request as ready for review May 29, 2024 15:39
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 -- this looks very cool. THank you for breaking the dependency

Rather than adding a new argument to all optimizer sites, which will cause API churn, what do you think about adding a new (backwards compatible) API to OptimizerConfig?

Perhaps something like

trait OptimizerConfig {
...
  /// return the function registry for looking up functions
  /// returns None by default.
  /// Used for optimizations like `distinct ON` --> first_value
  fn function_registry(&self) -> Option<&dyn FunctionRegistry> {
    None
  } 
}

The only downside I can see to providing the entire registry is that it then adds an implicit dependency on whatever is registered as "first_value" that should probably be documented

A way to make that more explicit might be something like

trait OptimizerConfig {
...
  /// return the function for first_value, if known. This function will be
  /// used to rewrite DISTINCT ON if present
  fn first_value(&self) -> Option<AggregateUDF> {
    None
  } 
}

🤔

let aggr_expr = select_expr
.into_iter()
.map(|e| first_value(vec![e], false, None, sort_expr.clone(), None));
let aggr_expr = select_expr.into_iter().map(|e| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is the reason for this PR I think -- to avoid the hard coded dependency on first_value...

@@ -143,3 +143,39 @@ LIMIT 3;
-25 15295
45 15673
-72 -11122

# test distinct on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jayzhan211
Copy link
Contributor Author

I would prefer the former to avoid adding too many function in optimize config trait

@jayzhan211 jayzhan211 force-pushed the mv-funcregistry-to-common branch from 1414b15 to 62d59a2 Compare May 31, 2024 01:35
@jayzhan211 jayzhan211 marked this pull request as draft May 31, 2024 02:30
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@alamb
Copy link
Contributor

alamb commented May 31, 2024

Looks very nice 🆗

I think we can also remove the dependency from cargo now as well:

datafusion-functions-aggregate = { workspace = true }

@jayzhan211
Copy link
Contributor Author

Looks very nice 🆗

I think we can also remove the dependency from cargo now as well:

datafusion-functions-aggregate = { workspace = true }

I totally forgot my main purpose of this PR :(

Signed-off-by: jayzhan211 <[email protected]>
@@ -45,7 +45,6 @@ async-trait = { workspace = true }
chrono = { workspace = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-expr = { workspace = true }
datafusion-functions-aggregate = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as ready for review May 31, 2024 14:45
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211

@jayzhan211
Copy link
Contributor Author

Thanks @alamb

@jayzhan211 jayzhan211 merged commit 7638a26 into apache:main Jun 1, 2024
25 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…pache#10714)

* mv function registry to expr

Signed-off-by: jayzhan211 <[email protected]>

* registry move to config trait

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* rm dependency

Signed-off-by: jayzhan211 <[email protected]>

* fix cli cargo lock

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Decouple builtin aggregate function first_value from optimize rule replace_distinct_aggregate
2 participants