-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: run expression simplifier in a loop until a fixedpoint or 3 cycles #10358
Conversation
7b3e02d
to
915a45c
Compare
@@ -107,6 +110,7 @@ impl<S: SimplifyInfo> ExprSimplifier<S> { | |||
info, | |||
guarantees: vec![], | |||
canonicalize: true, | |||
max_simplifier_iterations: DEFAULT_MAX_SIMPLIFIER_ITERATIONS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now there's no reason to have a struct field, but the idea is that we might want to make this configurable later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could have an API to update it to mirror
pub fn with_canonicalize(mut self, canonicalize: bool) -> Self {
self.canonicalize = canonicalize;
self
}
Something like
pub fn with_max_simplifier_iterations(mut self, max_simplifier_iterations: usize) -> Self {
self.max_simplifier_iterations = max_simplifier_iterations;
self
}
Perhaps
|
||
i += 1; | ||
if !transformed || i >= self.max_simplifier_iterations { | ||
return Ok(expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning (Expr, usize)
would be nice for testing. Especially if the number of iterations becomes a configurable parameter since that opens up the door for new simplification bugs; we might want to test that specifically X number of iterations runs on a given expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add a function like simplify_inner
that did and then use that in tests
pub fn simplify(&self, mut expr: Expr) -> Result<Expr> {
self.simplify_inner(expr).0
}
/// Returns the simplified expr and the number of iterations required.
fn simplify_inner(&self, mut expr: Expr) -> Result<(usize, Expr)> {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @erratic-pattern -- this looks really neat.
Closes #1160.
One of the lower numbered tickets we have closed recently
This PR looks awesome to me. Once we get the CI sorted out I think it would be good to go. I left some suggestions on how to improve the tests and I do think we should add some coverage if possible
@@ -107,6 +110,7 @@ impl<S: SimplifyInfo> ExprSimplifier<S> { | |||
info, | |||
guarantees: vec![], | |||
canonicalize: true, | |||
max_simplifier_iterations: DEFAULT_MAX_SIMPLIFIER_ITERATIONS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could have an API to update it to mirror
pub fn with_canonicalize(mut self, canonicalize: bool) -> Self {
self.canonicalize = canonicalize;
self
}
Something like
pub fn with_max_simplifier_iterations(mut self, max_simplifier_iterations: usize) -> Self {
self.max_simplifier_iterations = max_simplifier_iterations;
self
}
Perhaps
|
||
i += 1; | ||
if !transformed || i >= self.max_simplifier_iterations { | ||
return Ok(expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add a function like simplify_inner
that did and then use that in tests
pub fn simplify(&self, mut expr: Expr) -> Result<Expr> {
self.simplify_inner(expr).0
}
/// Returns the simplified expr and the number of iterations required.
fn simplify_inner(&self, mut expr: Expr) -> Result<(usize, Expr)> {
248a426
to
2cab7d0
Compare
assert_eq!(num_iter, 3); | ||
|
||
// NOTE: this currently does not simplify | ||
// (((c4 - 10) + 10) *100) / 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that one of @devinjdangelo 's examples did not simplify. I think we would need to rebalance parens in the simplifier or maybe the canonicalizer for that to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the simplifier needs to be made more sophisticated to handle some of these cases It might be cool to add a ticket explaining the case
Maybe it is something like (<expr> + CONST) + CONST)
--> <expr> + (CONST + CONST)
(basically apply associative rules
let expected = lit(true); | ||
let (expr, num_iter) = simplify_count(expr); | ||
assert_eq!(expr, expected); | ||
assert_eq!(num_iter, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb I've added the other examples from the comments you linked. So far this one is the only one that runs up to 3 iterations.
@@ -323,6 +326,14 @@ impl<S: SimplifyInfo> ExprSimplifier<S> { | |||
self.canonicalize = canonicalize; | |||
self | |||
} | |||
|
|||
pub fn with_max_simplifier_iterations( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs documentation
let Transformed { data, transformed, .. } = expr | ||
.rewrite(&mut const_evaluator)? | ||
.transform_data(|expr| expr.rewrite(&mut simplifier))? | ||
.transform_data(|expr| expr.rewrite(&mut guarantee_rewriter))? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guarantee_rewriter
and shorten_in_list_simplifier
only ran once in the previous version of this code. Should we continue only running them once here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would probably be a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would probably be a good idea
I can put shorten_in_list_simplifier
at the end of the loop, since that's where it was previously. I am unsure where guarantee_rewriter
is supposed to go as it previously ran once inbetween the simplifier/constant evaluator passes.
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR may actually speed up planning too (as it will avoid re-simplifying expressions that don't need to be simplified.
I am running the benchmarks now and will post results
c6efe00
to
559ad14
Compare
My benchmark runs show this ma help q19 a bit. I'll rerun to be sure
|
I reran benchmarks and this PR certatainly doesn't slow things down and maybe makes it marginally faster
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @erratic-pattern -- this is looking good
Seems like we need:
- Some docs (per https://github.com/apache/datafusion/pull/10358/files#r1588554192)
- Remove the
datafusion-function
dependency
Otherwise this seems like it is almost ready to go
Thank you very much @erratic-pattern and @jayzhan211 -- something @rdettai and I discussed a long time ago is finally happening 🎉
let expected = lit_bool_null(); | ||
let (expr, num_iter) = simplify_count(expr); | ||
assert_eq!(expr, expected); | ||
assert_eq!(num_iter, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
Marking as draft so it doesn't show up on the "ready to review list" as I think @erratic-pattern has some additional plans |
559ad14
to
3581bc0
Compare
I future proofed the naming a bit by renaming "iterations" to "cycles", because I can improve the algorithm a bit further to short-circuit mid-cycle and so we might later want to capture the actual "iteration" or "step" count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @erratic-pattern -- I think this PR looks great!
I left some small suggestions for improvements but I think we could do them as follow on PRs (or never)
Prior to merging this PR I think we should file follow on tickets to track the additional potential improvements. I can do this next week if someone doesn't beat me to it.
- Improve the const evaluator / simplifier to report more accurately when it changed expressions
- Improve the "did anything change" detection logic to track how many simplifications are applied rather than iterations
- Simplification of expressions like
((c1 + 5) + 10)
let expr = cast(now(), DataType::Int64) | ||
.lt(cast(to_timestamp(vec![lit(0)]), DataType::Int64) + lit(i64::MAX)); | ||
let expected = lit(true); | ||
test_simplify_with_cycle_count(expr, expected, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@@ -323,6 +337,63 @@ impl<S: SimplifyInfo> ExprSimplifier<S> { | |||
self.canonicalize = canonicalize; | |||
self | |||
} | |||
|
|||
/// Specifies the maximum number of simplification cycles to run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍
assert_eq!(num_iter, 3); | ||
|
||
// NOTE: this currently does not simplify | ||
// (((c4 - 10) + 10) *100) / 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the simplifier needs to be made more sophisticated to handle some of these cases It might be cool to add a ticket explaining the case
Maybe it is something like (<expr> + CONST) + CONST)
--> <expr> + (CONST + CONST)
(basically apply associative rules
I've made a new algorithm for this that should in theory reduce the amount of work needed to be done by short-circuiting earlier once there is a consecutive sequence of unchanged expression trees equal to the number of rewriters. However, it ended up being harder than I thought to get actual performance improvements when comparing with local benchmarks. I tried several different approaches, but most were actually slower than the code that I have here in this PR. I did eventually come up with something that could possibly compete with the simpler algorithm in this PR. You can compare the branch here My guess is that the additional overhead of checking each iteration is actual significant when we are only running 3 rewriters. I think we would see bigger improvements in cases where the number of optimization rules is larger. Many of the approaches I tried required dynamic dispatch on the I would be interested in scaling this up to run across all of the |
Co-authored-by: Andrew Lamb <[email protected]>
6525688
to
4479a1d
Compare
Benchmark results #10386 (comment) |
After reading the code in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a commit to this branch and merged up from main. Once CI passes I intend to merge it in
🚀 |
Which issue does this PR close?
Closes #1160.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
yes
Are there any user-facing changes?
no