refactor push_down_filter to fix dead-loop and use optimizer_recurse. #5337

Merged: jackwener merged 1 commit into apache:main from push_down_filter on Feb 25, 2023

Conversation

jackwener
Member

Which issue does this PR close?

Closes #.

Rationale for this change

While working on #4465, I found that push_down_filter can fall into a dead loop.

This PR fixes that problem and refactors the rule.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Feb 19, 2023
@HaoYang670
Contributor

> While working on #4465, I found that push_down_filter can fall into a dead loop.
> This PR fixes that problem and refactors the rule.

It would be better to file an issue explaining when the dead loop happens, but this is optional.

  fn try_optimize(
      &self,
      plan: &LogicalPlan,
-     config: &dyn OptimizerConfig,
+     _config: &dyn OptimizerConfig,
Contributor

Will it lose information if we disable config here?

Member Author

> Will it lose information if we disable config here?

I didn't disable it. It needs to be prefixed with _ just because it's only used recursively.

This was suggested by cargo clippy.

Contributor

> just because it's only used recursively

In the original code, we pass the config to the optimize_children, but now, it is just a placeholder for self recursion, which means we never use the config anymore.
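
For context on the lint being discussed: Clippy has a lint named only_used_in_recursion that fires when a parameter is never read and is only forwarded to the function's own recursive call. The sketch below reproduces that shape with toy types. Config, Plan, and count_leaves are hypothetical names made up for illustration and are not DataFusion's API.

```rust
// Toy stand-ins for illustration only; these are NOT DataFusion types.
struct Config;

enum Plan {
    Leaf,
    Node(Vec<Plan>),
}

// `_config` is never read here; it is only handed to the recursive call.
// Without the leading underscore, Clippy's `only_used_in_recursion` lint
// would flag the parameter. The underscore marks it as intentionally unused
// while keeping the signature unchanged.
fn count_leaves(plan: &Plan, _config: &Config) -> usize {
    match plan {
        Plan::Leaf => 1,
        Plan::Node(children) => children
            .iter()
            .map(|child| count_leaves(child, _config))
            .sum(),
    }
}
```

The underscore does not change behavior. It only records that the value has no effect on the result, which is the point raised in the review comment above.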

@jackwener
Member Author

jackwener commented Feb 25, 2023

> > While working on #4465, I found that push_down_filter can fall into a dead loop.
> > This PR fixes that problem and refactors the rule.
>
> It would be better to file an issue explaining when the dead loop happens, but this is optional.

This is strange 🥹, and I don't understand why either.
When you run only cargo test -p datafusion --test tpcds_planning, it's OK.
But when you run cargo test, it goes into a dead loop.

@HaoYang670
Contributor

> When you run only cargo test -p datafusion --test tpcds_planning, it's OK.
> But when you run cargo test, it goes into a dead loop.

We'd better figure out what happens, so that the fix makes sense to everyone.

@jackwener
Member Author

jackwener commented Feb 25, 2023

> > When you run only cargo test -p datafusion --test tpcds_planning, it's OK.
> > But when you run cargo test, it goes into a dead loop.
>
> We'd better figure out what happens, so that the fix makes sense to everyone.

After investigation, it's caused by the original code in lines -522 to -532.
I think it's strange because it only happened under cargo test. When the test runs alone, it's OK. 😂
Because that code contains a risky recursion, this PR removes the recursion.
For this reason, even if this PR were only a refactoring, it would still be worthwhile.

Comment on lines -522 to -532
    LogicalPlan::Join(join) => {
        let optimized_plan = push_down_join(plan, join, None)?;
        return match optimized_plan {
            Some(optimized_plan) => Ok(Some(
                optimize_children(self, &optimized_plan, config)?
                    .unwrap_or(optimized_plan),
            )),
            None => optimize_children(self, plan, config),
        };
    }
    _ => return optimize_children(self, plan, config),
Member Author

@HaoYang670 This is where the risky recursion is removed.
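
One way to picture the general idea of taking recursion out of a rule is to let the rule rewrite only the node it is given and leave the tree traversal to a separate driver. The sketch below is a toy model under that assumption, not the code in this PR. Plan, rewrite_node, and optimize_top_down are hypothetical names, and the filter pushdown is deliberately simplistic.

```rust
// Toy plan tree for illustration only; NOT DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Filter(Box<Plan>),
    Join(Box<Plan>, Box<Plan>),
}

// The rule looks at one node and never recurses.
// It returns None when the node is left unchanged.
fn rewrite_node(plan: &Plan) -> Option<Plan> {
    match plan {
        Plan::Filter(input) => match input.as_ref() {
            // Toy pushdown: move a filter above a join onto the join's left input.
            Plan::Join(left, right) => Some(Plan::Join(
                Box::new(Plan::Filter(left.clone())),
                right.clone(),
            )),
            _ => None,
        },
        _ => None,
    }
}

// The driver owns the traversal: apply the rule once to the current node,
// then descend into the children of the result. Each recursive call works on
// a strict subtree, so the recursion depth is bounded by the tree height.
fn optimize_top_down(plan: &Plan) -> Plan {
    let current = rewrite_node(plan).unwrap_or_else(|| plan.clone());
    match current {
        Plan::Scan => Plan::Scan,
        Plan::Filter(input) => Plan::Filter(Box::new(optimize_top_down(&input))),
        Plan::Join(left, right) => Plan::Join(
            Box::new(optimize_top_down(&left)),
            Box::new(optimize_top_down(&right)),
        ),
    }
}
```

With this split, a filter sitting on top of a join of two scans ends up on the join's left input after one top-down pass, and the rule itself contains no call back into the traversal.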

Contributor

@alamb alamb left a comment

Looks like a nice improvement to me -- thank you @jackwener and @HaoYang670

I agree it is strange that you were seeing a dead loop (is that the same as "infinite recursion"?) sometimes and not others, but I still think this code represents an improvement.

@jackwener jackwener merged commit 0b77ec2 into apache:main Feb 25, 2023
@jackwener jackwener deleted the push_down_filter branch February 25, 2023 20:54
@ursabot

ursabot commented Feb 25, 2023

Benchmark runs are scheduled for baseline = 8b92b9b and contender = 0b77ec2. 0b77ec2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
optimizer Optimizer rules

4 participants