Optimize PushDownFilter to avoid recreating schema columns #11211
@@ -285,16 +321,7 @@ fn extract_or_clauses_for_join<'a>(
     filters: &'a [Expr],
     schema: &'a DFSchema,
 ) -> impl Iterator<Item = Expr> + 'a {
-    let schema_columns = schema
This code was replicated, so I moved it into its own function
     for predicate in predicates {
-        if left_preserved && can_pushdown_join_predicate(&predicate, left_schema)? {
+        if left_preserved && checker.is_left_only(&predicate) {
The point of this PR is that every call to can_pushdown_join_predicate re-created the same HashSet<Column> for the left or right schema.
Now the creation is done once.
Looks good to me! I checked that this refactoring behaves consistently with the previous code.
Thank you for the review @jonahgao
Which issue does this PR close?
Follow on to #11203
Rationale for this change
While reviewing #11203 I noticed that a HashSet<Column> (which copies a string for each column in the DFSchema) was re-created for every predicate, for each join input.
What changes are included in this PR?
Create the HashSet once per join input rather than once per predicate per join input
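To illustrate the idea, here is a minimal, self-contained sketch of building the column set once per join input and reusing it for every predicate. It is not the code from this PR: `Schema` and `Predicate` are hypothetical stand-ins for `DFSchema` and `Expr`, column names are plain strings, and the `sets_built` counter exists only to make the caching visible. Only the `checker.is_left_only(&predicate)` call shape is taken from the diff; `is_right_only` is assumed by symmetry.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a schema: just a list of column names.
pub struct Schema {
    pub columns: Vec<String>,
}

/// Hypothetical stand-in for a predicate: the column names it references.
pub struct Predicate {
    pub referenced: Vec<String>,
}

/// Builds the column set for each join input at most once and
/// reuses it for every predicate that is checked afterwards.
pub struct ColumnChecker<'a> {
    left_schema: &'a Schema,
    left_columns: Option<HashSet<String>>,
    right_schema: &'a Schema,
    right_columns: Option<HashSet<String>>,
    /// Counts how many sets were built (illustration only).
    pub sets_built: usize,
}

/// Copies one string per column, which is the cost worth paying only once.
fn schema_columns(schema: &Schema) -> HashSet<String> {
    schema.columns.iter().cloned().collect()
}

impl<'a> ColumnChecker<'a> {
    pub fn new(left_schema: &'a Schema, right_schema: &'a Schema) -> Self {
        Self {
            left_schema,
            left_columns: None,
            right_schema,
            right_columns: None,
            sets_built: 0,
        }
    }

    /// True if every column the predicate references is in the left schema.
    pub fn is_left_only(&mut self, predicate: &Predicate) -> bool {
        if self.left_columns.is_none() {
            // First use: build the set lazily and remember it.
            self.left_columns = Some(schema_columns(self.left_schema));
            self.sets_built += 1;
        }
        let columns = self.left_columns.as_ref().unwrap();
        predicate
            .referenced
            .iter()
            .all(|c| columns.contains(c.as_str()))
    }

    /// True if every column the predicate references is in the right schema.
    pub fn is_right_only(&mut self, predicate: &Predicate) -> bool {
        if self.right_columns.is_none() {
            self.right_columns = Some(schema_columns(self.right_schema));
            self.sets_built += 1;
        }
        let columns = self.right_columns.as_ref().unwrap();
        predicate
            .referenced
            .iter()
            .all(|c| columns.contains(c.as_str()))
    }
}
```

With this shape, checking N predicates against a join builds at most two sets in total instead of up to 2N, and a side that is never checked never pays for its set at all.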
Are these changes tested?
Functionally: Covered by existing CI
Performance: minor reported improvements (1-2% maybe)
Are there any user-facing changes?
Hopefully faster planning