opt: fix missing filters after join reordering #76334
@DrewKimball We've run into a bug where the join reorderer incorrectly eliminates join filters. Here's a test case that shows the problem: https://gist.github.com/mgartner/ff9857369f61d4d7bb7458d536c57664

Do the changes proposed here make sense? The logic behind them is: if we are reordering inner joins, the vertex sets of the original joins shouldn't affect whether or not the filters in two edges are associative. This seems to solve the bug, without breaking any test cases. I'm curious if you remember any specific example that required these lines of code.

cc @andy-kimball since he worked closely with some of this logic too.
I can take a closer look at this in a bit, but for now I don't think those checks should be removed. The cockroach/pkg/sql/opt/xform/join_order_builder.go Lines 496 to 499 in c7446c5
Thanks for taking a look. We really appreciate your help here!
Doesn't this check occur here:
That's what I thought originally, but it doesn't appear to be the issue. We think the cause might be incorrect conflict rules that have been added to edges. Rebecca modified the Consider This PR is an attempt to eliminate these allegedly bad conflict rules. |
I guess it does, since the referenced columns are encoded in the TES. We shouldn't be adding conflict rules based on which columns are referenced, since that information is already in the SES and TES. So I think this change is correct. Nice job tracking that down! |
This is ready for another look. I added a regression test for the bug, and a unit test that previously failed because an unnecessary conflict rule was created. I also added a commit that adds the SES, TES, and conflict rules of each edge to the |
Nice!
Reviewed 4 of 4 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
pkg/sql/opt/xform/testdata/rules/join_order, line 887 at r3 (raw file):
B: scan cy C: select ├── scan dz
nit: looks like the formatting is a bit weird here
pkg/sql/opt/xform/testdata/rules/join_order, line 2086 at r3 (raw file):
Edges a2.b = a3.b [inner, ses=CD, tes=CD, rules=()] a3.a = a4.a [inner, ses=DE, tes=DE, rules=()]
It's a bit concerning that it seems we don't have any tests here with non-empty rules. Can you try to add one?
pkg/sql/opt/xform/testdata/rules/join_order, line 2140 at r4 (raw file):
a INT NOT NULL, b INT NOT NULL, PRIMARY KEY (a ASC, b ASC)
nit: remove tabs
This commit updates the output of the `reorderjoins` opt test command to display the initial state of the `JoinOrderBuilder`. It adds additional information to the output including the TES, SES, and conflict rules for each edge. Release note: None
This commit eliminates logic in the `assoc`, `leftAsscom`, and `rightAsscom` functions in the join order builder that aimed to prevent generating "orphaned" predicates, where one or more referenced relations are not in a join's input. In rare cases, this logic had the side effect of creating invalid conflict rules for edges, which could prevent valid predicates from being added to reordered join trees. It is safe to remove these conditionals because they are unnecessary. The CD-C algorithm already prevents generation of orphaned predicates by checking that the total eligibility set (TES) is a subset of a join's input vertices. In our implementation, this is handled by the `checkNonInnerJoin` and `checkInnerJoin` functions. Fixes cockroachdb#76522 Release note (bug fix): A bug has been fixed which caused the query optimizer to omit join filters in rare cases when reordering joins, which could result in incorrect query results. This bug was present since v20.2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/opt/xform/testdata/rules/join_order, line 887 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: looks like the formatting is a bit weird here
Done.
pkg/sql/opt/xform/testdata/rules/join_order, line 2086 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It's a bit concerning that it seems we don't have any tests here with non-empty rules. Can you try to add one?
I've added one test with a conflict rule on an edge. Conflict rules are somewhat uncommon because of some optimizations to reduce them into the TES, which is mentioned in section 5.5 of the paper:
cockroach/pkg/sql/opt/xform/join_order_builder.go
Lines 1191 to 1201 in 24d886c
	if rule.from.intersects(e.tes) {
		// If the 'from' relation set intersects the total eligibility set, simply
		// add the 'to' set to the TES because the rule will always be triggered.
		e.tes = e.tes.union(rule.to)
		return
	}
	if rule.to.isSubsetOf(e.tes) {
		// If the 'to' relation set is a subset of the total eligibility set, the
		// rule is a do-nothing.
		return
	}
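To make the reduction above concrete, here is a minimal, self-contained sketch of how a conflict rule can be absorbed into the TES or dropped. The `vertexSet` bitset, the `edge` and `conflictRule` structs, and the `addRule` name are assumptions made for illustration; only the two conditionals mirror the excerpt, and the real types in `join_order_builder.go` may differ.

```go
package main

import "fmt"

// vertexSet is a hypothetical bitset of base relations: bit i stands for relation i.
type vertexSet uint64

func (s vertexSet) union(o vertexSet) vertexSet { return s | o }
func (s vertexSet) intersects(o vertexSet) bool { return s&o != 0 }
func (s vertexSet) isSubsetOf(o vertexSet) bool { return s&^o == 0 }

// Illustrative relations.
const (
	A vertexSet = 1 << iota
	B
	C
	D
	E
)

// conflictRule reads: if a join's input contains any relation in 'from',
// it must contain every relation in 'to'.
type conflictRule struct{ from, to vertexSet }

// edge is a simplified stand-in for a join edge (assumed shape).
type edge struct {
	tes   vertexSet
	rules []conflictRule
}

// addRule applies the two simplifications from the excerpt before storing a rule.
func (e *edge) addRule(rule conflictRule) {
	if rule.from.intersects(e.tes) {
		// The TES must always be covered by the join inputs, so the rule
		// would always fire: fold its 'to' set into the TES instead.
		e.tes = e.tes.union(rule.to)
		return
	}
	if rule.to.isSubsetOf(e.tes) {
		// The TES already requires everything in 'to': the rule is a no-op.
		return
	}
	e.rules = append(e.rules, rule)
}

func main() {
	e := &edge{tes: A | B}
	e.addRule(conflictRule{from: B, to: C}) // absorbed into the TES
	e.addRule(conflictRule{from: D, to: A}) // dropped as a no-op
	e.addRule(conflictRule{from: D, to: E}) // kept as an explicit rule
	fmt.Printf("tes=%05b rules=%d\n", e.tes, len(e.rules))
}
```

Only the third rule survives as an explicit rule in this sketch, which is consistent with the observation that non-empty rule lists are uncommon in the test output.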
pkg/sql/opt/xform/testdata/rules/join_order, line 2140 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: remove tabs
Done.
Nice work! The extra diagnostic code should help with debugging in the future. I previously missed that `checkInnerJoin` and `checkNonInnerJoin` cover the APPLICABLE check from the paper (Figure 9: Pseudocode for applicable B/C). The inner join check looks slightly different though. I don't know if this matters. For example, in the paper, the check is `L-TES(◦) ⊆ S1 ∧ R-TES(◦) ⊆ S2` but we do something like `TES(◦) ⊆ S1 ∪ S2 ∧ TES(◦) ∩ S1 != ∅ ∧ TES(◦) ∩ S2 != ∅`. The 2nd rule could be true in some cases where the first rule is false. But the paper is dealing more with non-inner joins, so maybe that explains the difference.
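As a rough illustration of the two conditions quoted above, the sketch below evaluates both on plain bitsets. The function names `paperApplicable` and `innerApplicable` and the `vertexSet` type are hypothetical, and the inputs are arbitrary sets chosen by hand; whether such inputs can actually arise depends on how the optimizer computes the TES.

```go
package main

import "fmt"

// vertexSet is a hypothetical bitset of base relations.
type vertexSet uint64

func (s vertexSet) union(o vertexSet) vertexSet { return s | o }
func (s vertexSet) intersects(o vertexSet) bool { return s&o != 0 }
func (s vertexSet) isSubsetOf(o vertexSet) bool { return s&^o == 0 }

// Illustrative relations.
const (
	A vertexSet = 1 << iota
	B
	C
)

// paperApplicable is the condition as quoted from the paper:
// L-TES ⊆ S1 ∧ R-TES ⊆ S2.
func paperApplicable(lTES, rTES, s1, s2 vertexSet) bool {
	return lTES.isSubsetOf(s1) && rTES.isSubsetOf(s2)
}

// innerApplicable is the condition as described in the comment above:
// TES ⊆ S1 ∪ S2 ∧ TES ∩ S1 ≠ ∅ ∧ TES ∩ S2 ≠ ∅.
func innerApplicable(tes, s1, s2 vertexSet) bool {
	return tes.isSubsetOf(s1.union(s2)) && tes.intersects(s1) && tes.intersects(s2)
}

func main() {
	lTES, rTES := A|B, C
	tes := lTES.union(rTES)

	// Inputs that respect the original left/right split: both checks pass.
	fmt.Println(paperApplicable(lTES, rTES, A|B, C), innerApplicable(tes, A|B, C))

	// Inputs that split the TES differently: only the second check passes.
	fmt.Println(paperApplicable(lTES, rTES, A, B|C), innerApplicable(tes, A, B|C))

	// All TES relations on one side: the second check rejects the join.
	fmt.Println(innerApplicable(A|B, A|B, C))
}
```

At the level of raw sets, the second case is one where the inner-join form holds and the paper's form does not, and the third case shows the intersection terms rejecting a join whose filter would reference only one input.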
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
The new formatting for `reorderjoins` looks great!
The inner join check looks slightly different though.
For context, there's some explanation of that here and here. The gist of it is that we have to change things a little to handle the common case when different filters from an inner join predicate can be reordered differently.
A way to build some intuition for why this works is to imagine the following process: all InnerJoin filters are pulled into a single Select operator above the join tree, then the inner joins are freely reordered. Finally, filter pushdown is used to redistribute the filters independently over the join tree. The code doesn't actually do this, but it achieves the same effect as if it had.
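The thought experiment in the previous paragraph can be sketched in a few lines of Go. Everything here is hypothetical (the `node` and `filter` types, `pushDown`, `plan`); as the comment says, the real code does not work this way. The sketch only shows that once filters are detached from the joins, each one lands at the lowest join whose inputs cover the relations it references, whatever the join order.

```go
package main

import "fmt"

// vertexSet is a hypothetical bitset of base relations.
type vertexSet uint64

func (s vertexSet) isSubsetOf(o vertexSet) bool { return s&^o == 0 }

// Illustrative relations.
const (
	A vertexSet = 1 << iota
	B
	C
)

// filter is a predicate plus the set of relations it references.
type filter struct {
	expr string
	refs vertexSet
}

// node is either a base relation (left == nil) or an inner join.
type node struct {
	name        string
	rels        vertexSet
	left, right *node
	filters     []string
}

func leaf(name string, rel vertexSet) *node { return &node{name: name, rels: rel} }
func join(l, r *node) *node                 { return &node{rels: l.rels | r.rels, left: l, right: r} }

// pushDown places f at the lowest node whose relations cover f.refs.
func pushDown(n *node, f filter) {
	if n.left != nil && f.refs.isSubsetOf(n.left.rels) {
		pushDown(n.left, f)
		return
	}
	if n.right != nil && f.refs.isSubsetOf(n.right.rels) {
		pushDown(n.right, f)
		return
	}
	n.filters = append(n.filters, f.expr)
}

// plan redistributes all filters over an already chosen join order.
func plan(root *node, filters []filter) *node {
	for _, f := range filters {
		pushDown(root, f)
	}
	return root
}

// describe renders joins and their filters; leaf-level filters are omitted for brevity.
func describe(n *node) string {
	if n.left == nil {
		return n.name
	}
	return fmt.Sprintf("(%s JOIN %s ON %v)", describe(n.left), describe(n.right), n.filters)
}

func main() {
	fs := []filter{{"a.x = b.x", A | B}, {"b.y = c.y", B | C}}
	fmt.Println(describe(plan(join(join(leaf("a", A), leaf("b", B)), leaf("c", C)), fs)))
	fmt.Println(describe(plan(join(leaf("a", A), join(leaf("b", B), leaf("c", C))), fs)))
	// Joining a and c first leaves that join without a filter (a cross join).
	fmt.Println(describe(plan(join(join(leaf("a", A), leaf("c", C)), leaf("b", B)), fs)))
}
```

The third ordering is the kind of plan the extra intersection checks in the inner join case are meant to keep out of the search space.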
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
I'm curious about another divergence from the paper in I think the paper's conditional is less strict - it is true when any tables on the left and right side of the original edge, |
Thanks for the explanation. It makes sense. I guess the additional intersection checks in `checkInnerJoin`, `e.tes.intersects(s1) && e.tes.intersects(s2)`, keep all the filters from getting pushed down one branch of the join plan tree, which might result in excessive cartesian product joins and likely would not be useful to add to the search space.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
nice work! (I haven't had a chance to delve deeply into the paper, so I'll let @DrewKimball and/or @msirek comment on your last point)
Reviewed 6 of 6 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
My last question is a curiosity I stumbled upon, and it doesn't necessarily relate to this change. I don't think this PR needs to hold up on its answer, so I'll go ahead and merge this. Thanks for the reviews!
bors r+
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from aac0834 to blathers/backport-release-21.1-76334: POST https://api.github.com/repos/cockroachlabs/cockroach/merges: 403 Resource not accessible by integration []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 21.1.x failed. See errors above.
error creating merge commit from aac0834 to blathers/backport-release-21.2-76334: POST https://api.github.com/repos/cockroachlabs/cockroach/merges: 403 Resource not accessible by integration []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 21.2.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Sorry for the delay - I missed this the first time around. The check we use is exactly equivalent to the one in the paper for reasons I'll explain below*. I believe I made that change to make it more explicit that we are trying to prevent generating cross joins, but should have either stuck to the paper or added a comment to explain the change. @mgartner since you're the one who had to read the code with fresh eyes - do you think it would be better to use the check as in the paper or add a comment?

* While it's true that the TES is a subset of the left and right tables, it will always include at least one table from each input because it includes tables referenced by the join filter. If the filter doesn't reference tables from one of the inputs, all tables from that input are added to the TES (link). Because of this handling of degenerate predicates and the requirement that referenced tables are part of S1 and S2,

TLDR: The paper's correctness relies on join filters referencing at least one table from each input. We ensure that this is the case by pretending that degenerate predicates reference all tables of the unreferenced side(s) while calculating the SES and TES.
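The handling of degenerate predicates described in the TLDR can be sketched as follows. The function name `eligibilitySet` and the `vertexSet` type are assumptions for illustration and are not taken from the optimizer; the sketch only encodes the stated rule that an input the filter does not reference contributes all of its tables.

```go
package main

import "fmt"

// vertexSet is a hypothetical bitset of base relations.
type vertexSet uint64

func (s vertexSet) union(o vertexSet) vertexSet { return s | o }
func (s vertexSet) intersects(o vertexSet) bool { return s&o != 0 }

// Illustrative relations.
const (
	A vertexSet = 1 << iota
	B
	C
)

// eligibilitySet starts from the relations referenced by the join filter.
// If the filter references nothing from an input, it pretends the filter
// references every relation of that input.
func eligibilitySet(filterRefs, left, right vertexSet) vertexSet {
	set := filterRefs & left.union(right)
	if !set.intersects(left) {
		set = set.union(left)
	}
	if !set.intersects(right) {
		set = set.union(right)
	}
	return set
}

func main() {
	left, right := A|B, C

	// Ordinary predicate such as a.x = c.x: only A and C are required.
	fmt.Printf("%03b\n", eligibilitySet(A|C, left, right))

	// Degenerate predicate such as a.x = 1: the whole right side is added.
	fmt.Printf("%03b\n", eligibilitySet(A, left, right))

	// ON true: both sides are added in full.
	fmt.Printf("%03b\n", eligibilitySet(0, left, right))
}
```

Under this construction the resulting set always intersects both inputs of the original join, which is the property the explanation above relies on.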
Thanks for following up! Your explanation makes sense (though it took a while to wrap my head around it all again). I think a comment that explicitly calls out the divergence from the paper and an explanation of why it's valid would be helpful. No need to use the exact check used in the paper. |
opt: add TES, SES, and rules to reorderjoins
This commit updates the output of the `reorderjoins` opt test command to
display the initial state of the `JoinOrderBuilder`. It adds additional
information to the output including the TES, SES, and conflict rules for
each edge.
Release note: None
opt: fix missing filters after join reordering
This commit eliminates logic in the `assoc`, `leftAsscom`, and
`rightAsscom` functions in the join order builder that aimed to prevent
generating "orphaned" predicates, where one or more referenced relations
are not in a join's input. In rare cases, this logic had the side effect
of creating invalid conflict rules for edges, which could prevent valid
predicates from being added to reordered join trees.
It is safe to remove these conditionals because they are unnecessary.
The CD-C algorithm already prevents generation of orphaned predicates by
checking that the total eligibility set (TES) is a subset of a join's
input vertices. In our implementation, this is handled by the
`checkNonInnerJoin` and `checkInnerJoin` functions.
Fixes #76522
Release note (bug fix): A bug has been fixed which caused the query optimizer
to omit join filters in rare cases when reordering joins, which could
result in incorrect query results. This bug was present since v20.2.