Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: fix missing filters after join reordering #76334

Merged
merged 2 commits into from
Feb 15, 2022

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Feb 9, 2022

opt: add TES, SES, and rules to reorderjoins

This commit updates the output of the reorderjoins opt test command to
display the initial state of the JoinOrderBuilder. It adds additional
information to the output including the TES, SES, and conflict rules for
each edge.

Release note: None

opt: fix missing filters after join reordering

This commit eliminates logic in the assoc, leftAsscom, and
rightAsscom functions in the join order builder that aimed to prevent
generating "orphaned" predicates, where one or more referenced relations
are not in a join's input. In rare cases, this logic had the side effect
of creating invalid conflict rules for edges, which could prevent valid
predicates from being added to reordered join trees.

It is safe to remove these conditionals because they are unnecessary.
The CD-C algorithm already prevents generation of orphaned predicates by
checking that the total eligibility set (TES) is a subset of a join's
input vertices. In our implementation, this is handled by the
checkNonInnerJoin and checkInnerJoin functions.

Fixes #76522

Release note (bug fix): A bug has been fixed which caused the query optimizer
to omit join filters in rare cases when reordering joins, which could
result in incorrect query results. This bug was present since v20.2.

@mgartner mgartner requested a review from a team as a code owner February 9, 2022 21:28
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner
Copy link
Collaborator Author

mgartner commented Feb 9, 2022

@DrewKimball We've run into a bug where the join reorderer incorrectly eliminates join filters. Here's a test case that shows the problem: https://gist.github.com/mgartner/ff9857369f61d4d7bb7458d536c57664

Do the changes proposed here make sense? The logic behind them is: if we are reordering inner joins, the vertex sets of the original joins shouldn't affect whether or not the filters in two edges are associative.

This seems to solve the bug, without breaking any test cases. I'm curious if you remember any specific example that required these lines of code.

cc @andy-kimball since he worked closely with some of this logic too.

@DrewKimball
Copy link
Collaborator

I can take a closer look at this in a bit, but for now I don't think those checks should be removed. The checkProperty function only checks assoc, leftAsscom etc. based on the join types, not the columns referenced by the filters. We still have to avoid pushing filters down below the columns they filter on. I'm suspecting the code that prevents adding redundant filters to the join:

if areFiltersRedundant(&fds, e.filters) {
// Avoid adding redundant filters.
continue
}

@mgartner
Copy link
Collaborator Author

mgartner commented Feb 9, 2022

Thanks for taking a look. We really appreciate your help here!

We still have to avoid pushing filters down below the columns they filter on.

Doesn't this check occur here:

if e.checkInnerJoin(s1, s2) {


I'm suspecting the code that prevents adding redundant filters to the join:

That's what I thought originally, but it doesn't appear to be the issue. We think the cause might be incorrect conflict rules that have been added to edges. Rebecca modified the reorderjoins test output to include the rules. The rules here seem incorrect: https://gist.github.com/mgartner/9401df94ca26124ac6b5d915da026632#file-join_reorder-txt-L273-L275

Considert5.b = t1.b [D -> C] [inner]. This filter can be evaluated with just A and E, so the D -> C rule doesn’t seem valid.

This PR is an attempt to eliminate these allegedly bad conflict rules.

@DrewKimball
Copy link
Collaborator

Doesn't this check occur here

I guess it does, since the referenced columns are encoded in the TES. We shouldn't be adding conflict rules based on which columns are referenced, since that information is already in the SES and TES. So I think this change is correct. Nice job tracking that down!

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

(once that test case is added)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@mgartner mgartner requested a review from a team February 14, 2022 21:13
@mgartner
Copy link
Collaborator Author

This is ready for another look. I added a regression test for the bug, and a unit test that previously failed because an unnecessary conflict rule was created. I also added a commit that adds the SES, TES, and conflict rules of each edge to the reorderjoins test output.

@mgartner mgartner changed the title WIP: opt: fix join reordering opt: fix missing filters after join reordering Feb 14, 2022
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Reviewed 4 of 4 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/opt/xform/testdata/rules/join_order, line 887 at r3 (raw file):

  B: scan cy
  C: select
   ├── scan dz

nit: looks like the formatting is a bit weird here


pkg/sql/opt/xform/testdata/rules/join_order, line 2086 at r3 (raw file):

Edges
  a2.b = a3.b [inner, ses=CD, tes=CD, rules=()]
  a3.a = a4.a [inner, ses=DE, tes=DE, rules=()]

It's a bit concerning that it seems we don't have any tests here with non-empty rules. Can you try to add one?


pkg/sql/opt/xform/testdata/rules/join_order, line 2140 at r4 (raw file):

	a INT NOT NULL,
	b INT NOT NULL,
	PRIMARY KEY (a ASC, b ASC)

nit: remove tabs

This commit updates the output of the `reorderjoins` opt test command to
display the initial state of the `JoinOrderBuilder`. It adds additional
information to the output including the TES, SES, and conflict rules for
each edge.

Release note: None
This commit eliminates logic in the `assoc`, `leftAsscom`, and
`rightAsscom` functions in the join order builder that aimed to prevent
generating "orphaned" predicates, where one or more referenced relations
are not in a join's input. In rare cases, this logic had the side effect
of creating invalid conflict rules for edges, which could prevent valid
predicates from being added to reordered join trees.

It is safe to remove these conditionals because they are unnecessary.
The CD-C algorithm already prevents generation of orphaned predicates by
checking that the total eligibility set (TES) is a subset of a join's
input vertices. In our implementation, this is handled by the
`checkNonInnerJoin` and `checkInnerJoin` functions.

Fixes cockroachdb#76522

Release note (bug fix): A bug has been fixed which caused the query optimizer
to omit join filters in rare cases when reordering joins, which could
result in incorrect query results. This bug was present since v20.2.
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)


pkg/sql/opt/xform/testdata/rules/join_order, line 887 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: looks like the formatting is a bit weird here

Done.


pkg/sql/opt/xform/testdata/rules/join_order, line 2086 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It's a bit concerning that it seems we don't have any tests here with non-empty rules. Can you try to add one?

I've added one test with a conflict rule on an edge. Conflict rules are somewhat uncommon because of some optimizations to reduce them into the TES, which is mentioned in section 5.5 of the paper:

if rule.from.intersects(e.tes) {
// If the 'from' relation set intersects the total eligibility set, simply
// add the 'to' set to the TES because the rule will always be triggered.
e.tes = e.tes.union(rule.to)
return
}
if rule.to.isSubsetOf(e.tes) {
// If the 'to' relation set is a subset of the total eligibility set, the
// rule is a do-nothing.
return
}


pkg/sql/opt/xform/testdata/rules/join_order, line 2140 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: remove tabs

Done.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! The extra diagnostic code should help with debugging in the future. I previously missed that checkInnerJoin and checkNonInnerJoin cover the APPLICABLEcheck from the paper (Figure 9: Pseudocode for applicable B/C). The inner join check looks slightly different though. I don't know if this matters. For example, in the paper, the check is L-TES(◦) ⊆ S1 ∧ R-TES(◦) ⊆ S2 but we do something like TES(◦) ⊆ S1 ∪ S2 ∧ TES(◦) ∩ S1 != ∅ ∧ TES(◦) ∩ S2 != ∅. The 2nd rule could be true in some cases where the first rule is false. But the paper is dealing more with non-inner joins, so maybe that explains the difference.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new formatting for reorderjoins looks great!

The inner join check looks slightly different though.

For context, there's some explanation of that here and here. The gist of it is that we have to change things a little to handle the common case when different filters from an inner join predicate can be reordered differently.

A way to build some intuition for why this works is to imagine the following process: all InnerJoin filters are pulled into a single Select operator above the join tree, then the inner joins are freely reordered. Finally, filter pushdown is used to redistribute the filters independently over the join tree. The code doesn't actually do this, but it achieves the same effect as if it had.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

@mgartner
Copy link
Collaborator Author

I'm curious about another divergence from the paper in checkNonInnerJoin. Here we check TES(◦) ∩ S1 != ∅ ∧ TES(◦) ∩ S2 != ∅ to avoid creating new cross joins, which I believe is based on section 6.2 in the paper. However, the paper states that 𝑇(left(◦)) ∩ S1 != ∅ ∧ 𝑇(right(◦)) ∩ S2 != ∅ should be used.

I think the paper's conditional is less strict - it is true when any tables on the left and right side of the original edge, 𝑇(left(◦)) and 𝑇(right(◦)), intersect with the vertex sets S1 and S2, respectively. Our conditional is true only when tables in the egde's TES intersect with each vertex set, which is more strict because the TES is a subset of 𝑇(left(◦)) ∪ 𝑇(right(◦)). Does this difference help prevent generating cross joins in more cases?

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. It makes sense. I guess the additional intersection checks in CheckInnerJoin:e.tes.intersects(s1) && e.tes.intersects(s2) keep all the filters from getting pushed down one branch of the join plan tree, which might result in excessive cartesian product joins and likely would not be useful to add to the search space.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: nice work! (I haven't had a chance to delve deeply into the paper, so I'll let @DrewKimball and/or @msirek comment on your last point)

Reviewed 6 of 6 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

@mgartner
Copy link
Collaborator Author

My last question is a curiosity I stumbled upon, and it doesn't necessarily relate to this change. I don't think this PR needs to hold up on its answer, so I'll go ahead and merge this.

Thanks for the reviews!

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 15, 2022

Build succeeded:

@craig craig bot merged commit 9132eac into cockroachdb:master Feb 15, 2022
@blathers-crl
Copy link

blathers-crl bot commented Feb 15, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from aac0834 to blathers/backport-release-21.1-76334: POST https://api.github.com/repos/cockroachlabs/cockroach/merges: 403 Resource not accessible by integration []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.1.x failed. See errors above.


error creating merge commit from aac0834 to blathers/backport-release-21.2-76334: POST https://api.github.com/repos/cockroachlabs/cockroach/merges: 403 Resource not accessible by integration []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@DrewKimball
Copy link
Collaborator

I'm curious about another divergence from the paper in checkNonInnerJoin. Here we check TES(◦) ∩ S1 != ∅ ∧ TES(◦) ∩ S2 != ∅ to avoid creating new cross joins, which I believe is based on section 6.2 in the paper. However, the paper states that 𝑇(left(◦)) ∩ S1 != ∅ ∧ 𝑇(right(◦)) ∩ S2 != ∅ should be used.

I think the paper's conditional is less strict - it is true when any tables on the left and right side of the original edge, 𝑇(left(◦)) and 𝑇(right(◦)), intersect with the vertex sets S1 and S2, respectively. Our conditional is true only when tables in the egde's TES intersect with each vertex set, which is more strict because the TES is a subset of 𝑇(left(◦)) ∪ 𝑇(right(◦)). Does this difference help prevent generating cross joins in more cases?

Sorry for the delay - I missed this the first time around. The check we use is exactly equivalent to the one in the paper for reasons I'll explain below*. I believe I made that change to make it more explicit that we are trying to prevent generating cross joins, but should have either stuck to the paper or added a comment to explain the change.

@mgartner since you're the one who had to read the code with fresh eyes - do you think it would be better to use the check as in the paper or add a comment?

* While it's true that the TES is a subset of the left and right tables, it will always include at least one table from each input because it includes tables referenced by the join filter. If the filter doesn't reference tables from one of the inputs, all tables from that input are added to the TES (link). Because of this handling of degenerate predicates and the requirement that referenced tables are part of S1 and S2, T(input) ∩ S != ∅ is always equivalent to TES(◦) ∩ S != ∅.

TLDR: The paper's correctness relies on join filters referencing at least one table from each input. We ensure that this is the case by pretending that degenerate predicates reference all tables of the unreferenced side(s) while calculating the SES and TES.

@mgartner
Copy link
Collaborator Author

Thanks for following up! Your explanation makes sense (though it took a while to wrap my head around it all again). I think a comment that explicitly calls out the divergence from the paper and an explanation of why it's valid would be helpful. No need to use the exact check used in the paper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: join reordering can produce incorrect query plans
5 participants