release-22.1: opt: don't add reordered join with extra filters to original memo group #91654

Merged

120 changes: 108 additions & 12 deletions pkg/sql/opt/xform/join_order_builder.go
@@ -311,6 +311,14 @@ type JoinOrderBuilder struct {
// once does not exceed the session limit.
joinCount int

// rebuildAllJoins is true when the filters in the original matched join tree
// were not pushed down as far as possible. When this is true, all joins
// except the root join need to be re-built, possibly with additional filters
// pushed down. While technically it is sufficient to only do this for the
// joins that would be changed by a successful push-down, it is simpler to
// handle things this way (and the problem is rare).
rebuildAllJoins bool

onReorderFunc OnReorderFunc

onAddJoinFunc OnAddJoinFunc
@@ -348,6 +356,12 @@ func (jb *JoinOrderBuilder) Reorder(join memo.RelExpr) {
// the best plan.
jb.ensureClosure(join)

// Ensure that the JoinOrderBuilder will not add reordered joins to the
// original memo groups (apart from the root) in the case when doing so
// would add filters that weren't present in the original joins. See the
// validateEdges comment for more information.
jb.validateEdges()

if jb.onReorderFunc != nil {
// Hook for testing purposes.
jb.callOnReorderFunc(join)
@@ -460,6 +474,72 @@ func (jb *JoinOrderBuilder) ensureClosure(join memo.RelExpr) {
}
}

// validateEdges checks whether each edge applies to its original join. If any
// do not, normalization rules failed to synthesize and push a filter down as
// far as possible, and it is not valid to add new reordered joins to the
// original memo groups. When this is the case, all joins except for the root
// join need to be removed from the plans map. This prevents cases where a join
// is added to a memo group that isn't logically equivalent.
//
// This is necessary because the JoinOrderBuilder expects each join tree for a
// given set of relations to contain all filters that apply to those relations.
// When a new join is constructed, it doesn't contain "degenerate" filters -
// filters that only refer to one side of the join. So if the original join tree
// had an implicit filter that could have been synthesized and pushed down the
// tree, but wasn't, using the original join group that *should* have that
// filter when building a new join would cause a filter to be dropped.
//
// Take the following (simplified) example of a join tree where filter push-down
// rules have failed:
//
// (xy join ab on true) join uv on x = u and a = u
//
// Here, the JoinOrderBuilder will synthesize an 'x = a' filter that will be
// used to join xy and ab. If it was added to the original group, we would have
// a memo group that looks like this:
//
// group: (xy join ab on true), (xy join ab on x = a)
//
// Later joins that are constructed using this group would expect the 'x = a'
// filter to be present, and would avoid adding redundant filters. Therefore,
// a join tree like the following would be added to the memo.
//
// (xy join ab on true) join uv on x = u
//
// Notice how the 'a = u' filter has been dropped because it would be redundant
// when 'x = u' and 'x = a' are already present. We prevent this from happening
// by not reusing the original memo groups in the case when the JoinOrderBuilder
// is able to synthesize and/or push down filters that weren't in the original
// join tree.
func (jb *JoinOrderBuilder) validateEdges() {
for i := range jb.edges {
if jb.rebuildAllJoins {
break
}
e := &jb.edges[i]
if e.op.joinType == opt.InnerJoinOp {
jb.rebuildAllJoins = !e.checkInnerJoin(e.op.leftVertexes, e.op.rightVertexes)
} else {
jb.rebuildAllJoins = !e.checkNonInnerJoin(e.op.leftVertexes, e.op.rightVertexes)
}
}
if jb.rebuildAllJoins {
for vertexes := range jb.plans {
if vertexes.isSingleton() || vertexes == jb.allVertexes() {
// Do not remove the plan if it is for a base relation (not a join) or
// it is the root join. Adding to the root join group is correct because
// the JoinOrderBuilder will only consider filters that were present
// (even if only implicitly) in the root join tree. It is also necessary
// because the purpose of the JoinOrderBuilder is to add equivalent join
// plans to the root join group - otherwise, any new joins would be
// disconnected from the main query plan.
continue
}
delete(jb.plans, vertexes)
}
}
}
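The pruning loop in `validateEdges` can be illustrated in isolation. The following is a minimal sketch, not the CockroachDB implementation: it assumes `vertexSet` is a bitmask with one bit per base relation (consistent with the `isSingleton` and `allVertexes` helpers used above), models `jb.plans` as a hypothetical `map[vertexSet]string`, and introduces a hypothetical `pruneForRebuild` function. The relation names follow the `xy`/`ab`/`uv` example in the comment.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// vertexSet is a hypothetical bitmask with one bit per base relation.
type vertexSet uint64

func (s vertexSet) isSingleton() bool { return bits.OnesCount64(uint64(s)) == 1 }

// allVertexes returns the set containing the first n relations.
func allVertexes(n int) vertexSet { return vertexSet(1)<<uint(n) - 1 }

// pruneForRebuild mimics the deletion loop in validateEdges: when rebuild is
// true, every plan except base relations and the root join is dropped.
func pruneForRebuild(plans map[vertexSet]string, root vertexSet, rebuild bool) {
	if !rebuild {
		return
	}
	for vs := range plans {
		if vs.isSingleton() || vs == root {
			continue
		}
		delete(plans, vs) // deleting entries while ranging over a map is safe in Go
	}
}

func main() {
	// Three relations: xy (bit 0), ab (bit 1), uv (bit 2).
	plans := map[vertexSet]string{
		0b001: "xy",
		0b010: "ab",
		0b100: "uv",
		0b011: "(xy join ab on true)",
		0b111: "(xy join ab on true) join uv on x = u and a = u",
	}
	pruneForRebuild(plans, allVertexes(3), true)

	// Print the surviving plans in a deterministic order.
	keys := make([]int, 0, len(plans))
	for vs := range plans {
		keys = append(keys, int(vs))
	}
	sort.Ints(keys)
	for _, k := range keys {
		fmt.Printf("%03b %s\n", k, plans[vertexSet(k)])
	}
}
```

In this model, the `011` entry for `xy join ab` is removed while the three base relations and the `111` root survive, so any later join over `xy` and `ab` has to be constructed fresh rather than reusing the original group.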

// dpSube carries out the DPSube algorithm (citations: [8] figure 4). All
// disjoint pairs of subsets of base relations are enumerated and checked for
// validity. If valid, the pair of subsets is used along with the edges
@@ -520,9 +600,10 @@ func (jb *JoinOrderBuilder) addJoins(s1, s2 vertexSet) {
continue
}
if !joinIsRedundant {
-// If this edge was originally part of a join between relation sets s1 and
-// s2, any other edges that apply will also be part of that original join.
-joinIsRedundant = e.joinIsRedundant(s1, s2)
+// If this edge was originally part of a join between relation sets s1
+// and s2, any other edges that apply will also be part of that original
+// join.
+joinIsRedundant = jb.joinIsRedundant(e, s1, s2)
}
getEquivFDs(&fds, e.filters)
innerJoinFilters = append(innerJoinFilters, e.filters...)
@@ -541,7 +622,7 @@ func (jb *JoinOrderBuilder) addJoins(s1, s2 vertexSet) {
// Construct a non-inner join. If any inner join filters also apply to the
// pair of relationSets, construct a select on top of the join with the
// inner join filters.
-jb.addJoin(e.op.joinType, s1, s2, e.filters, innerJoinFilters, e.joinIsRedundant(s1, s2))
+jb.addJoin(e.op.joinType, s1, s2, e.filters, innerJoinFilters, jb.joinIsRedundant(e, s1, s2))
return
}
if e.checkNonInnerJoin(s2, s1) {
@@ -567,7 +648,7 @@ func (jb *JoinOrderBuilder) addJoins(s1, s2 vertexSet) {
// 010 on the right. 101 is larger than 111 / 2, so we will not enumerate
// this plan unless we consider a join with s2 on the left and s1 on the
// right.
-jb.addJoin(e.op.joinType, s2, s1, e.filters, innerJoinFilters, e.joinIsRedundant(s2, s1))
+jb.addJoin(e.op.joinType, s2, s1, e.filters, innerJoinFilters, jb.joinIsRedundant(e, s2, s1))
return
}
}
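The context comment above relies on the order in which DPSube enumerates pairs of relation sets. The sketch below is a simplified model, not the exact `dpSube` loop: it assumes, as the comment's "111 / 2" remark suggests, that a pair is visited only when `s1` is numerically smaller than half of `S = s1 ∪ s2`. Because `s1` and `s2` are disjoint, `s1 + s2 == S`, so this is the same as requiring `s1 < s2`. `vertexSet` and `enumeratePairs` are hypothetical names.

```go
package main

import "fmt"

// vertexSet is a hypothetical bitmask with one bit per base relation.
type vertexSet uint64

// enumeratePairs visits every set S of relations up to all (assumed to be of
// the form 1<<n - 1) in increasing numeric order. For each S it walks the
// non-empty proper subsets s1 of S and pairs each with its complement
// s2 = S \ s1, visiting the pair only when s1 < s2, so every unordered pair
// is seen exactly once.
func enumeratePairs(all vertexSet, visit func(s1, s2 vertexSet)) {
	for s := vertexSet(1); s <= all; s++ {
		// Standard submask walk; singletons produce no proper subsets.
		for s1 := (s - 1) & s; s1 > 0; s1 = (s1 - 1) & s {
			s2 := s &^ s1
			if s1 < s2 {
				visit(s1, s2)
			}
		}
	}
}

func main() {
	enumeratePairs(0b111, func(s1, s2 vertexSet) {
		fmt.Printf("%03b %03b\n", s1, s2)
	})
}
```

Under this model the pair `010 101` is visited but `101 010` never is, which is why `addJoins` also has to try the non-inner join with `s2` on the left and `s1` on the right.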
@@ -634,6 +715,19 @@ func (jb *JoinOrderBuilder) makeTransitiveEdge(col1, col2 opt.ColumnID) {
return
}

originalJoin, ok := jb.plans[op.leftVertexes.union(op.rightVertexes)]
if !ok {
panic(errors.AssertionFailedf("failed to find expected join plan"))
}
if !originalJoin.Relational().FuncDeps.AreColsEquiv(col1, col2) {
// This inferred filter was not pushed down as far as possible. All joins
// apart from the root will have to be rebuilt. We have to do this check
// here because we set the op for this edge to the join to which the filter
// *would* have been pushed down if it existed, so the applicable check will
// always succeed for that join.
jb.rebuildAllJoins = true
}

// Construct the edge.
var1 := jb.f.ConstructVariable(col1)
var2 := jb.f.ConstructVariable(col2)
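The new check in `makeTransitiveEdge` asks whether the original join already treats `col1` and `col2` as equivalent. A minimal sketch of that reasoning, assuming a hypothetical union-find (`equivClasses`) in place of the optimizer's `FuncDeps.AreColsEquiv`, and reusing the columns from the `validateEdges` example: the root join's filters `x = u` and `a = u` imply `x = a`, but the original `xy join ab on true` contains no equality, so the inferred filter was never pushed down and `rebuildAllJoins` would be set.

```go
package main

import "fmt"

// equivClasses is a hypothetical union-find over column names, standing in
// for the column-equivalence tracking that FuncDeps provides.
type equivClasses struct{ parent map[string]string }

func newEquivClasses() *equivClasses {
	return &equivClasses{parent: map[string]string{}}
}

// find returns the representative of c's class, compressing the path.
func (e *equivClasses) find(c string) string {
	p, ok := e.parent[c]
	if !ok || p == c {
		return c
	}
	root := e.find(p)
	e.parent[c] = root
	return root
}

// addEquality records a filter of the form c1 = c2.
func (e *equivClasses) addEquality(c1, c2 string) {
	r1, r2 := e.find(c1), e.find(c2)
	if r1 != r2 {
		e.parent[r1] = r2
	}
}

func (e *equivClasses) areColsEquiv(c1, c2 string) bool {
	return e.find(c1) == e.find(c2)
}

func main() {
	// Root join: (xy join ab on true) join uv on x = u and a = u.
	root := newEquivClasses()
	root.addEquality("x", "u")
	root.addEquality("a", "u")
	fmt.Println(root.areColsEquiv("x", "a")) // true: x = a can be synthesized

	// The original join between xy and ab had no filters at all.
	original := newEquivClasses()
	rebuildAllJoins := !original.areColsEquiv("x", "a")
	fmt.Println(rebuildAllJoins) // true: the inferred filter was never pushed down
}
```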
@@ -898,6 +992,15 @@ func (jb *JoinOrderBuilder) addBaseRelation(rel memo.RelExpr) {
jb.plans[relSet] = rel
}

// joinIsRedundant returns true if a join between the two sets of base relations
// was already present in the original join tree. If so, enumerating this join
// would be redundant, so it should be skipped.
func (jb *JoinOrderBuilder) joinIsRedundant(e *edge, s1, s2 vertexSet) bool {
// The join is never redundant when rebuildAllJoins is true, because
// rebuildAllJoins indicates we don't want to reuse the original joins.
return !jb.rebuildAllJoins && e.op.leftVertexes == s1 && e.op.rightVertexes == s2
}
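Moving `joinIsRedundant` from `edge` to `JoinOrderBuilder` lets it consult `rebuildAllJoins`. The sketch below models only that decision; `builder`, `joinOp`, and the trimmed-down `edge` are hypothetical stand-ins for the real types, which carry much more state.

```go
package main

import "fmt"

// vertexSet is a hypothetical bitmask with one bit per base relation.
type vertexSet uint64

// joinOp records which relation sets an original join combined.
type joinOp struct{ leftVertexes, rightVertexes vertexSet }

type edge struct{ op *joinOp }

type builder struct{ rebuildAllJoins bool }

// joinIsRedundant skips a join only if it matches the original join AND the
// original joins are still being reused.
func (b *builder) joinIsRedundant(e *edge, s1, s2 vertexSet) bool {
	return !b.rebuildAllJoins && e.op.leftVertexes == s1 && e.op.rightVertexes == s2
}

func main() {
	e := &edge{op: &joinOp{leftVertexes: 0b001, rightVertexes: 0b010}}
	reuse := &builder{rebuildAllJoins: false}
	rebuild := &builder{rebuildAllJoins: true}
	fmt.Println(reuse.joinIsRedundant(e, 0b001, 0b010))   // true: the original join is kept
	fmt.Println(rebuild.joinIsRedundant(e, 0b001, 0b010)) // false: rebuild it with pushed-down filters
	fmt.Println(reuse.joinIsRedundant(e, 0b010, 0b001))   // false: the commuted order is a new join
}
```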

// checkSize panics if the number of relations is greater than or equal to
// MaxReorderJoinsLimit. checkSize should be called before a vertex is added to
// the join graph.
@@ -1327,13 +1430,6 @@ func (e *edge) checkRules(s1, s2 vertexSet) bool {
return true
}

-// joinIsRedundant returns true if a join between the two sets of base relations
-// was already present in the original join tree. If so, enumerating this join
-// would be redundant, so it should be skipped.
-func (e *edge) joinIsRedundant(s1, s2 vertexSet) bool {
-return e.op.leftVertexes == s1 && e.op.rightVertexes == s2
-}

// commute returns true if the given join operator type is commutable.
func commute(op opt.Operator) bool {
return op == opt.InnerJoinOp || op == opt.FullJoinOp

78 changes: 78 additions & 0 deletions pkg/sql/opt/xform/testdata/rules/join_order
@@ -2610,3 +2610,81 @@ project
│ └── t2.a:5 = 123456 [outer=(5), constraints=(/5: [/123456 - /123456]; tight), fd=()-->(5)]
└── filters
└── t1.a:1 = 123456 [outer=(1), constraints=(/1: [/123456 - /123456]; tight), fd=()-->(1)]

# Regression test for #88659 - don't add reordered joins to existing groups when
# filters haven't been pushed down. The c:3 = c:9 filter shouldn't be dropped.
exec-ddl
CREATE TABLE t88659 (
a INT PRIMARY KEY,
b INT NOT NULL,
c DECIMAL,
INDEX idx (b DESC),
UNIQUE INDEX uniq ((b + a) ASC) STORING (b),
FAMILY (a, b)
);
----

exec-ddl
ALTER TABLE t88659 INJECT STATISTICS '[
{
"columns": ["b"],
"created_at": "2000-01-01 00:00:00+00:00",
"distinct_count": 999999999,
"name": "__auto__",
"null_count": 0,
"row_count": 999999999999}
]':::JSONB;
----

opt set=testing_optimizer_random_seed=2758112374651167630 set=testing_optimizer_cost_perturbation=1.0
SELECT *
FROM t88659 AS t0
JOIN t88659 AS t2 ON (t0.b) = (t2.a)
JOIN t88659 AS t3 ON (t2.c) = (t3.c) AND (t0.c) = (t3.c)
JOIN t88659 AS t4 ON (t3.a) = (t4.b) AND (t2.b) = (t4.a) AND (t2.a) = (t4.a);
----
inner-join (lookup t88659)
├── columns: a:1!null b:2!null c:3!null a:7!null b:8!null c:9!null a:13!null b:14!null c:15!null a:19!null b:20!null c:21
├── key columns: [20] = [13]
├── lookup columns are key
├── immutable
├── key: (1)
├── fd: (1)-->(2,3), (7)-->(9), (7)==(2,8,19), (8)==(2,7,19), (2)==(7,8,19), (13)-->(14,15), (3)==(9,15), (9)==(3,15), (15)==(3,9), (19)-->(20,21), (13)==(20), (20)==(13), (19)==(2,7,8)
├── inner-join (lookup t88659)
│ ├── columns: a:1!null b:2!null c:3!null a:7!null b:8!null c:9!null a:19!null b:20!null c:21
│ ├── key columns: [7] = [19]
│ ├── lookup columns are key
│ ├── immutable
│ ├── key: (1)
│ ├── fd: (1)-->(2,3), (7)-->(9), (7)==(2,8,19), (8)==(2,7,19), (19)-->(20,21), (19)==(2,7,8), (2)==(7,8,19), (3)==(9), (9)==(3)
│ ├── inner-join (lookup t88659)
│ │ ├── columns: a:1!null b:2!null c:3!null a:7!null b:8!null c:9!null
│ │ ├── key columns: [1] = [1]
│ │ ├── lookup columns are key
│ │ ├── immutable
│ │ ├── key: (1)
│ │ ├── fd: (1)-->(2,3), (7)-->(9), (7)==(2,8), (8)==(2,7), (2)==(7,8), (3)==(9), (9)==(3)
│ │ ├── inner-join (lookup t88659@idx)
│ │ │ ├── columns: a:1!null b:2!null a:7!null b:8!null c:9
│ │ │ ├── key columns: [7] = [2]
│ │ │ ├── key: (1)
│ │ │ ├── fd: (7)-->(9), (7)==(2,8), (8)==(2,7), (1)-->(2), (2)==(7,8)
│ │ │ ├── select
│ │ │ │ ├── columns: a:7!null b:8!null c:9
│ │ │ │ ├── key: (7)
│ │ │ │ ├── fd: (7)-->(9), (7)==(8), (8)==(7)
│ │ │ │ ├── scan t88659
│ │ │ │ │ ├── columns: a:7!null b:8!null c:9
│ │ │ │ │ ├── computed column expressions
│ │ │ │ │ │ └── crdb_internal_idx_expr:12
│ │ │ │ │ │ └── b:8 + a:7
│ │ │ │ │ ├── key: (7)
│ │ │ │ │ └── fd: (7)-->(8,9)
│ │ │ │ └── filters
│ │ │ │ └── a:7 = b:8 [outer=(7,8), constraints=(/7: (/NULL - ]; /8: (/NULL - ]), fd=(7)==(8), (8)==(7)]
│ │ │ └── filters (true)
│ │ └── filters
│ │ └── c:3 = c:9 [outer=(3,9), immutable, constraints=(/3: (/NULL - ]; /9: (/NULL - ]), fd=(3)==(9), (9)==(3)]
│ └── filters (true)
└── filters
└── c:9 = c:15 [outer=(9,15), immutable, constraints=(/9: (/NULL - ]; /15: (/NULL - ]), fd=(9)==(15), (15)==(9)]
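To see concretely why a dropped filter changes results, the example from the `validateEdges` comment can be evaluated on tiny in-memory tables. This is an illustrative sketch with made-up single-column tables (`xs`, `as`, `us`) and a hypothetical `joinAll` helper, not optimizer code. It compares the original predicate with the faulty plan that drops `a = u` on the assumption that the `xy`/`ab` group already enforces `x = a`, the same failure mode the regression test above guards against for `c:3 = c:9`.

```go
package main

import "fmt"

// row is one combined tuple drawn from the three single-column tables.
type row struct{ x, a, u int }

// joinAll computes (xy join ab on true) join uv, keeping rows that satisfy pred.
func joinAll(xs, as, us []int, pred func(r row) bool) []row {
	var out []row
	for _, x := range xs {
		for _, a := range as {
			for _, u := range us {
				r := row{x, a, u}
				if pred(r) {
					out = append(out, r)
				}
			}
		}
	}
	return out
}

func main() {
	xs, as, us := []int{1, 2}, []int{1, 2}, []int{1}

	// Original query: (xy join ab on true) join uv on x = u and a = u.
	correct := joinAll(xs, as, us, func(r row) bool { return r.x == r.u && r.a == r.u })

	// Faulty plan: 'a = u' was dropped because the (xy, ab) group was assumed
	// to enforce 'x = a', but its original member is 'xy join ab on true'.
	faulty := joinAll(xs, as, us, func(r row) bool { return r.x == r.u })

	fmt.Println(len(correct), len(faulty)) // 1 2
}
```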