Planner refactoring #7103

systay · 2020-12-03T07:31:36Z

This is a larger rewrite of the vtgate planner. It introduces new passes and intermediate representations of the query.

The old code used these passes over the query:

Pass	Struct transformation
Parsing	String -> AST
Rewriting (normalization)	AST -> AST
Planning	AST -> logicalPlan (builder)
WireUp	logicalPlan -> engine.Primitive

The refactored planner uses the following passes:

Pass	Struct transformation
Parsing	String -> AST
Rewriting (normalization)	AST -> AST
Semantic Analysis	AST -> AST"
Extract Query Graph	AST" -> QueryGraph
Route Planning	QueryGraph -> joinTree
Horizon Planning	joinTree -> logicalPlan
WireUp	logicalPlan -> engine.Primitive

By splitting the planning process into smaller pieces, each part can be simplified and extended to do more.

Here follows a short description of each new pass.

Semantic Analysis

Responsibilities: Scoping, Binding

Walks the AST and performs scoping and binding, so that whenever a column name is found, the planner knows which table is being referenced. Tables are given a TableSet identifier - a bitmask struct that allows the planner to quickly find what dependencies every expression has.
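To illustrate the idea, here is a minimal sketch of a bitmask-based table set. It is not the code from this PR: the `SingleTable` constructor, the method names, and the choice of a plain `uint64` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// TableSet is a bitmask in which bit i is set when table i is referenced.
// Illustrative only: the real type lives in the semantics package.
type TableSet uint64

// SingleTable returns the set that contains only the table at index idx.
// (Hypothetical helper for this sketch.)
func SingleTable(idx int) TableSet { return TableSet(1) << uint(idx) }

// Merge returns the union of the two sets.
func (ts TableSet) Merge(other TableSet) TableSet { return ts | other }

// IsSolvedBy reports whether every table in ts is also present in other.
func (ts TableSet) IsSolvedBy(other TableSet) bool { return ts&other == ts }

// NumberOfTables returns how many tables the set contains.
func (ts TableSet) NumberOfTables() int { return bits.OnesCount64(uint64(ts)) }

func main() {
	users, orders, items := SingleTable(0), SingleTable(1), SingleTable(2)

	// Dependencies of a predicate such as `users.id = orders.user_id`.
	pred := users.Merge(orders)

	// The predicate can be evaluated by a plan covering all three tables,
	// but not by a plan that covers only users and items.
	fmt.Println(pred.IsSolvedBy(users.Merge(orders).Merge(items)))
	fmt.Println(pred.IsSolvedBy(users.Merge(items)))
	fmt.Println(pred.NumberOfTables())
}
```

With this representation, checking whether an expression can be evaluated by a given sub-plan is a single bitwise AND and a comparison.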

Extract Query Graph

Responsibilities: Extract Subqueries, Create Query Graph

The query graph is an intermediate representation that is designed to allow the route planner to quickly consider many different solutions for the query. Instead of keeping the query in the AST, which is limited by its tree structure, we produce a graph-like representation with all used tables (nodes) in one list, and the edges between them in a separate list.

In this pass, subqueries are extracted into a list of queries and the relationships between them. This makes it easier for later passes to plan fully without having to switch back and forth between passes - when doing route planning, we can do all of route planning in one go and don't have to wait for SELECT expressions to be considered before planning subqueries used in SELECT expressions.
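As a rough sketch of what such a representation could look like, the example below keeps tables and join predicates in two flat lists. The type and method names (`queryTable`, `joinEdge`, `predicatesBetween`) are hypothetical and chosen for the example; predicates are plain strings here rather than AST expressions.

```go
package main

import "fmt"

// TableSet is a bitmask identifying a set of tables (see Semantic Analysis).
type TableSet uint64

// queryTable is a node in the graph: one table used by the query.
type queryTable struct {
	id   TableSet
	name string
}

// joinEdge is an edge: a predicate plus the tables it depends on.
type joinEdge struct {
	deps      TableSet
	predicate string
}

// queryGraph keeps nodes and edges in two flat lists instead of a tree.
type queryGraph struct {
	tables []queryTable
	edges  []joinEdge
}

// addTable registers a table and returns its single-bit identifier.
func (qg *queryGraph) addTable(name string) TableSet {
	id := TableSet(1) << uint(len(qg.tables))
	qg.tables = append(qg.tables, queryTable{id: id, name: name})
	return id
}

// addPredicate registers a predicate with the tables it depends on.
func (qg *queryGraph) addPredicate(deps TableSet, predicate string) {
	qg.edges = append(qg.edges, joinEdge{deps: deps, predicate: predicate})
}

// predicatesBetween returns the predicates that connect the two sides:
// fully covered by a|b and touching at least one table on each side.
func (qg *queryGraph) predicatesBetween(a, b TableSet) []string {
	var out []string
	for _, e := range qg.edges {
		covered := e.deps&(a|b) == e.deps
		if covered && e.deps&a != 0 && e.deps&b != 0 {
			out = append(out, e.predicate)
		}
	}
	return out
}

func main() {
	qg := &queryGraph{}
	u := qg.addTable("user")
	o := qg.addTable("orders")
	i := qg.addTable("item")
	qg.addPredicate(u|o, "user.id = orders.user_id")
	qg.addPredicate(o|i, "orders.item_id = item.id")

	fmt.Println(qg.predicatesBetween(u, o)) // one connecting predicate
	fmt.Println(qg.predicatesBetween(u, i)) // no direct predicate
}
```

Because the tables are not nested inside a tree, any pair of sub-plans can be asked for its connecting predicates without rewriting the structure.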

Route planning

Responsibilities: Plan how to route the query - plan FROM and WHERE

This pass uses dynamic programming to consider all combinations of tables in order to find the optimal plan. Optimal here means the minimal number of route primitives in the plan.

At the end of this stage, we have a tree structure that represents all the route primitives needed and how they should be joined.

Horizon planning

Responsibilities: Plan projections, aggregations, grouping and ordering

Once we have a plan for how to route queries, we plan what projections we need from each route, and how to do ORDER BY/GROUP BY/LIMIT et al.

Positive outcomes from this refactoring.

Why do this non-trivial piece of work?

We still have a number of query types that are not supported. To support more queries, we need to extend the planner. The legacy planner is not easy to work with, so instead of adding to it, we felt it was time to introduce this new design. It will allow us to support these queries, and it also sets us up to do more optimisations in the future.

harshit-gangal

Overall looks good to me.

harshit-gangal · 2021-01-08T10:18:11Z

go/vt/sysvars/sysvars.go

@@ -51,6 +51,7 @@ var (
 	SQLSelectLimit      = SystemVariable{Name: "sql_select_limit", Default: off}
 	TransactionMode     = SystemVariable{Name: "transaction_mode", IdentifierAsString: true}
 	Workload            = SystemVariable{Name: "workload", IdentifierAsString: true}
+	PlannerVersion      = SystemVariable{Name: "planner_version", IdentifierAsString: true}


Do we need to allow users to change planner_version at session level? If we do this, then the plan cache would also be required to store this information. Allowing it to be set at vtgate startup should be good enough.

Fair enough. I'll remove it. I was testing things and wanted to quickly be able to switch, but there are other ways of accomplishing this.

harshit-gangal · 2021-01-10T17:12:50Z

go/vt/vtgate/semantics/analyzer.go

+
+		scopes   []*scope
+		exprDeps map[sqlparser.Expr]TableSet
+		si       schemaInformation


looks like this is not used in the code.

yeah, this is how we will read from the vschema to get tables we have column information about. I'll remove

harshit-gangal · 2021-01-11T06:17:46Z

go/vt/vtgate/semantics/analyzer.go

+}
+
+// resolveUnQualifiedColumn
+func (a *analyzer) resolveUnQualifiedColumn(current *scope, expr *sqlparser.ColName) (table, error) {


expr is not used

same as above - once we can query about column info, this is where this would happen. I'll remove

harshit-gangal · 2021-01-11T06:38:26Z

go/vt/vtgate/semantics/semantic_state.go

+	}
+)
+
+// TableSetFor returns the bitmask for this particular tableshoe


nit: tableshoe looks incorrect.

intentional. I see

harshit-gangal · 2021-01-11T06:48:21Z

go/vt/vtgate/planbuilder/logical_plan.go

+	// Wireup2 does the wire up work for the new planner
+	Wireup2(semTable *semantics.SemTable) error
+


nit: wireup2 -> wireupv4

harshit-gangal · 2021-01-11T07:22:28Z

go/vt/vtgate/planbuilder/querygraph.go

+		if err := qg.collectTables(table.Exprs, semTable); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+func (qg *queryGraph) collectTables(t sqlparser.TableExprs, semTable *semantics.SemTable) error {
+	for _, expr := range t {
+		if err := qg.collectTable(expr, semTable); err != nil {
+			return err
+		}
+	}
+	return nil
+}


nit: this could be done inline.

collectTables is also used from line 69 in this same file, so I just used it here as well. DRY, right?

harshit-gangal · 2021-01-11T07:39:10Z

go/vt/vtgate/planbuilder/querygraph.go

+			for _, predicate := range splitAndExpression(nil, table.Condition.On) {
+				err := qg.collectPredicate(predicate, semTable)
+				if err != nil {
+					return err
+				}
+			}


can use collectPredicates method call here?

no, not really. collectPredicates extracts predicates from a SELECT struct, and that is not what we have here

harshit-gangal · 2021-01-11T07:54:11Z

go/vt/vtgate/planbuilder/route.go

+	// solvedTables keeps track of which tables this route is covering
+	solvedTables semantics.TableSet


nit: could be renamed to containedTables

harshit-gangal · 2021-01-11T08:42:26Z

go/vt/vtgate/planbuilder/route.go

+			if len(node.SelectExprs) == 0 {
+				node.SelectExprs = []sqlparser.SelectExpr{
+					&sqlparser.AliasedExpr{
+						Expr: sqlparser.NewIntLiteral([]byte{'1'}),
+					},
+				}
+			}
+		}


why is this needed?

sometimes we don't need anything from a route except the number of matching rows. in those cases, we add a single literal because a SELECT with no expressions is not valid

harshit-gangal · 2021-01-11T08:46:23Z

go/vt/vtgate/planbuilder/route_planning.go

+	for i, table := range qg.tables {
+		solves := semTable.TableSetFor(table.alias)
+		plan, err := createRoutePlan(table, solves, vschema)
+		if err != nil {
+			return nil, err
+		}
+		plans[i] = plan
+	}
+


nit: similar to in lefttoright, can be moved to a method.

systay force-pushed the horizon-planning branch from 137943e to f2afbe5 Compare December 3, 2020 07:40

systay force-pushed the horizon-planning branch from 8c463b7 to e45969c Compare December 15, 2020 09:02

systay changed the title ~~Horizon planning~~ Planner refactoring Dec 21, 2020

systay force-pushed the horizon-planning branch 2 times, most recently from 85b6065 to fc17b74 Compare December 21, 2020 10:20

systay and others added 8 commits December 21, 2020 20:31

Semantic analysis

a924e24

Signed-off-by: Andres Taylor <[email protected]>

Added tests for query graph

7f4c826

Signed-off-by: GuptaManan100 <[email protected]>

Added 2nd planner test capability to plan_test

ed8cfe7

Signed-off-by: GuptaManan100 <[email protected]>

Test New Planner

13095ef

Test the new planner in plan_test side by side with the old planner Signed-off-by: Andres Taylor <[email protected]>

added the first two succesful new planner tests

91c8c95

Signed-off-by: Andres Taylor <[email protected]>

checked all the tests that work with the 2nd planner

bdf65b2

Signed-off-by: GuptaManan100 <[email protected]>

support more cases with the new planner

8a148e7

Signed-off-by: Andres Taylor <[email protected]>

new planner can solve the simplest join plans

09a6dac

Signed-off-by: Andres Taylor <[email protected]>

systay force-pushed the horizon-planning branch from fc17b74 to 09a6dac Compare December 22, 2020 14:46

systay marked this pull request as ready for review December 22, 2020 14:49

systay requested a review from harshit-gangal as a code owner December 22, 2020 14:49

systay force-pushed the horizon-planning branch from 019ea65 to b5d0f0f Compare December 22, 2020 17:05

remove randomness from the planbuilding process

9691541

Signed-off-by: Andres Taylor <[email protected]>

systay force-pushed the horizon-planning branch from b5d0f0f to 9691541 Compare December 22, 2020 17:07

systay added 2 commits December 23, 2020 13:39

fix issue with the dp table

713ea58

Signed-off-by: Andres Taylor <[email protected]>

add greedy option for large queries

44de903

Signed-off-by: Andres Taylor <[email protected]>

systay force-pushed the horizon-planning branch from 8f861b0 to 44de903 Compare December 23, 2020 20:58

systay and others added 8 commits December 27, 2020 12:38

refactor: extract method

14f4680

Signed-off-by: Andres Taylor <[email protected]>

refactor: querygraph and test

9224b3a

Signed-off-by: Andres Taylor <[email protected]>

moved code to where it belongs

9aa2331

Signed-off-by: Andres Taylor <[email protected]>

simplify routePlan

e500d3a

Signed-off-by: Andres Taylor <[email protected]>

added route planning unit tests

63d4339

Signed-off-by: Andres Taylor <[email protected]>

added flag to control the planner version

b1a5df8

Signed-off-by: Andres Taylor <[email protected]>

added left to right planner

4ecd166

Signed-off-by: Andres Taylor <[email protected]>

add planner benchmark

33f5ca5

Signed-off-by: Harshit Gangal <[email protected]>

added vtgate flag and system variable to control the planner used

c7aecd3

Signed-off-by: Andres Taylor <[email protected]>

systay requested a review from sougou as a code owner December 29, 2020 13:57

systay added 3 commits December 29, 2020 16:50

added helpful comments

b4ac6de

Signed-off-by: Andres Taylor <[email protected]>

add a shortcut to the greedy planner to prefer joins with predicates

cf1ad11

Signed-off-by: Andres Taylor <[email protected]>

change planner benchmark to only read the input file once

000426e

Signed-off-by: Andres Taylor <[email protected]>

systay force-pushed the horizon-planning branch from e17943c to 000426e Compare December 29, 2020 18:33

added a new version of greedy optimizer using priority queue

1c9c5eb

Signed-off-by: GuptaManan100 <[email protected]>

GuptaManan100 force-pushed the horizon-planning branch from 043f79d to 1c9c5eb Compare December 30, 2020 08:21

GuptaManan100 and others added 13 commits December 30, 2020 15:02

fixed join predicate collection issue

9c16d63

Signed-off-by: GuptaManan100 <[email protected]>

added more supported queries

23d6535

Signed-off-by: Andres Taylor <[email protected]>

handle null comparisons in the V4 planner

57e14ca

Signed-off-by: Andres Taylor <[email protected]>

fail plan tests if the v4 planner unexpectedly produces the same plan…

4b83703

… as the v3 Signed-off-by: Andres Taylor <[email protected]>

merge SelectEqualUnique plans

b9fa57e

Signed-off-by: Andres Taylor <[email protected]>

refactored route planning code

ecd4936

Signed-off-by: Andres Taylor <[email protected]>

Remove the assumption that A join B has the same cost as B join A

3bafc10

Signed-off-by: Andres Taylor <[email protected]>

don't copy table qualifier and only copy some fields if the full plan…

2d4dd72

… is a single route Signed-off-by: Andres Taylor <[email protected]>

keep tables in FROM according to original query

ff4adae

Signed-off-by: Andres Taylor <[email protected]>

Merge remote-tracking branch 'upstream/master' into horizon-planning

7f3c956

Signed-off-by: Andres Taylor <[email protected]>

fix lint on go/vt/srvtopo/resilient_server_test.go

027e154

Signed-off-by: Shlomi Noach <[email protected]>

imports

01a1ef1

Signed-off-by: Andres Taylor <[email protected]>

cleaned out code

5dc6497

Signed-off-by: Andres Taylor <[email protected]>

This was referenced Jan 9, 2021

Gen4 Planner: AxB vs BxA #7274

Merged

Gen4 Tracking #7280

Closed

harshit-gangal approved these changes Jan 11, 2021

systay added 3 commits January 11, 2021 15:20

removed planner-version sysvar

d285bfe

Signed-off-by: Andres Taylor <[email protected]>

adress peer review comments

26b008b

Signed-off-by: Andres Taylor <[email protected]>

Merge remote-tracking branch 'upstream/master' into horizon-planning

35e41cd

Signed-off-by: Andres Taylor <[email protected]>

systay merged commit 7b72908 into vitessio:master Jan 11, 2021

eseokoh mentioned this pull request May 31, 2021

Release Note of v9.0.0 includes changes of v10.0 #8212

Closed

ajm188 mentioned this pull request Jul 12, 2021

slack vitess v10.pre tinyspeck/vitess#228

Merged
