Planner refactoring #7103

Merged
merged 42 commits into from
Jan 11, 2021

Collaborator

@systay commented Dec 3, 2020

This is a larger rewrite of the vtgate planner. It introduces new passes and intermediate representations of the query.

The old code used these passes over the query:

| Pass | Struct transformation |
| --- | --- |
| Parsing | String -> AST |
| Rewriting (normalization) | AST -> AST |
| Planning | AST -> logicalPlan (builder) |
| WireUp | logicalPlan -> engine.Primitive |

This refactored planner now uses the following passes:

| Pass | Struct transformation |
| --- | --- |
| Parsing | String -> AST |
| Rewriting (normalization) | AST -> AST |
| Semantic Analysis | AST -> AST" |
| Extract Query Graph | AST" -> QueryGraph |
| Route Planning | QueryGraph -> joinTree |
| Horizon Planning | joinTree -> logicalPlan |
| WireUp | logicalPlan -> engine.Primitive |

By splitting the planning process into smaller pieces, each part can be simplified and extended to do more.

Here follows a short description of each new pass.

Semantic Analysis

Responsibilities: Scoping, Binding

Walks the AST and does scoping and binding, so whenever a column name is found, the planner knows which table is being referenced. Tables are given a TableSet identifier - a bitmask struct that allows the planner to quickly find what dependencies every expression has.
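To illustrate why a bitmask makes dependency checks cheap, here is a minimal sketch. It is not the code from this PR: the `uint64` representation and the helper names `SingleTableSet`, `Merge`, `IsSolvedBy` and `NumberOfTables` are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// TableSet is a hypothetical bitmask: bit i is set when table i is referenced.
type TableSet uint64

// SingleTableSet returns the set containing only the table at the given index.
func SingleTableSet(idx int) TableSet { return TableSet(1) << uint(idx) }

// Merge returns the union of two sets.
func (ts TableSet) Merge(other TableSet) TableSet { return ts | other }

// IsSolvedBy reports whether every table in ts is also present in other.
func (ts TableSet) IsSolvedBy(other TableSet) bool { return ts&other == ts }

// NumberOfTables counts the tables in the set.
func (ts TableSet) NumberOfTables() int { return bits.OnesCount64(uint64(ts)) }

func main() {
	user, order := SingleTableSet(0), SingleTableSet(1)
	// An expression such as `user.id = order.uid` depends on both tables.
	deps := user.Merge(order)
	fmt.Println(deps.NumberOfTables(), deps.IsSolvedBy(user), user.IsSolvedBy(deps))
}
```

With this representation, asking whether a predicate can be evaluated by a plan that covers a given set of tables is a single AND and a comparison.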

Extract Query Graph

Responsibilities: Extract Subqueries, Create Query Graph

The query graph is an intermediate representation that is designed to allow the route planner to quickly consider many different solutions for the query. Instead of keeping the query in the AST, which is limited by its tree structure, we produce a graph-like representation with all used tables (nodes) in one list, and the edges between them in a separate list.

In this pass, subqueries are extracted into a list of queries and the relationships between them. This makes it easier for later passes to plan fully without having to switch back and forth between passes - when doing route planning, we can do all of route planning in one go and don't have to wait for SELECT expressions to be considered before planning subqueries used in SELECT expressions.
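The node list and edge list described above could look roughly like the following. This is a sketch under assumptions, not the PR's `queryGraph`: the field names, the string predicates and the `predicatesBetween` helper are all made up for illustration.

```go
package main

import "fmt"

// tableSet is a stand-in bitmask; bit i identifies table i.
type tableSet uint64

// queryTable is one node of the graph.
type queryTable struct {
	id    tableSet
	alias string
}

// joinEdge is one edge: a predicate plus the tables it depends on.
type joinEdge struct {
	deps      tableSet
	predicate string
}

// queryGraph keeps nodes and edges in two flat lists.
type queryGraph struct {
	tables []queryTable
	edges  []joinEdge
}

// predicatesBetween returns the predicates that connect a and b:
// they touch both sides and no table outside of them.
func (qg *queryGraph) predicatesBetween(a, b tableSet) []string {
	var out []string
	for _, e := range qg.edges {
		if e.deps&a != 0 && e.deps&b != 0 && e.deps&^(a|b) == 0 {
			out = append(out, e.predicate)
		}
	}
	return out
}

func main() {
	qg := &queryGraph{
		tables: []queryTable{{1, "u"}, {2, "o"}, {4, "p"}},
		edges: []joinEdge{
			{deps: 1 | 2, predicate: "u.id = o.user_id"},
			{deps: 2 | 4, predicate: "o.product_id = p.id"},
		},
	}
	fmt.Println(qg.predicatesBetween(1, 2))
	fmt.Println(qg.predicatesBetween(1|2, 4))
}
```

Because the edges are not tied to a position in a tree, any pair of table sets can be checked for connecting predicates without rewriting the query.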

Route planning

Responsibilities: Plan how to route the query - plan FROM and WHERE

This pass uses dynamic programming to consider all combinations of tables in order to find the optimal plan. Optimal here means the minimal number of route primitives in the plan.

At the end of this stage, we have a tree structure that represents all the route primitives needed and how they should be joined.
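A subset-based dynamic program of this kind can be sketched as follows. The merge rule used here (two single routes into the same keyspace collapse into one route) is a deliberately simplified assumption, and `plan`, `join` and `bestPlan` are hypothetical names; the real planner's merge conditions and join predicates are not modelled.

```go
package main

import "fmt"

// plan is a candidate join tree for a set of tables.
type plan struct {
	routes   int    // number of route primitives in this plan
	keyspace string // set only when the whole plan is a single route
}

// join combines two plans. Assumed rule for this sketch only:
// two single routes into the same keyspace collapse into one route.
func join(l, r plan) plan {
	if l.keyspace != "" && l.keyspace == r.keyspace {
		return plan{routes: 1, keyspace: l.keyspace}
	}
	return plan{routes: l.routes + r.routes}
}

// bestPlan fills a table indexed by table subset (as a bitmask), from
// small subsets to large ones, keeping the cheapest plan for each subset.
func bestPlan(keyspaces []string) plan {
	n := len(keyspaces)
	best := make([]plan, 1<<uint(n))
	for i, ks := range keyspaces {
		best[1<<uint(i)] = plan{routes: 1, keyspace: ks}
	}
	for mask := 1; mask < 1<<uint(n); mask++ {
		if mask&(mask-1) == 0 {
			continue // single table, already seeded
		}
		// Try every non-empty proper subset of mask as the left side.
		for left := (mask - 1) & mask; left > 0; left = (left - 1) & mask {
			cand := join(best[left], best[mask^left])
			if best[mask].routes == 0 || cand.routes < best[mask].routes {
				best[mask] = cand
			}
		}
	}
	return best[1<<uint(n)-1]
}

func main() {
	// Joining strictly left to right would need three routes here;
	// pairing the first and third table first needs only two.
	fmt.Println(bestPlan([]string{"user", "product", "user"}).routes)
}
```

Every subset is solved once and reused by all larger subsets that contain it, which is what keeps exhaustive consideration of join orders tractable for a modest number of tables.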

Horizon planning

Responsibilities: Plan projections, aggregations, grouping and ordering

Once we have a plan for how to route queries, we plan what projections we need from each route, and how to do ORDER BY/GROUP BY/LIMIT et al.

Positive outcomes from this refactoring.

Why do this non-trivial piece of work?

We still have a number of query types that are not supported. In order to support more queries, we needed to extend the planner. Instead of adding to the legacy planner, which is not very easy to work with, we felt that it was time to introduce this new design, which will not only allow us to support these queries but also set us up to do more optimisations in the future.

@systay systay changed the title Horizon planning Planner refactoring Dec 21, 2020
@systay systay force-pushed the horizon-planning branch 2 times, most recently from 85b6065 to fc17b74 Compare December 21, 2020 10:20
systay and others added 8 commits December 21, 2020 20:31
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: GuptaManan100 <[email protected]>
Test the new planner in plan_test side by side with the old planner

Signed-off-by: Andres Taylor <[email protected]>
@systay systay marked this pull request as ready for review December 22, 2020 14:49
systay and others added 8 commits December 27, 2020 12:38
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
@systay systay requested a review from sougou as a code owner December 29, 2020 13:57
This was referenced Jan 9, 2021
Member

@harshit-gangal left a comment

Overall looks good to me.

@@ -51,6 +51,7 @@ var (
SQLSelectLimit = SystemVariable{Name: "sql_select_limit", Default: off}
TransactionMode = SystemVariable{Name: "transaction_mode", IdentifierAsString: true}
Workload = SystemVariable{Name: "workload", IdentifierAsString: true}
PlannerVersion = SystemVariable{Name: "planner_version", IdentifierAsString: true}
Member

Do we need to allow users to change planner_version at session level? If we do this, then the plan cache would also need to store this information. Allowing it at vtgate startup should be good enough.

Collaborator Author

Fair enough. I'll remove it. I was testing things and wanted to quickly be able to switch, but there are other ways of accomplishing this.


scopes []*scope
exprDeps map[sqlparser.Expr]TableSet
si schemaInformation
Member

looks like this is not used in the code.

Collaborator Author

yeah, this is how we will read from the vschema to get tables we have column information about. I'll remove

}

// resolveUnQualifiedColumn
func (a *analyzer) resolveUnQualifiedColumn(current *scope, expr *sqlparser.ColName) (table, error) {
Member

expr is not used

Collaborator Author

same as above - once we can query about column info, this is where this would happen. I'll remove

}
)

// TableSetFor returns the bitmask for this particular tableshoe
Member

nit: tableshoe looks incorrect.

Collaborator Author

Covfefe?

Member

intentional. I see

Comment on lines 52 to 54
// Wireup2 does the wire up work for the new planner
Wireup2(semTable *semantics.SemTable) error

Member

nit: wireup2 -> wireupv4

Comment on lines 110 to 124
		if err := qg.collectTables(table.Exprs, semTable); err != nil {
			return err
		}
	}
	return nil
}

func (qg *queryGraph) collectTables(t sqlparser.TableExprs, semTable *semantics.SemTable) error {
	for _, expr := range t {
		if err := qg.collectTable(expr, semTable); err != nil {
			return err
		}
	}
	return nil
}
Member

nit: this could be done inline.

Collaborator Author

collectTables is also used from line 69 in this same file, so I just used it here as well. DRY, right?

Comment on lines +102 to +107
for _, predicate := range splitAndExpression(nil, table.Condition.On) {
	err := qg.collectPredicate(predicate, semTable)
	if err != nil {
		return err
	}
}
Member

can use collectPredicates method call here?

Collaborator Author

no, not really. collectPredicates extracts predicates from a SELECT struct, and that is not what we have here

Comment on lines 68 to 69
// solvedTables keeps track of which tables this route is covering
solvedTables semantics.TableSet
Member

nit: could be renamed to containedTables

Comment on lines +233 to +240
	if len(node.SelectExprs) == 0 {
		node.SelectExprs = []sqlparser.SelectExpr{
			&sqlparser.AliasedExpr{
				Expr: sqlparser.NewIntLiteral([]byte{'1'}),
			},
		}
	}
}
Member

why is this needed?

Collaborator Author

sometimes we don't need anything from a route except the number of matching rows. in those cases, we add a single literal because a SELECT with no expressions is not valid

Comment on lines 278 to 286
for i, table := range qg.tables {
	solves := semTable.TableSetFor(table.alias)
	plan, err := createRoutePlan(table, solves, vschema)
	if err != nil {
		return nil, err
	}
	plans[i] = plan
}

Member

nit: similar to in lefttoright, can be moved to a method.


4 participants