Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vtgate planner] Routing & Merging refactor #12197

Merged
merged 37 commits into from
Feb 15, 2023

Conversation

systay
Copy link
Collaborator

@systay systay commented Feb 1, 2023

Description

This PR refactors how routing of queries is done during query planning.

Why?

The logic for which routes can be merged together is an important and complex part of the query planner.
Making the code easy to understand and talk about is critical to get this correct.

The old Route operator consisted of a set of fields:

type Route struct {
    Source              ops.Operator
    RouteOpCode         engine.Opcode
    Keyspace            *vindexes.Keyspace
    VindexPreds         []*VindexPlusPredicates
    Selected            *VindexOption
    SysTableTableSchema []evalengine.Expr
    SysTableTableName   map[string]evalengine.Expr
    SeenPredicates      []sqlparser.Expr
    TargetDestination   key.Destination
    Alternates          map[*vindexes.Keyspace]*Route
    MergedWith          []*Route
}

Of these, only Source, RouteOpCode and MergedWith are valid for all types of routes.
All other fields only make sense for some OpCodes that the route represents.

The fields VindexPreds and Selected only make sense for sharded tables, which are represented a lot of OpCodes, such as Scatter, EqualUnique, etc.
SysTableTableSchema, SysTableTableName are only used for information_schema tables (OpCode DBA).

In a lot of places, we had to use a switch statement on the OpCode to handle things differently depending on the type of Route we were dealing with.

The Change

To me, this screamed for an interface and multiple different implementation of this interface, depending on which type of route we have.
The new operator now contains:

Route struct {
    Source     ops.Operator
    MergedWith []*Route
    Routing    Routing
}

The Routing interface is then used for picking the best plan per table in the query, and then the merging of multiple Routes into as few as possible.

While doing this refactoring, I tried to keep the tests intact and only change the code behind. For the few exceptions to this rule, I have added comments in this PR explaining why the change was introduced.

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Feb 1, 2023
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Feb 1, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be a documentation on top of the test to explain what the expected behavior is what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@systay systay force-pushed the routing-take-3 branch 4 times, most recently from 752194e to b19d35e Compare February 9, 2023 09:40
@systay systay added the Skip CI Skip CI actions from running label Feb 9, 2023
@systay systay force-pushed the routing-take-3 branch 5 times, most recently from 3b61762 to 8f8d98f Compare February 13, 2023 15:50
@systay systay removed NeedsWebsiteDocsUpdate What it says Skip CI Skip CI actions from running labels Feb 13, 2023
@@ -1150,7 +1150,7 @@
"Original": "select Id from user where 1 in ('aa', 'bb')",
"Instructions": {
"OperatorType": "Route",
"Variant": "Scatter",
"Variant": "None",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now using the evalengine to check if we can at plan-time evaluate expression. If we can, and if the result is false, we can use the None opcode which is very cheap.

@@ -279,9 +279,9 @@
"Sharded": false
},
"FieldQuery": "select RC.CONSTRAINT_NAME, ORDINAL_POSITION from INFORMATION_SCHEMA.KEY_COLUMN_USAGE as KCU, INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS as RC where 1 != 1",
"Query": "select RC.CONSTRAINT_NAME, ORDINAL_POSITION from INFORMATION_SCHEMA.KEY_COLUMN_USAGE as KCU, INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS as RC where KCU.TABLE_SCHEMA = :__vtschemaname and KCU.TABLE_NAME = :KCU_TABLE_NAME and KCU.COLUMN_NAME = 'id' and KCU.REFERENCED_TABLE_SCHEMA = 'test' and KCU.CONSTRAINT_NAME = 'data_type_table_id_fkey' and KCU.CONSTRAINT_NAME = RC.CONSTRAINT_NAME order by KCU.CONSTRAINT_NAME asc, KCU.COLUMN_NAME asc",
"Query": "select RC.CONSTRAINT_NAME, ORDINAL_POSITION from INFORMATION_SCHEMA.KEY_COLUMN_USAGE as KCU, INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS as RC where KCU.TABLE_SCHEMA = :__vtschemaname and KCU.TABLE_NAME = :KCU_TABLE_NAME and KCU.COLUMN_NAME = 'id' and KCU.REFERENCED_TABLE_SCHEMA = :__vtschemaname and KCU.CONSTRAINT_NAME = 'data_type_table_id_fkey' and KCU.CONSTRAINT_NAME = RC.CONSTRAINT_NAME order by KCU.CONSTRAINT_NAME asc, KCU.COLUMN_NAME asc",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old query was wrong - we should never use the given schema name. Instead we have to replace the literal value with the argument :__vtschemaname which is then filled in by the vttablet with the name of the underlying MySQL database.

"SysTableTableName": "[KCU_TABLE_NAME:VARCHAR(\"data_type_table\")]",
"SysTableTableSchema": "[VARCHAR(\"test\")]",
"SysTableTableSchema": "[VARCHAR(\"test\"), VARCHAR(\"test\")]",

This comment was marked as resolved.

"Keyspace": {
"Name": "main",
"Sharded": false
"OperatorType": "Join",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old plan was wrong. Given WHERE kcu.table_schema = ? AND rc.constraint_schema = ?, we won't know until runtime if the user wants to look at information from the same or different keyspaces/schemas, and so merging these routes into a single one is invalid.

@@ -547,7 +571,7 @@
"FieldQuery": "select fk.referenced_table_name as to_table, fk.referenced_column_name as primary_key, fk.column_name as `column`, fk.constraint_name as `name`, rc.update_rule as on_update, rc.delete_rule as on_delete from information_schema.referential_constraints as rc, information_schema.key_column_usage as fk where 1 != 1",
"Query": "select fk.referenced_table_name as to_table, fk.referenced_column_name as primary_key, fk.column_name as `column`, fk.constraint_name as `name`, rc.update_rule as on_update, rc.delete_rule as on_delete from information_schema.referential_constraints as rc, information_schema.key_column_usage as fk where rc.constraint_schema = :__vtschemaname and rc.table_name = :rc_table_name and fk.referenced_column_name is not null and fk.table_schema = :__vtschemaname and fk.table_name = :fk_table_name",
"SysTableTableName": "[fk_table_name:VARCHAR(\"table_name\"), rc_table_name:VARCHAR(\"table_name\")]",
"SysTableTableSchema": "[VARCHAR(\"table_schema\"), VARCHAR(\"table_schema\")]",
"SysTableTableSchema": "[VARCHAR(\"table_schema\")]",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have merged the two routes, there is no need to specify the table_schema schema twice.

"SysTableTableName": "[tc_table_name:VARCHAR(\"table_name\")]",
"SysTableTableSchema": "[VARCHAR(\"constraint_schema\"), VARCHAR(\"table_schema\")]",
"Table": "information_schema.check_constraints, information_schema.table_constraints"
"OperatorType": "Join",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we know that the two schemas being searched for are different - WHERE tc.table_schema = 'table_schema' AND ... cc.constraint_schema = 'constraint_schema'. Merging these two was wrong.

@@ -22,7 +22,7 @@
"Original": "select * from ambiguous_ref_with_source",
"Instructions": {
"OperatorType": "Route",
"Variant": "Reference",
"Variant": "Unsharded",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ambiguous_ref_with_source exists both as an unsharded table in the main keyspace, and as a reference table in the user keyspace. The latter is a copy of the unsharded table spread out to all shards so that all joins can be local.

During route planning we have decided that we want to send the query to the unsharded main keyspace. The OpCode is more accurate if it's Unsharded for this route.

@systay systay removed the NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work label Feb 14, 2023
@systay systay marked this pull request as ready for review February 14, 2023 11:05
Copy link
Member

@frouioui frouioui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, it's an important change that makes things easier for all of us and that fills some gaps we had, thank you @systay 🙏🏻

I left a few questions, comments and nits after this first pass

Comment on lines +41 to +49
type InfoSchemaRouting struct {
SysTableTableSchema []sqlparser.Expr
SysTableTableName map[string]sqlparser.Expr
Table *QueryTable
}

func (isr *InfoSchemaRouting) UpdateRoutingParams(_ *plancontext.PlanningContext, rp *engine.RoutingParameters) error {
rp.SysTableTableSchema = nil
for _, expr := range isr.SysTableTableSchema {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we do not merge if SysTableTableSchema is different, should SysTableTableSchema be an Expr than slice of Expr? Similarly, Do we need SysTableTableName as a map?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid points. wdyt about doing these fixes in a separate PR? this one has grown enough for now :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a task for it

Copy link
Member

@frouioui frouioui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Signed-off-by: Andres Taylor <[email protected]>
@systay systay merged commit 5ee12d5 into vitessio:main Feb 15, 2023
@systay systay deleted the routing-take-3 branch February 15, 2023 16:23
dbussink added a commit to planetscale/vitess that referenced this pull request Aug 14, 2023
The rewriting on v16 didn't consider the case where we already had an
extract subquery. In that case we don't extract again, to avoid infinite
recursion.

This does not affect v17 and later as this was fixed in the refactor in
vitessio#12197.

Signed-off-by: Dirkjan Bussink <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants