Online DDL: support migration cut-over backoff and forced cut-over #14546
Signed-off-by: Shlomi Noach <[email protected]>
…irst invocation Signed-off-by: Shlomi Noach <[email protected]>
…actions holding a lock on a table Signed-off-by: Shlomi Noach <[email protected]>
…illQueriesOnTable: kill queries on a table, and kill connections with transactions holding locks on table Signed-off-by: Shlomi Noach <[email protected]>
…r and Online DDL executor Signed-off-by: Shlomi Noach <[email protected]>
…and the effect on 'force_cutover' column' Signed-off-by: Shlomi Noach <[email protected]>
…ransaction holding lock Signed-off-by: Shlomi Noach <[email protected]>
…t, ForceCutOverSchemaMigrationResponse Signed-off-by: Shlomi Noach <[email protected]>
Documentation PR: vitessio/website#1641
…tess into onlineddl-cutover-backoff Signed-off-by: Shlomi Noach <[email protected]>
Looks good! My only major concern is that we have no tests for the KILL portion, which is particularly sensitive. Am I missing them? It would be ideally covered by unit tests with a lot of test cases with mock mysql results. But we probably don't have the framework in place for that so it may be difficult. Let me know what you think.
	require.NotNil(t, rs)
	for _, row := range rs.Named().Rows {
		message := row.AsString("message", "")
		if strings.Contains(message, messageSubstring) {
Any reason not to make it case insensitive?
I wonder why we should? If the test knows what to expect, then it should expect the exact string.
@@ -673,6 +673,37 @@ func (s *VtctldServer) CleanupSchemaMigration(ctx context.Context, req *vtctldat
	return resp, nil
}

// ForceCutOverSchemaMigration is part of the vtctlservicepb.VtctldServer interface.
func (s *VtctldServer) ForceCutOverSchemaMigration(ctx context.Context, req *vtctldatapb.ForceCutOverSchemaMigrationRequest) (resp *vtctldatapb.ForceCutOverSchemaMigrationResponse, err error) {
	span, ctx := trace.NewSpan(ctx, "VtctldServer.ForceCutOverSchemaMigrationResponse")
This should just be the function name, not the response part.
I don't follow, can you please explain?
The trace span name should be the function name:
span, ctx := trace.NewSpan(ctx, "VtctldServer.ForceCutOverSchemaMigration")
This is the last thing and it's minor. So I will approve and you can change this whenever you like.
fixed
go/vt/vttablet/onlineddl/executor.go
@@ -765,8 +766,93 @@ func (e *Executor) terminateVReplMigration(ctx context.Context, uuid string) err
	return nil
}

func (e *Executor) killQueriesOnTable(ctx context.Context, tableName string) error {
This isn't really covered by any tests, is it? This feels like the most critical aspect as we're killing things off in the production system.
Things like this are actually easier in unit tests as you can mock the mysql query responses.
This is covered in https://github.com/vitessio/vitess/pull/14546/files#diff-e1236744a601a269891381b0ed09b7151d86f5a9b001d8c4e954211c2ae04a8d ; there's a test that holds an open transaction, we attempt a completion, the migration does not complete; we then force cut-over, and we see that the migration completes.
Co-authored-by: Matt Lord <[email protected]> Signed-off-by: Shlomi Noach <[email protected]>
@@ -673,6 +673,37 @@ func (s *VtctldServer) CleanupSchemaMigration(ctx context.Context, req *vtctldat
	return resp, nil
}

// ForceCutOverSchemaMigration is part of the vtctlservicepb.VtctldServer interface.
func (s *VtctldServer) ForceCutOverSchemaMigration(ctx context.Context, req *vtctldatapb.ForceCutOverSchemaMigrationRequest) (resp *vtctldatapb.ForceCutOverSchemaMigrationResponse, err error) {
	span, ctx := trace.NewSpan(ctx, "VtctldServer.ForceCutOverSchemaMigrationResponse")
@mattlord is saying this, essentially
- span, ctx := trace.NewSpan(ctx, "VtctldServer.ForceCutOverSchemaMigrationResponse")
+ span, ctx := trace.NewSpan(ctx, "VtctldServer.ForceCutOverSchemaMigration")
fixed!
		return nil, err
	}

	log.Info("Calling ApplySchema to force cut-over migration")
should we include the UUID in this log?
Done
proto/vtctlservice.proto
@@ -66,6 +66,8 @@ service Vtctld {
  rpc CleanupSchemaMigration(vtctldata.CleanupSchemaMigrationRequest) returns (vtctldata.CleanupSchemaMigrationResponse) {};
  // CompleteSchemaMigration completes one or all migrations executed with --postpone-completion.
  rpc CompleteSchemaMigration(vtctldata.CompleteSchemaMigrationRequest) returns (vtctldata.CompleteSchemaMigrationResponse) {};
  // ForceCutOverSchemaMigration marks a schema migration for forced cut-over.
  rpc ForceCutOverSchemaMigration(vtctldata.ForceCutOverSchemaMigrationRequest) returns (vtctldata.ForceCutOverSchemaMigrationResponse) {};
nit: move this down below FindAllShardsInKeyspace (to keep them tidy)?
Done!
proto/vtctldata.proto
@@ -459,6 +458,15 @@ message CompleteSchemaMigrationResponse {
  map<string, uint64> rows_affected_by_shard = 1;
}

message ForceCutOverSchemaMigrationRequest {
nit: move these down below FindAllShardsInKeyspaceRequest/Response (to keep them tidy)?
Done!
…itessio#14546) Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: Matt Lord <[email protected]>
Description
Closes #14530
This PR implements two functionalities relating to schema migration cut-over, relevant to ALTER TABLE in vitess strategy only. Both relate to a scenario where the cut-over times out, due either to excessive load on the migrated table or to some lock being placed on the table. The two functionalities are cut-over backoff and forced cut-over.
With backoff, if a cut-over times out, the next cut-over is not attempted before 1min has passed; the next one not before an additional 5min have passed, then 10min, then 30min, and from that point on cut-overs are only attempted at 30min intervals. This is to avoid a scenario where the database, which is already under heavy load, needs to cope with frequently recurring cut-over attempts, which themselves put additional locks on tables.
Backoff
The backoff mechanism is implemented as-is, and is not configurable.
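The schedule described above can be sketched as a small lookup table. This is a minimal illustration and not the code from this PR: the names cutOverIntervals and nextCutOverInterval are hypothetical, and the sketch assumes that the very first cut-over attempt is not delayed.

```go
package main

import (
	"fmt"
	"time"
)

// cutOverIntervals is a hypothetical table of minimum waits before each
// cut-over attempt, indexed by the number of attempts that already timed out.
// Assumption: the first attempt (zero failures so far) is not delayed.
var cutOverIntervals = []time.Duration{
	0,
	1 * time.Minute,
	5 * time.Minute,
	10 * time.Minute,
	30 * time.Minute,
}

// nextCutOverInterval returns the minimum wait before the next cut-over
// attempt. Once the table is exhausted, the last interval (30min) applies
// to every further attempt.
func nextCutOverInterval(failedAttempts int) time.Duration {
	if failedAttempts < 0 {
		failedAttempts = 0
	}
	if failedAttempts >= len(cutOverIntervals) {
		return cutOverIntervals[len(cutOverIntervals)-1]
	}
	return cutOverIntervals[failedAttempts]
}

func main() {
	for attempts := 0; attempts <= 6; attempts++ {
		fmt.Printf("after %d failed attempts: wait at least %v\n",
			attempts, nextCutOverInterval(attempts))
	}
}
```

Keeping the intervals in a fixed table mirrors the statement that the mechanism is not configurable: there is no multiplier or cap to tune, only a list to walk.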
Force cut-over
Forced cut-over can be controlled in these ways.
--force-cut-over-after DDL strategy flag
Example:
It's possible to preconfigure the maximum duration during which cut-overs are allowed to fail or time out due to pending queries or transactions. --force-cut-over-after, if nonzero, applies starting with the first cut-over attempt.
In the above example, --force-cut-over-after is set to 1 hour. The migration may run for as long as it needs, say 5 hours. At the first cut-over attempt, a 1 hour clock starts ticking. The cut-over may be successful, in which case all's well and nothing further happens. Or it may fail, in which case the backoff mechanism kicks in: the next attempt is made after 1m, then 5m, etc. But if these all keep failing, then 1h since the very first failed attempt, irrespective of backoff, and within a 1min resolution, the scheduler runs a cut-over with query and transaction termination. This is highly likely to succeed. If it fails, the scheduler continues to attempt forced cut-overs every minute.
ALTER VITESS_MIGRATION ... FORCE_CUTOVER ...
We introduce a new syntax:
ALTER VITESS_MIGRATION '9748c3b7_7fdb_11eb_ac2c_f875a4d24e90' FORCE_CUTOVER;
ALTER VITESS_MIGRATION FORCE_CUTOVER ALL;
The former forces cut-over for a specific migration, the latter for all pending migrations (queued, ready, running). All these commands do is set the new schema_migrations.force_cutover column value to 1, much like ALTER VITESS_MIGRATION ... COMPLETE ... does.
The scheduler picks up this force_cutover column value on its next review of running migrations. If it's 1, then any backoff state is ignored, and the scheduler attempts a forced cut-over, terminating queries and transactions.
vtctldclient OnlineDDL force-cutover
These matching options are added to the vtctldclient OnlineDDL command:
Notes
Only works on MySQL 8.0 (requires the performance_schema.data_locks table; 5.7 does not provide reliable information).
To note the obvious, a forced cut-over is only relevant when the migration is actually eligible for cut-over. If the migration is incomplete, then it won't attempt a cut-over. Issuing an ALTER VITESS_MIGRATION ... FORCE_CUTOVER ... will set the column to 1, but it will only become relevant when the migration becomes ready to cut-over. Also, if the migration runs with --postpone-completion, then it will not be eligible to cut-over; the user will first need to issue an ALTER VITESS_MIGRATION ... COMPLETE ... statement. It's fine if the user runs an ALTER VITESS_MIGRATION ... FORCE_CUTOVER ... first, but this will only have the effect of setting force_cutover=1; it will not COMPLETE the migration.
Added unit and endtoend tests.
Documentation: vitessio/website#1641.
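The eligibility rules in the notes, combined with the two ways of triggering a forced cut-over, can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the scheduler's actual code: the migration struct, its field names, and the functions eligibleForCutOver, shouldForceCutOver and exampleMigration are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// migration holds a hypothetical subset of the state the scheduler would
// need in order to decide whether a cut-over attempt should be forced.
type migration struct {
	readyToCutOver     bool          // the migration is complete enough to cut over
	postponeCompletion bool          // submitted with --postpone-completion and not yet COMPLETEd
	forceCutOverFlag   bool          // schema_migrations.force_cutover = 1
	forceCutOverAfter  time.Duration // --force-cut-over-after value; zero disables it
	sinceFirstAttempt  time.Duration // time elapsed since the first cut-over attempt
}

// eligibleForCutOver reports whether a cut-over may be attempted at all.
func eligibleForCutOver(m migration) bool {
	return m.readyToCutOver && !m.postponeCompletion
}

// shouldForceCutOver reports whether the next cut-over attempt should
// terminate blocking queries and transactions. Setting the flag on a
// migration that is not eligible has no effect until it becomes eligible.
func shouldForceCutOver(m migration) bool {
	if !eligibleForCutOver(m) {
		return false
	}
	if m.forceCutOverFlag {
		return true
	}
	return m.forceCutOverAfter > 0 && m.sinceFirstAttempt >= m.forceCutOverAfter
}

// exampleMigration returns a migration that is ready to cut over and was
// submitted with a one-hour --force-cut-over-after threshold.
func exampleMigration() migration {
	return migration{readyToCutOver: true, forceCutOverAfter: time.Hour}
}

func main() {
	m := exampleMigration()
	m.sinceFirstAttempt = 30 * time.Minute
	fmt.Println("30min after first attempt, forced:", shouldForceCutOver(m))
	m.sinceFirstAttempt = time.Hour
	fmt.Println("1h after first attempt, forced:", shouldForceCutOver(m))

	p := exampleMigration()
	p.postponeCompletion = true
	p.forceCutOverFlag = true
	fmt.Println("postponed with force_cutover=1, forced:", shouldForceCutOver(p))
}
```

Checking eligibility first keeps the two triggers from overriding --postpone-completion, which matches the note that FORCE_CUTOVER does not COMPLETE a migration.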