Concurrent migrations #9192

Merged

Conversation

shlomi-noach
Contributor

@shlomi-noach shlomi-noach commented Nov 11, 2021

See updated #9192 (comment)

Description

Ongoing work towards supporting concurrent REVERT migrations.

Up till now, Vitess would only ever run a single migration at a time, for a given keyspace.

  • This isn't 100% correct, as a tablet failure could end up resurrecting an already vreplication stream.

Anyway, we wish to support the following:

  • REVERT migrations, which are by nature high priority migrations, should be able to run concurrently with any other migration
  • We will require an additional flag in ddl_strategy, something like -allow-concurrent or similar
  • There can be an unlimited number of concurrent REVERT migrations with this flag
  • But still only one single "normal" migration at a time.

This changes the scheduler logic, which up till now assumed that if any migration was running, there was no need to schedule the next one. From now on, it will need to consider these scenarios:

  • if a migration is queued, it can be made ready if nothing else is running
  • if an "allowed concurrent" migration is queued, it can be made ready if no other migration operates on the same table
  • if any migration is queued, it can be made ready if the only other running migrations are "allowed concurrent" and none of them operates on the same table

This means a more programmatic approach to scheduling the next migration (today some of the logic is a simple SQL query).

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) labels Nov 11, 2021
@shlomi-noach shlomi-noach self-assigned this Nov 11, 2021
Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Contributor Author

While #9171 is still unmerged, this is the diff for this branch: planetscale/vitess@online-ddl-postpone-completion...planetscale:online-ddl-concurrent-revert-migrations

…s() before runNextMigration() can actually run a new migration

…e reason this is required is that there could be a migration in 'ready' state, which conflicts with some running migration. But we still want to be able to run _other_ -allow-concurrent migrations that do not conflict with running migrations

@shlomi-noach shlomi-noach changed the title WIP: concurrent REVERT migrations WIP: concurrent migrations Nov 15, 2021
@shlomi-noach
Contributor Author

shlomi-noach commented Nov 15, 2021

This PR has evolved a bit. We now support concurrent migrations as follows:

  • A migration needs to opt in to run concurrently via the -allow-concurrent flag in ddl_strategy
  • A migration is scheduled and executed concurrently if:
    • It specifies -allow-concurrent
    • It is a CREATE, DROP, or REVERT (REVERT works for CREATE, DROP and for online/vreplication ALTER migrations)
    • There is no other running migration operating on same table.
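The three conditions above can be sketched as a single predicate. This is only an illustration, not the code in this PR: the `Migration` struct, its field names, and the helper names are hypothetical, and classifying a statement by its first keyword is a simplifying assumption.

```go
package main

import "strings"

// Migration is a hypothetical, simplified model of an Online DDL migration.
// The field names are assumptions made for this sketch.
type Migration struct {
	Statement string // the submitted statement text
	Strategy  string // the ddl_strategy value, including any flags
	Table     string // the table the migration operates on
}

// hasAllowConcurrent reports whether the strategy carries the -allow-concurrent flag.
func hasAllowConcurrent(strategy string) bool {
	for _, token := range strings.Fields(strategy) {
		if token == "-allow-concurrent" {
			return true
		}
	}
	return false
}

// isConcurrentKind reports whether the statement is a CREATE, DROP or REVERT,
// judged naively by its first keyword (an assumption of this sketch).
func isConcurrentKind(statement string) bool {
	fields := strings.Fields(statement)
	if len(fields) == 0 {
		return false
	}
	switch strings.ToUpper(fields[0]) {
	case "CREATE", "DROP", "REVERT":
		return true
	}
	return false
}

// mayRunConcurrently applies the three conditions: the flag is present,
// the statement kind qualifies, and no running migration uses the same table.
func mayRunConcurrently(m Migration, runningTables map[string]bool) bool {
	return hasAllowConcurrent(m.Strategy) &&
		isConcurrentKind(m.Statement) &&
		!runningTables[m.Table]
}
```

Under these assumptions, an ALTER submitted with -allow-concurrent is still treated as a regular migration, because its statement kind does not qualify.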

With the new scheduler:

  • There can be any number of concurrent migrations running at any given time
  • And there can be at most one non-concurrent ("regular") migration running at any time
  • And never two migrations operating on the same table
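The three properties above read like invariants over the set of running migrations. A minimal sketch of a checker for them, assuming a hypothetical `Running` struct that is not part of this PR:

```go
package main

// Running is a hypothetical, simplified view of a running migration.
type Running struct {
	Table      string // the table the migration operates on
	Concurrent bool   // true when the migration runs as a concurrent migration
}

// satisfiesInvariants reports whether a set of running migrations has
// at most one regular (non-concurrent) migration and no two migrations
// operating on the same table. Any number of concurrent ones is allowed.
func satisfiesInvariants(running []Running) bool {
	regular := 0
	seen := make(map[string]bool, len(running))
	for _, m := range running {
		if seen[m.Table] {
			return false // two migrations on the same table
		}
		seen[m.Table] = true
		if !m.Concurrent {
			regular++
		}
	}
	return regular <= 1
}
```

A check like this could be useful in tests of a scheduler, asserting the invariants after every scheduling step.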

Also:

  • There can be multiple migrations in ready state (previously only one)
  • A migration can be "blocked" from running, if it conflicts with a running migration (operating on the same table, or non-concurrent)
  • While some ready migrations can be blocked, new migrations are still able to proceed to ready state and then to running, i.e. actually execute.

Just some possible scenarios:

  • If all migrations are non-concurrent, i.e. are regular, then they can only run sequentially.
  • If a regular migration is running, it's possible for a -allow-concurrent migration to also kick in and run
    • and then it's also possible for yet another N -allow-concurrent migrations to kick in and run
  • If one or more -allow-concurrent migrations are running, it is possible for a regular migration to kick in and run
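These scenarios follow from a simple conflict rule between a candidate migration and whatever is running. The sketch below is an illustration under assumptions: the `Migration` struct and the `conflicts` and `tryRun` helpers are hypothetical names, not the functions added in this PR.

```go
package main

// Migration is a hypothetical, simplified model; field names are assumptions.
type Migration struct {
	UUID       string
	Table      string
	Concurrent bool // true when the migration qualifies to run concurrently
}

// conflicts reports whether candidate must wait, given the running migrations.
// It conflicts when a running migration uses the same table, or when both
// the candidate and a running migration are regular (non-concurrent).
func conflicts(candidate Migration, running []Migration) bool {
	for _, r := range running {
		if r.Table == candidate.Table {
			return true
		}
		if !r.Concurrent && !candidate.Concurrent {
			return true
		}
	}
	return false
}

// tryRun adds candidate to the running set when it does not conflict.
// It returns the (possibly extended) running set and whether candidate started.
func tryRun(candidate Migration, running []Migration) ([]Migration, bool) {
	if conflicts(candidate, running) {
		return running, false
	}
	return append(running, candidate), true
}
```

With one regular migration running, a second regular migration is refused by `tryRun`, while any number of concurrent migrations on other tables are accepted. Starting from concurrent migrations only, a single regular migration is accepted as well.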

@shlomi-noach shlomi-noach changed the title WIP: concurrent migrations Concurrent migrations Nov 16, 2021
@shlomi-noach shlomi-noach marked this pull request as ready for review November 21, 2021 15:09
@shlomi-noach
Contributor Author

This is now ready to review, given #9171 is merged.

@@ -2179,45 +2235,67 @@ func (e *Executor) executeMigration(ctx context.Context, onlineDDL *schema.Onlin
return nil
}

// runNextMigration picks one 'ready' migration that is able to run, and executes it.
Contributor Author

This is at the heart of the PR: the scheduler here gets to decide whether there's a migration that can be executed, concurrently or not concurrently, based on the existence or non-existence of running migrations.

Member

@deepthi deepthi left a comment

Feedback is all on code comments.
May merge after fixing them.

Comment on lines 378 to 380
// isAnyNonConcurrentMigrationRunning sees if there's any migration running right now
// that does not have -allow-concurrent.
// such a running migration will for example prevent a new non-concurrent migration from running.
Member

copy-pasta. Need the correct description of this function instead.

@@ -1561,36 +1634,15 @@ func (e *Executor) CancelPendingMigrations(ctx context.Context, message string)
}

// scheduleNextMigration attemps to schedule a single migration to run next.
// possibly there's no migrations to run. Possibly there's a migration running right now,
// in which cases nothing happens.
// possibly there's no migrations to run.
Member

nit: there are

// possibly there's no migrations to run.
// The effect of this function is to move a migration from 'queued' state to 'ready' state, is all.
// Notice that the query sqlScheduleSingleMigration embeds some logic inside. We may choose
// to refactor the logic into the app
Member

hmm.. which app is this referring to?

Contributor Author

Clarified, moved explanation into the function itself. This is more a developer's note for future me/us.

// Possible scenarios:
// - no migration is in 'ready' state -- nothing to be done
// - a migration is 'ready', but conflicts with other running migrations -- try another 'ready' migration
// It is therefore possible that there is a 'ready' migration, and still this function only handles one.
Member

This sentence is confusing. I would have thought that it is possible that there is a "ready" migration and still this function chooses none.

Contributor Author

Oh wow the wording was really bad. Changed to:

// Note that per the above breakdown, and due to potential conflicts, it is possible to have one or
// more 'ready' migration, and still none is executed.
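To illustrate the reworded comment: a selection loop can walk the 'ready' migrations in order and return the first one that does not conflict, or nothing at all. This is a sketch only; `pickRunnable` and its callback parameter are hypothetical and do not mirror the actual function signature in this PR.

```go
package main

// pickRunnable returns the UUID of the first 'ready' migration for which
// the conflicts callback returns false. ok is false when there are no
// ready migrations, or when every ready migration conflicts.
func pickRunnable(ready []string, conflicts func(uuid string) bool) (uuid string, ok bool) {
	for _, candidate := range ready {
		if !conflicts(candidate) {
			return candidate, true
		}
	}
	return "", false
}
```

Note that the caller cannot tell an empty ready list apart from a list where everything conflicts. Both simply mean that nothing is executed in this iteration.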

// a vreplication migration from a pre-PRS/ERS that we still need to learn about?
// We're going to be careful here, and avoid running new migrations until we have
// a better picture. It will likely take a couple seconds till next iteration.
// execution. This delay ony takes place shortly after Open().
Member

Suggested change
// execution. This delay ony takes place shortly after Open().
// This delay only takes place shortly after Open().

RequestContext: row["migration_context"].ToString(),
// getNonConflictingMigration finds a single 'ready' migration which does not conflict with running migrations.
// Conflicts are:
// - a migration is 'ready' but is not set to run _concurrnetly_, and there's a running migration that is also non-concurrent
Member

Suggested change
// - a migration is 'ready' but is not set to run _concurrnetly_, and there's a running migration that is also non-concurrent
// - a migration is 'ready' but is not set to run _concurrently_, and there's a running migration that is also non-concurrent

if i == 0 {
break
// no non-conflicting migration found...
// Either all ready migrations are conflicting, or there's no ready migrations...
Member

there's no ready migration OR there are no ready migrations.

@shlomi-noach
Contributor Author

Addressed all comments.

@shlomi-noach shlomi-noach merged commit 395ffc4 into vitessio:main Dec 12, 2021
@shlomi-noach shlomi-noach deleted the online-ddl-concurrent-revert-migrations branch December 12, 2021 06:09
@shlomi-noach
Contributor Author

Woohoo! That's an important one merged.

@shlomi-noach shlomi-noach mentioned this pull request Jun 7, 2022
3 tasks