Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TriblerDatabaseMigrationChain #7622

Merged
merged 1 commit into from
Oct 17, 2023

Conversation

drew2a
Copy link
Contributor

@drew2a drew2a commented Oct 9, 2023

This PR introduces a more strict and formal way of DB migrations.

The main idea behind this change is to standardize the migration procedure into a series of functions that resemble:

first migration

@migration(execute_only_if_version=0)
def scheme_migration_0(db: TriblerDatabase, **kwargs):  # pylint: disable=unused-argument
    db.execute('ALTER TABLE "TorrentState" ADD "has_data" BOOLEAN DEFAULT 0')
    db.execute('UPDATE "TorrentState" SET "has_data" = 1 WHERE last_check > 0')

second migration

@migration(execute_only_if_version=1)
def scheme_migration_1(db: TriblerDatabase, **kwargs):  # pylint: disable=unused-argument
    db.execute('ALTER TABLE "Tracker" ADD "updated" INT DEFAULT 0')

This PR should fix some of the problems described in #7382
Related to #7398

@drew2a drew2a force-pushed the feature/triblerdb_migration branch 2 times, most recently from b098927 to ae3e20c Compare October 16, 2023 07:16
@drew2a drew2a marked this pull request as ready for review October 16, 2023 07:16
@drew2a drew2a requested review from a team and kozlovsky and removed request for a team October 16, 2023 07:16
@drew2a drew2a force-pushed the feature/triblerdb_migration branch from ae3e20c to ef744cb Compare October 16, 2023 07:39
@drew2a drew2a mentioned this pull request Oct 16, 2023
12 tasks
src/tribler/core/upgrade/tribler_db/migration_chain.py Outdated Show resolved Hide resolved
src/tribler/core/upgrade/upgrade.py Outdated Show resolved Hide resolved
f"Function {func.__name__} is not executed because DB version is not equal to {target_version}. "
f"The current db version is {db.version}"
)
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks dangerous to silently skip the migration in case of a version mismatch and can lead to incorrect migrations.

As a possible example:

  • We have migration chain [a, b, c], and after the migration c, the database has version 3
  • Two developers make two independent PRs with migrations d and e, both require db version 3
  • After both PRs were successfully merged, the merge conflict in the migration chain was resolved to [a, b, c, d, e], but both d and e still mistakenly require version 3 of the database.
  • The execution of the migration chain silently skips the migration e

For that reason, it is better to raise an error instead of skipping the migration so the database will not be left in a surprising state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to raise an exception as you suggested, then almost all runs would result in unsuccessful execution.

Example:

  1. We have a migration chain [a, b, c] with the corresponding versions [1, 2, 3]. Let's represent them as [a(1), b(2), c(3)].
  2. Our DB currently has a version set to 2.
  3. We begin executing the migration chain starting with migration a(1).
  4. This results in an exception because the DB version is not equal to 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can first select suitable migrations, as described in the previous comment, and execute only them. This way, the migration decorator can throw an exception if the migration was performed at an improper moment. It looks safer than silently skipping migrations with incorrect db version numbers.

Also, consider the migrations [a, b, c, d, e] that mistakenly expect the schema version [1, 3, 2, 4, 5]. With silent skipping of the incorrect migrations, only migrations 1 and 2 will be executed. It is better to prevent such errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm not sure that it is a good idea to automatically fix potential merge conflict errors.

Also, I'm not sure that the Migration Chain should have a monopoly on changing the version value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm not sure that it is a good idea to automatically fix potential merge conflict errors.

I agree, and what I like in the current approach is that the merge conflicts for migrations always require explicit conflict resolution, as migrations are explicitly listed in the same list.

What I'm concerned about is that the current PR approach allows silent skipping of migration if the schema number specified in the decorator does not agree with the order of migrations in the explicit list. Raising an error looks like a safer approach.


return True

def steps(self) -> Iterator:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the terminology: I'd say we have a migration chain that consists of migrations, and a single migration can possibly consist of several steps. Here, in the for loop, we iterate over migrations and not over the steps of an individual migration. So, it is better to rename it from steps to migrations to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a property named migrations that stores all the migrations for this chain.

def steps(self) represents the execution steps for the migration chain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, it is pretty common to say that an individual migration may consist of several steps. For that reason, using the word "steps" when referring to migrations looks confusing. Maybe we can rename steps to something like find_suitable_migrations(current_schema_version).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you prove it?

it is pretty common to say that an individual migration may consist of several steps.

logger = logging.getLogger('Migration (TriblerDB)')


def migration(execute_only_if_version: int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: It looks correct, but it may be more convenient to specify the resulting version after the migration and assume the schema version before the migration should be one less. In that case, @migration(schema_version=10) would mean that the version after the migration is executed successfully should be 10, and the version before the migration should be 9. In that case, it will be a bit more obvious that after applying @migration(schema_version=10), the resulting schema has version 10. But it is a matter of personal preferences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is "a matter of personal preferences". Let's ask @xoriole for a second opinion.

@drew2a drew2a requested a review from kozlovsky October 16, 2023 10:01
@synctext
Copy link
Member

synctext commented Oct 16, 2023

a more strict and formal way of DB migrations.

A single place where the new format is documented would be great. Now lots of stuff is scattered across multiple issues. Fine for now, but please don't invest too much time into a generic migration tool with sub-migrations and concurrent developer conflict resolution fanciness. Please give me a new stable Tribler release in 2 weeks, with the merged Randevous certificates!

Two developers make two independent PRs with migrations d and e, both require db version 3

We only have 1 responsible developer for tribler.db migration, so this is not a scenario we need right?

@drew2a
Copy link
Contributor Author

drew2a commented Oct 16, 2023

@synctext, the PR has been pending for about a week. I haven't received any feedback from the development team. But before I can merge, I need approval.

In my view, the PR was already in good shape.

We only have 1 responsible developer for tribler.db migration, so this is not a scenario we need right?

It's not the actual scenario. Any team member can write migrations, as has been done in the past. Unfortunately, there's no dedicated role as 'the developer responsible for tribler.db migration'. But regarding the

Two developers make two independent PRs with migrations d and e, both require db version 3

I agree that it's a low-probability scenario that we shouldn't focus on.

@kozlovsky
Copy link
Contributor

the PR has been pending for about a week. I haven't received any feedback from the development team. But before I can merge, I need approval.

But it was marked as ready for review just today. So, it looks reasonable that after the PR is ready for review, it can receive some review comments and not be accepted immediately.

Copy link
Contributor

@kozlovsky kozlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After additional discussion with @drew2a, we decided that we can handle version conflicts between different migrations in the future, and it is not necessary to handle it right now, as the chance of this error is low.

@drew2a drew2a force-pushed the feature/triblerdb_migration branch from 33f163e to 3bda1f7 Compare October 16, 2023 14:37
@drew2a drew2a merged commit 3c53722 into Tribler:main Oct 17, 2023
15 checks passed
@drew2a drew2a deleted the feature/triblerdb_migration branch October 17, 2023 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants