The current upgrader procedure is the following: no matter what, it runs all available migrations on each Tribler run.
This is a historically established approach: we, as developers, keep following it in current Tribler development and simply add more and more migrations to the existing ones.
However, this approach is subject to at least two problems. The first problem is easier to understand and avoid, so I will start with it.
The problem: a developer must be quite careful with the conditions inside their migration.
There are no guidelines, or even tips or hints, on how a migration should be written. From the perspective of the code, it is just a function without any prerequisites.
Here is an example of the two earliest migrations:
```python
class TriblerUpgrader:
    ...

    def run(self):
        self.upgrade_pony_db_8to10()
        self.upgrade_pony_db_10to11()

    def upgrade_pony_db_8to10(self):
        database_path = self.state_dir / STATEDIR_DB_DIR / 'metadata.db'
        if not database_path.exists() or get_db_version(database_path) >= 10:
            return
        ...  # code of the migration

    def upgrade_pony_db_10to11(self):
        database_path = self.state_dir / STATEDIR_DB_DIR / 'metadata.db'
        if not database_path.exists():
            return
        ...  # code of the migration
```
From the code above, the conditions for migration execution are:

- upgrade_pony_db_8to10: metadata.db exists and its version is below 10;
- upgrade_pony_db_10to11: metadata.db exists.
As these migrations execute on each Tribler run, false positives may occur if the conditions are not chosen carefully.
For example, the upgrade_pony_db_10to11 migration will execute on any Tribler run if metadata.db is missing.
This could lead to unexpected consequences, because when writing a migration the programmer usually assumes it will run only once, and only while upgrading the application.
With the current upgrade approach, this problem can be avoided only by being extremely careful when choosing the conditions for each migration, and making a mistake here is easy.
Since developers are humans, they will make this mistake sooner or later.
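Many migration frameworks remove this class of mistake by replacing per-migration hand-written conditions with a single stored schema version: only pending steps run, and each runs exactly once. A minimal sketch of that pattern, using SQLite's `PRAGMA user_version` (the table and step names below are invented for illustration, not Tribler's actual schema):

```python
import sqlite3

# Hypothetical migration steps, keyed by the version they upgrade FROM.
def migrate_0_to_1(conn):
    conn.execute("CREATE TABLE torrents (title TEXT)")

def migrate_1_to_2(conn):
    conn.execute("ALTER TABLE torrents ADD COLUMN tags TEXT DEFAULT ''")

MIGRATIONS = {0: migrate_0_to_1, 1: migrate_1_to_2}
TARGET_VERSION = 2

def upgrade(conn: sqlite3.Connection) -> int:
    """Run only the pending migrations; each step executes exactly once."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    while version < TARGET_VERSION:
        MIGRATIONS[version](conn)
        version += 1
        # Persist the new version so this step never re-runs on the next start.
        conn.execute(f"PRAGMA user_version = {version}")
    return version
```

Here the execution condition is the stored version number itself, not ad-hoc checks scattered across migration bodies, so a second call to upgrade() is a no-op by construction.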
The problem: a migration uses models that can become inconsistent with their previous versions.
This problem is harder to understand and avoid.
Some migrations use models like:
MetadataStore
TriblerConfig
BandwidthDatabase
...
TagDatabase
These models change over time: new fields are added, new __init__ logic is introduced, default values change, etc.
When writing a migration, the developer has in mind the model as it exists in the code at that point in time, and at that point the migration may work perfectly. But some time later, say after a year, another developer may change the model. Such a change implicitly changes the behavior of the migration in a way that is not obvious. It may not be detected by tests, but can later lead to strange errors.
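To make this failure mode concrete, here is a purely illustrative sketch (the class names are invented, not Tribler's): a migration written against the 2020 shape of a model breaks when the live model later gains a required constructor argument.

```python
# 2020: the live model at the time the migration was written (hypothetical).
class ChannelEntry:
    def __init__(self, title):
        self.title = title

def migrate_titles(titles, model_cls):
    # Written in 2020; silently assumes model_cls takes exactly one argument.
    return [model_cls(t) for t in titles]

# 2021: a developer evolves the live model, adding a required field...
class ChannelEntryV2:
    def __init__(self, title, public_key):
        self.title = title
        self.public_key = public_key

# ...and the untouched migration, which always references the *live* model,
# now raises TypeError for the missing required argument 'public_key'.
```

Nothing in the migration's own file changed, so the break is invisible in review of the model change.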
Another example (MetadataStore).
Imagine we have three versions of MetadataStore, changed over the course of development:
MetadataStore 2019 (v1)
MetadataStore 2020 (v2)
MetadataStore 2021 (v3)
For these changes, two migrations were written:
MetadataStoreMigration 2020 (v1->v2)
MetadataStoreMigration 2021 (v2->v3)
Because migrations stay in place in the Tribler code, they all refer to MetadataStore as the model to which the migration should be applied. With the naive approach, the first migration would likely be broken by 2021, because the old model version (v1) no longer exists. That is why Tribler developers use pure SQL for DB migrations.
But this approach (semi-pure SQL migrations) will not save us from inconsistency in other models, as we still use Python code as the migration scenario.
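For contrast, a pure-SQL migration sidesteps model drift entirely because it speaks only to the on-disk schema. A minimal sketch, with an invented schema for illustration:

```python
import sqlite3

# The whole migration is a SQL script: it references only the on-disk
# schema, so later changes to the Python models cannot break it.
MIGRATION_1_TO_2 = """
ALTER TABLE torrents ADD COLUMN infohash_hex TEXT DEFAULT '';
CREATE INDEX idx_torrents_infohash ON torrents (infohash_hex);
"""

def apply_sql_migration(conn: sqlite3.Connection, script: str) -> None:
    conn.executescript(script)
```

The remaining risk, as noted above, is the Python code that decides when and how such scripts run.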
Another example (BandwidthDatabase).
Let's say we remove BandwidthDatabase in the next release. But what should we do with all the migrations that use the BandwidthDatabase model? The most straightforward approach would be to remove them, but that has two consequences:
We reduce the number of Tribler versions from which you can upgrade to the current one.
We can break the chain of upgrades if the BandwidthDatabase upgrade is located somewhere in the middle (1->2->BDUpgrade->3->4).
The answer to this question will have to be found by Tribler developers in the future.
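The second consequence is easy to see in a sketch: with a linear chain keyed by version number, deleting the middle step strands every user below it (all names and steps here are illustrative):

```python
# Trivial stand-ins for real migration steps; each returns the new version.
def step_1_to_2(db):
    return 2

def step_3_to_4(db):
    return 4

# The BandwidthDatabase step (2 -> 3) has been deleted from the chain:
MIGRATIONS = {1: step_1_to_2, 3: step_3_to_4}

def upgrade(version, db=None):
    while version < 4:
        version = MIGRATIONS[version](db)  # KeyError once version reaches 2
    return version
```

Users already at v3 upgrade fine, but anyone still at v1 hits the hole in the chain.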
Conclusion
This issue describes the two problems that are most obvious to me and that can lead to bugs.
I have not touched here on how convenient it is to write migrations (it is not).
The risks of the first problem can be managed by increasing attention to migration writing, but the risks of the second problem cannot be managed at all with the current approach.
So, with the second problem, it is only a matter of time before a single programmer action not directly related to migrations breaks them.
We should discuss these problems and find a solution as each new release increases the probability that the errors described above will be made.
Migrations (and the problems around them) are well known in the software world. We can simply adopt the best practices that already work in hundreds of other projects instead of reinventing the wheel.