-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🔀 MERGE: Remove Django storage backend #5330
Conversation
b1e3d55
to
f3880e4
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chrisjsewell . A lot of work. Reviewing therefore is also not the easiest, but I gave it a real shot. Put various comments in the files that I went through and will list some other general comments here below.
Naming
I wonder if there is maybe a slightly better name for the PsqlDostore
and psql_dostore
abbreviations for the backend type. When I first read it, I really tripped over the store
part, only realizing afterwards that it refers to disk object store
. I wonder whether either going full abbreviation PsqlDos/psql_dos
may be better, or alternatively PsqlDiskObjectStore/psql_disk_object_store
. For the class name I prefer the latter, but for the string identifier I think psql_dos
may be fine, but I wouldn't mix, so I would go for PsqlDos/psql_dos
as the best option.
Generic
- For my understanding, you specify the
branch_label
in theaiida/backends/sqlalchemy/migrations/versions/e15ef2630a1b_initial_schema.py
migration, but not in any other. I take it is only necessary to indicate this in the very first migration of a branch? Consecutive migrations will automatically inherit through the revision numbers? - What is the purpose of
aiida/backends/sqlalchemy/migrations/versions/main_0001.py
? It just contains todo's for now.
Behavior
- When running an unmigrated Django profile, this is the error displayed when loading the backend (through
verdi
):
Critical: The database has no known version.
This should probably be the typical message saying that the current version is not compatible with that of the code and the database needs be migrated, saying to use verdi storage migrate
.
-
It seems that you no longer use the schema generation. Is there still a plan to decommission the very early migrations at some point? Is there a clean and efficient way to do this with alembic perhaps that doesn't require this custom concept? That would be great, but in any case I do think it is important we think about how we will do this in the future. I think it still makes sense to do it if we can do it without interruptions for users. (What is acceptable is that they cannot skip major versions when upgrading their
aiida-core
versions). -
verdi status
no longer shows the backend schema version. Fine to remove it from here, but I think it is useful to have this somewhere for easy debugging. Maybe inverdi devel
then if we think this is not necessary for users typically? Also should update deprecation warning inverdi database version
then which points toverdi status
as the replacement.
Codebase
Still some references of Django in the source code that may need to be removed/changed:
- In
aiida/manage/configuration/schema/config-v7.schema.json
the default should definitely be changed topsql_dostore
. - In
aiida/manage/configuration/schema/config-v5.schema.json
andaiida/manage/configuration/schema/config-v6.schema.json
there is stilldjango
as the default for the backend. Should we retroactively change this tosqlalchemy
? Not sure it will ever come into play but couldn't hurt, could it? docs/source/intro/troubleshooting.rst
still references django in output ofverdi status
, best to replace it with something else. It is just the profile name and repository path, but would be better to have some standard profile name not connected to legacy backend.docs/source/internals/database.rst
still describes Django database tables that will be removedauth_group
etc.docs/source/howto/installation.rst
still has old output ofverdi profile show
including django as backendocs/source/conf.py
remove workaround for Django by definingautodoc_default_options
docs/source/nitpick-exceptions
remove exceptions related to Django classes
tests/backends/aiida_sqlalchemy/migrations/django_branch/test_migrate_to_head.py
Outdated
Show resolved
Hide resolved
tests/backends/aiida_sqlalchemy/migrations/test_schema_vs_models.py
Outdated
Show resolved
Hide resolved
tests/restapi/conftest.py
Outdated
backend_manager.reset_backend_environment() | ||
actual_profile = aiida_profile._manager._profile # pylint: disable=protected-access | ||
backend_manager.load_backend_environment(actual_profile, pool_timeout=1, max_overflow=0) | ||
# TODO re-instate this fixture? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixture was explicitly created for a unit test of the REST API. Unless the test can be removed, this should probably not be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tests pass without this fixture though. What I want to understand, to think if/how it should be re-instated, is exactly what the pool_timeout=1, max_overflow=0
are intended to test, as this was not documented at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tests pass without this fixture though
Well the point is that it is creating a situation (an engine with reduced pool timeout) that caused problems in real-life setups. The test made sure that even under those constrained conditions, things still worked. So of course if you remove the constraint it might still work, but you are not testing what you were intending to test. The settings were such to ensure that if the original problem was still there, it would be triggered as fast as possible. With more lenient settings, the test might pass simply because there is a lot of buffer, even with the underlying problem still being there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your understanding of the fixture and test was incorrect 😬, which was my point of it being poorly documented.
The actual goal has nothing really to do with pool timeouts, it is just checking that threads properly close their database connections (see b9d4bbe).
Anyhow, I've now re-worked it in 1b98a91
(#5330).
Rather than trying to hack into the backend's connection creation, the profile's storage_config
section now allows for an engine_kwargs
key, which is used by create_sqlalchemy_engine
to allow any additional kwargs to be passed to sqlalchemy.create_engine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, its still not ideal, because the restapi is essentially hard-coded to the psql_dos
backend, relying on sqlalchemy's scoped_session
to handle thread safety.
Going forward, I'm going to look into making get_manager
thread local, which should mean you can properly use AiiDA in a multi-threaded mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your understanding of the fixture and test was incorrect grimacing,
Are you really sure. I remember the PR that added that fixture. The problem was as you say that the REST API threads weren't properly closing the database connections. This would cause them to accumulate and at some point the calls would fail since there were no more connections allowed. To test the fix, @CasperWA wanted to make sure that there was a test that would try to hit the limit. If this were to be done with the standard settings, it would take way too long, so to speed up the process, the engine was configured such that it would hit the error almost instantly in case of a regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep thats what it does now 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure to not break your good work 😉
Cheers! Where possible, I tried to keep the diffs to a minimum 😅
yep happy to discuss, your suggestions seem reasonable
Yes indeed, you can see this via:
As discussed in sqlalchemy/alembic#983, there are now three "branches":
Ooops, this was just a lil bug, fixed in 9d8ab164bccfb88a22f567f67a497f40fb3d1054, i.e. if there is no alembic version table, but there is a django version table, then it shows a specific error message about this
Yes there is, you just simply remove them 😄
Yep, very simply, you just call
Yep cheers. To note, I haven't touched the documentation at all, apart from to remove django references. I'll do some improvements of the developer documentation in a follow-up PR |
|
Added |
Interestingly, before 4fe874c, |
|
998eead
to
c6ed03c
Compare
Before f26ce67, there were inconsistencies and issues with the resetting of storage for
This is fixed, by adding the In line with this change, |
In 07b27064b213e7301bfb285631432fd9ae5d1cf2, I have refactored the "problematic" auto-group functionality. |
In fedd4e3, I have removed the @sphuber during this, I also "fixed" your rabbitmq version check, which was forcing that a connection to be made with rabbitmq on every |
Some lower-priority TODOs I think can be left to later PRs:
|
done |
|
This is just about good to go |
Note that it still maybe very useful to create this as a commit now, after the fact. It would be very easy to rebase this and create a single commit just with the regression tests. |
This was done intentionally though. At least @giovannipizzi had requested that it would run the check as soon as the profile would be loaded. This would prevent a user potentially creating a bunch of nodes and doing some work, only to submit a workflow and find out the version is not supported. I think I am personally ok with that, so I wouldn't mind keeping your fix, but maybe @giovannipizzi should also give his ok. |
I indeed initially made it nullable, but then realized it was better to always have a value, so in 4a347b6 I added the default. It is true that we probably also should have made it non-nullable at that point. Since the added default was very early, and we indeed don't support unreleased schemas, I would be fine with simply doing this change in place without any explicit migrations. |
Yeh I'm strongly against that. If we really want to prevent such things, I would even suggest actually raising an exception when trying to load the communicator, if RabbitMQ is the wrong version, with a configuration option to deactivate this, for users that still want to proceed. |
As long as the message appears when the user first runs a workflow (rather than whet they first create nodes), I'm still OK as the problems will occur when running.
Just to be clear, you are against early checks? What do you think @sphuber and @chrisjsewell ? |
exactly 👍 the check is definitely a good thing, and appreciated, but not at the expense of all other aiida functionality |
I am not against early checks per-se, only if that has other disadvantages. I have to agree with @chrisjsewell that for some parts of AiiDA, connection with RabbitMQ is not necessary and making it fail even when not necessary, seems unnecessarily restrictive. I think the current solution is better. It will only perform the check and warn when a communicator is created, which is whenever you start to run any processes. For example, running In [1]: from aiida.engine import calcfunction
In [2]: @calcfunction
...: def add(a, b):
...: return a + b
...:
In [3]: add(Int(1), Int(2))
Warning: RabbitMQ v3.8.2 is not supported and will cause unexpected problems!
Warning: It can cause long-running workflows to crash and jobs to be submitted multiple times.
Warning: See https://github.com/aiidateam/aiida-core/wiki/RabbitMQ-version-to-use for details.
Out[3]: <Int: uuid: ebdec2dc-53dd-4ae2-af0b-727d91de5cbb (pk: 342) value: 3>
I don't think I would go for turning it into an exception for now. Would leave it as a warning |
202f942
to
9f488ad
Compare
@sphuber, geez I'm in the middle of re-writing the PR description which, as you requested, was gonna be added to the commit description. (not the end of the world, but please never merge my PRs) |
Thanks for the review though! |
Apologies, since you pushed the rebased branch I figured you didn't want to include the commit message. Since there has been a lot of work already, I wasn't going to push it and so thought to get on with it. I didn't see why you would push already if you were still going to work on the commit message. A quick note that you were still going to do that would have done the trick. |
God works in mysterious ways 👼 Added my beautiful UML diagrams to the description, which anyhow would not have gone in the commit message |
@sphuber not sure if I've broken codecov integration; they are definitely triggering https://app.codecov.io/gh/aiidateam/aiida-core/compare/5330, but not seeing the results "pushed back" |
Looks like it. It is indeed no longer showing up on the pull requests. But could also be unrelated and just a problem between Github and Codecov. |
@sphuber its ok now, I reset the webhook |
This is a follow-up to #5330, which separates the storage backend abstraction and implementations, with the abstraction in `aiida.orm` and implementations moving to `aiida/storage`.
This PR fully removes the Django storage backend and Django dependency, whilst preserving the ability to migrate all existing django profiles to the unified
psql_dostore
backend schema.This is made possible, by re-writing all django database migrations as sqlachemy based alembic ones.
In doing this, it has allowed the backend storage abstraction to move away from the limitations of the Django ORM API, principally:
This thus enforced that only a single profile's storage could be accessed, for the lifetime of a Python process, i.e. no profile switching was allowed.
For sqlalchemy, no such limitation exists, since any number of
sqlalchemy.Engine
instances can be created, with different connection configurations.The limitation in django also promoted some "bad practices" in aiida, namely the use of multiple global variables and their indirect reference by class methods. Global variables are problematic for numerous reasons, particularly when overused, see e.g. https://stackoverflow.com/questions/10525582/why-are-global-variables-considered-bad-practice
New UML
The following UML diagrams highlight the changes in the storage abstraction, importantly, global variables are alway accessed via the
Manager
and storage is always accessed via the storage backend:Previous
New
The following changes have then been made to the abstraction:
The
BackendManager
class has been fully removed: all interaction with a profile's storage now routes through theBackend
class.The
Backend
class includes the class methods:version_head
,version_profile
andmigrate
, which take aProfile
as input (for connection configuration). These allow for profiles with storage at an old schema revision to be assessed and migrated.the storage of a profile that is different to the globally loaded profile.
Profile
has aProfile.get_storage_cls
, for returning the correct subclass from which to call these methodsA
Backend
class is instantiated with aProfile
, at which point it should validate that the profile's storage is accessible and at the latest (a.k.a head) schema versionrepository|uuid
key value.The
CURRENT_AUTOGROUP
global variable has been removed and, instead, anAutogroupManager
is instantiated on the storage backend, available viaBackend.autogroup
, which can be enabled/disabled(default).This ensures that a node will never try to add itself to an
AutoGroup
associated with a different backend.Specific to the
PsqlDosBackend
(formerlySqlaBackend
)All schema validation / migration logic goes through the
PsqlDostoreMigrator
class. This has somewhat similar to functionality to the previousBackendManager
, except it should never be accessed directly (except for migration tests), but always via thePsqlDostoreBackend
.PsqlDostoreMigrator
is also "exposed" for developers viapython -m aiida.backends.sqlalchemy.alembic_cli
, which you can use to view the migration history, and to create new migrations (after changes to the ORM)To support legacy profiles (pre v2.0), we maintain three alembic branches:
django
branch and ansqlalchemy
branch, whose "heads" have identical schemasmain
branch, which all new migrations will build on.When calling
PsqlDosBackend.migrate
:verdi setup
, we simply initialise it with the current ORM schema, rather than going up through every migration (as is the case now.reset the revision as one on the main branch, and then migrate to the head of the main branch
reset the revision as one on the main branch, and then migrate to the head of the main branch
Because migrations are no longer tied to the global profile, this has greatly improved the test infrastructure for migrations.
PGTest
PostgreSQL database cluster is created when required, and scoped to the whole test session, then a separate, empty database is created/destroyed, per migration test, and a migration test simply migrates up to the target migration.Contexted profile access
One key goal of this "re-abstraction", is to allow for "contexed" profile access, wich is now available using the
profile_context
function:One can even use an archive as a profile:
Changes to test fixtures
There were inconsistencies and issues with the resetting of storage for
AiidaTestCase
andTestManager
(used by theclear_database
fixtures):AiidaTestCase
had a bunch of custom code to delete and recreate the repository, but did not set the new repository UUID in the databaseTestManager
did not clear the repository at allThis is fixed, by adding the
Backend._clear
abstract method and thePsqlDostoreBackend._clear
implementation, which allows the storage to be properly reset in the tests, in a backend agnostic manner.It has also allowed for the removal of
AiidaTestImplementation
/SqlAlchemyTests
, since the test fixtures no longer require any backend specific code.In line with this change,
TestManager.reset_db
has been deprecated and renamed toTestManager.clear_profile
and also the pytest fixtures:clear_database
,clear_database_after_test
,clear_database_before_test
andclear_database_before_test_class
have been deprecated and replaced withaiida_profile_clean
,aiida_profile_clean_class
Notes:
The large
+
diff on this PR, is primarily due totests/backends/aiida_sqlalchemy/migrations/test_all_schema
containing a YAML file, specifying the full database schema for all past revisions (django & sqlalchemy); tables, columns, constraints (primary key, unique, foreign key), and indexes.