Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repository.initialise never called #4937

Closed
chrisjsewell opened this issue May 7, 2021 · 3 comments
Closed

Repository.initialise never called #4937

chrisjsewell opened this issue May 7, 2021 · 3 comments

Comments

@chrisjsewell
Copy link
Member

chrisjsewell commented May 7, 2021

The AbstractRepositoryBackend has these:

    @abc.abstractmethod
    def initialise(self, **kwargs) -> None:
        """Initialise the repository if it hasn't already been initialised.

        :param kwargs: parameters for the initialisation.
        """

    @abc.abstractproperty
    def is_initialised(self) -> bool:
        """Return whether the repository has been initialised."""

Which then are for example inmplemented in the DiskObjectStoreRepositoryBackend:

    def initialise(self, **kwargs) -> None:
        """Initialise the repository if it hasn't already been initialised.

        :param kwargs: parameters for the initialisation.
        """
        self.container.init_container(**kwargs)

    @property
    def is_initialised(self) -> bool:
        """Return whether the repository has been initialised."""
        return self.container.is_initialised

However, initialise in never actually called anywhere in the code base (except in AiidaTestCase.initialise_repository)

How these backends actually get initialised is

  1. For SandboxRepositoryBackend it is created when SandboxRepositoryBackend.sandbox is called
  2. For DiskObjectStoreRepositoryBackend it is created directly (via container.init_container) in either aiida/backends/djsite/db/migrations/0047_migrate_repository.py or aiida/backends/sqlalchemy/migrations/versions/1feaea71bd5a_migrate_repository.py (obviously depending on your orm backend)

I feel we should be calling initialise somewhere, and not relying on the migration?

Also DiskObjectStoreRepositoryBackend.initialise should probably include the self.container.is_initialised check, before container.init_container, to ensure idempotence

@sphuber
Copy link
Contributor

sphuber commented May 8, 2021

commit 87ab7e3
Author: Sebastiaan Huber [email protected]
Date: Tue May 4 08:26:55 2021 +0200

Profile: only initialise the repository during the migration (#4900)

The new disk object store migration that will ship with `v2.0` requires
to be initialised once and only once, and it will generate the necessary
folders and configuration file. This process was being done in the
method `Profile.get_repository` guarded by a check in case it was
already initialised.

The problem with this was that the container could be initialised too
early during the early stages of a profile setup. As soon as the
repository was fetched, it would be initialised generating, among other
things, the UUID. This would then trigger the check that the database
contained the same UUID, which would of course fail, since the database
was empty. This could have been fixed by ignoring the check at this
point, but the real problem is that the repository should not be
initialised at this point. The only point at which the repo should be
initialised is during the corresponding database migration that
introduced the disk object store repository. Both for existing as well
as for new profiles, they will go through this migration and so it and
it alone should be responsible for initialising the repository.

After removing the automatic initialization in `get_repository_container`
of the `Profile` class, it revealed an actual bug in the migration. The
migration initialised the new disk object store container only after
the check whether the database is empty, in which case it would short cut
and skip the repository migration entirely, since there are no nodes to
migrate anyway. However, this would leave the repository uninitialised
and cause problems down the road. This wasn't seen before because the
repository would be initialised anyway as soon as it was fetched the
first time. With the change in behavior of `get_repository_container`
this is no longer the case and so the migration had to be fixed to always
initialise the repository, even for empty databases, which is often the
case for newly created profiles.

This approach did create problems for the unittests though, as they
would sometimes clean the repository. It would this not by just removing
the contents, but it would delete the entire container. This meant it
had to be recreated, but since in normal operations this only happens
during the migration (which also during tests only happens once, unless
maybe during the migration tests themselves) and so an error would be
raised that the repository is not initialised. The solution is to
reinitialise a new repo as soon as the old one was destroyed. Currently
this is done by simply deleting the folder on disk and reinitialising an
entire new instance. In the future, it would be better if the existing
container could be kept and its contents could simply be dropped, but
this would require a feature in the `disk-objectstore` library.

@chrisjsewell
Copy link
Member Author

Thanks for the link to #4900.
However, I still feel this issue should remain open, because I think the current state is not ideal and this could easily lead to issues down the line; say for example someone adds code to the repositry/backend initialise expecting it to be run.
At least I think we should add a note of this in these functions.

Additionally, if we are re-initialising the repository in AiidaTestCase, should it also not be added to the pytest fixtures? or do these work differently?

@chrisjsewell chrisjsewell changed the title RepositoryBackend.initialise never called Repository.initialise never called May 9, 2021
@chrisjsewell
Copy link
Member Author

In #5330 we now directly initialise new profile storage (in PsqlDostoreMigrator.initialise), as opposed to through the migrations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants