Add ProfileAccessManager to provide exclusive-access profile locks (#5270)

With the new disk object store repository backend implementation, there
is a need to perform maintenance operations on the repository from time
to time. Some of these operations do not allow other processes to read
from or write to the repository concurrently. Therefore it is necessary
for the maintenance functionality to obtain an exclusive-access lock on
the profile's backend.

To this end the class `aiida.manage.profile_access.ProfileAccessManager`
is added. From now on, when the storage backend of a profile is loaded
it should call the `request_access` method of the manager. This will
cause a PID file to be written to the `ACCESS_CONTROL_DIR` directory of
the given profile. Since this access is non-exclusive, multiple processes
can request access like this concurrently and a PID file will be written
for each, where the command of the process is written to the file. This
information is used in order to be able to determine which PID files
may have gone stale.
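
The access-record step described above can be sketched with the standard
library alone. This is a minimal illustration under stated assumptions,
not the code added by this commit: the function name
`write_access_record` and the `records_dir` argument are hypothetical,
and `sys.argv` stands in for the process command line that the actual
implementation obtains through `psutil`.

```python
import os
import sys
from pathlib import Path


def write_access_record(records_dir: Path) -> Path:
    """Record non-exclusive access by writing `<pid>.pid` into `records_dir`.

    Hypothetical helper: `sys.argv` stands in for the process command line.
    """
    pid = os.getpid()
    filepath_pid = records_dir / f'{pid}.pid'
    filepath_tmp = records_dir / f'{pid}.tmp'

    # Write to a temporary file first and then move it into place in one
    # step, so that a reader never sees a partially written record.
    filepath_tmp.write_text(str(sys.argv))
    os.replace(filepath_tmp, filepath_pid)
    return filepath_pid
```

Because every process writes a file named after its own PID, concurrent
callers do not overwrite each other's records.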

When a maintenance operation requires exclusive access, it should use
the `lock` context manager. If the profile is currently being accessed
or is already locked by another process, the lock request will fail.
Otherwise, a lock file is created, which guarantees that no other
process will be given a lock or normal access.
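
The check, create, re-check pattern behind such a lock can be sketched
as follows. This is a simplified illustration and not the implementation
in this commit: `exclusive_lock`, `_other_records` and
`ProfileBusyError` are hypothetical names, a single exception type
replaces the two used by AiiDA, and stale PID files are not cleaned up.

```python
import contextlib
import os
from pathlib import Path


class ProfileBusyError(Exception):
    """Hypothetical stand-in for the errors raised when a lock is refused."""


def _other_records(records_dir: Path, suffix: str) -> list:
    """Return record files with `suffix` that belong to other processes."""
    own = records_dir / f'{os.getpid()}{suffix}'
    return [path for path in records_dir.glob(f'*{suffix}') if path != own]


@contextlib.contextmanager
def exclusive_lock(records_dir: Path):
    """Hold `<pid>.lock` in `records_dir` for the duration of the context."""
    # Refuse up front if another process holds a lock or an access record.
    for suffix in ('.lock', '.pid'):
        blockers = _other_records(records_dir, suffix)
        if blockers:
            raise ProfileBusyError(f'blocked by: {[p.stem for p in blockers]}')

    filepath = records_dir / f'{os.getpid()}.lock'
    filepath.touch()
    try:
        # Check again after creating the lock file: if another process
        # slipped in between the first check and `touch`, back off.
        for suffix in ('.lock', '.pid'):
            blockers = _other_records(records_dir, suffix)
            if blockers:
                raise ProfileBusyError(f'raced by: {[p.stem for p in blockers]}')
        yield filepath
    finally:
        # Always release the lock, also when the body or the re-check raises.
        filepath.unlink(missing_ok=True)
```

With this ordering, two processes that race for the lock may both back
off, but they should not both conclude that they hold it.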

The choice was made to use a custom implementation for the PID and lock
files instead of using existing libraries such as `filelock`. The reason
for this decision is that the use case is quite specific: not only do
we need both exclusive and non-exclusive access, we also want to keep a
record of the IDs of the processes that get access, such that when a
request for a lock or access is denied, the error message can provide
those PIDs. This makes it easier for the user to debug which
(potentially dead) process is blocking the profile. The downside of this
approach is that a custom implementation is more prone to bugs and
cross-platform incompatibility than a well-tested library.
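
To illustrate why keeping the PIDs and commands on disk helps with
debugging, the sketch below builds the kind of per-process listing that
an error message could include, and flags records that look stale. The
names `describe_blockers` and `command_of` are hypothetical; in
particular `command_of` is an injected lookup (an assumption made here
so that the example needs no third-party package), whereas the commit
itself queries running processes through `psutil`.

```python
from pathlib import Path
from typing import Callable, List, Optional


def describe_blockers(records_dir: Path, command_of: Callable[[int], Optional[str]]) -> List[str]:
    """Return one line per PID file, flagging records that look stale.

    `command_of(pid)` is a hypothetical lookup returning the command of a
    running process, or `None` if no process with that PID exists.
    """
    lines = []
    for path in sorted(records_dir.glob('*.pid')):
        pid = int(path.stem)
        recorded = path.read_text()
        # A record is considered stale when the process is gone, or when the
        # PID has been reused by a process running a different command.
        stale = command_of(pid) != recorded
        lines.append(f' - pid {pid} (`{recorded}`)' + (' [possibly stale]' if stale else ''))
    return lines
```

Comparing the recorded command with the current one guards against PID
reuse: a live process with the same PID but a different command is
treated the same way as a dead one.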

Co-authored-by: Sebastiaan Huber <[email protected]>
ramirezfranciscof and sphuber authored Jan 21, 2022
1 parent 8e52d18 commit 4cf9d93
Showing 9 changed files with 637 additions and 3 deletions.
9 changes: 9 additions & 0 deletions aiida/backends/managers/__init__.py
@@ -0,0 +1,9 @@
# -*- coding: utf-8 -*-
###########################################################################
# Copyright (c), The AiiDA team. All rights reserved. #
# This file is part of the AiiDA code. #
# #
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core #
# For further information on the license, see the LICENSE.txt file #
# For further information please visit http://www.aiida.net #
###########################################################################
2 changes: 2 additions & 0 deletions aiida/common/__init__.py
@@ -57,6 +57,8 @@
     'LicensingException',
     'LinkType',
     'LoadingEntryPointError',
+    'LockedProfileError',
+    'LockingProfileError',
     'MissingConfigurationError',
     'MissingEntryPointError',
     'ModificationNotAllowed',
15 changes: 14 additions & 1 deletion aiida/common/exceptions.py
@@ -17,7 +17,8 @@
     'PluginInternalError', 'ValidationError', 'ConfigurationError', 'ProfileConfigurationError',
     'MissingConfigurationError', 'ConfigurationVersionError', 'IncompatibleDatabaseSchema', 'DbContentError',
     'InputValidationError', 'FeatureNotAvailable', 'FeatureDisabled', 'LicensingException', 'TestsNotAllowedError',
-    'UnsupportedSpeciesError', 'TransportTaskException', 'OutputParsingError', 'HashingError', 'DatabaseMigrationError'
+    'UnsupportedSpeciesError', 'TransportTaskException', 'OutputParsingError', 'HashingError', 'DatabaseMigrationError',
+    'LockedProfileError', 'LockingProfileError'
 )


@@ -260,3 +261,15 @@ class HashingError(AiidaException):
     """
     Raised when an attempt to hash an object fails via a known failure mode
     """
+
+
+class LockedProfileError(AiidaException):
+    """
+    Raised if attempting to access a locked profile
+    """
+
+
+class LockingProfileError(AiidaException):
+    """
+    Raised if the profile can't be locked
+    """
14 changes: 13 additions & 1 deletion aiida/manage/configuration/settings.py
@@ -24,10 +24,12 @@
 DEFAULT_CONFIG_INDENT_SIZE = 4
 DEFAULT_DAEMON_DIR_NAME = 'daemon'
 DEFAULT_DAEMON_LOG_DIR_NAME = 'log'
+DEFAULT_ACCESS_CONTROL_DIR_NAME = 'access'

 AIIDA_CONFIG_FOLDER: typing.Optional[pathlib.Path] = None
 DAEMON_DIR: typing.Optional[pathlib.Path] = None
 DAEMON_LOG_DIR: typing.Optional[pathlib.Path] = None
+ACCESS_CONTROL_DIR: typing.Optional[pathlib.Path] = None


 def create_instance_directories():
@@ -41,11 +43,19 @@ def create_instance_directories():
     directory_base = pathlib.Path(AIIDA_CONFIG_FOLDER).expanduser()
     directory_daemon = directory_base / DAEMON_DIR
     directory_daemon_log = directory_base / DAEMON_LOG_DIR
+    directory_access = directory_base / ACCESS_CONTROL_DIR
+
+    list_of_paths = [
+        directory_base,
+        directory_daemon,
+        directory_daemon_log,
+        directory_access,
+    ]

     umask = os.umask(DEFAULT_UMASK)

     try:
-        for path in [directory_base, directory_daemon, directory_daemon_log]:
+        for path in list_of_paths:

             if path is directory_base and not path.exists():
                 warnings.warn(f'Creating AiiDA configuration folder `{path}`.')
@@ -75,6 +85,7 @@ def set_configuration_directory():
     global AIIDA_CONFIG_FOLDER
     global DAEMON_DIR
     global DAEMON_LOG_DIR
+    global ACCESS_CONTROL_DIR

     environment_variable = os.environ.get(DEFAULT_AIIDA_PATH_VARIABLE, None)

@@ -100,6 +111,7 @@ def set_configuration_directory():

     DAEMON_DIR = AIIDA_CONFIG_FOLDER / DEFAULT_DAEMON_DIR_NAME
     DAEMON_LOG_DIR = DAEMON_DIR / DEFAULT_DAEMON_LOG_DIR_NAME
+    ACCESS_CONTROL_DIR = AIIDA_CONFIG_FOLDER / DEFAULT_ACCESS_CONTROL_DIR_NAME

     create_instance_directories()

3 changes: 3 additions & 0 deletions aiida/manage/manager.py
@@ -114,6 +114,7 @@ def _load_backend(self, schema_check: bool = True, repository_check: bool = True
         from aiida.common import ConfigurationError, InvalidOperation
         from aiida.common.log import configure_logging
         from aiida.manage import configuration
+        from aiida.manage.profile_access import ProfileAccessManager

         profile = self.get_profile()

@@ -129,6 +130,8 @@ def _load_backend(self, schema_check: bool = True, repository_check: bool = True

         # Do NOT reload the backend environment if already loaded, simply reload the backend instance after
         if configuration.BACKEND_UUID is None:
+            access_manager = ProfileAccessManager(profile)
+            access_manager.request_access()
             backend_manager.load_backend_environment(profile, validate_schema=schema_check)
             configuration.BACKEND_UUID = profile.uuid

222 changes: 222 additions & 0 deletions aiida/manage/profile_access.py
@@ -0,0 +1,222 @@
# -*- coding: utf-8 -*-
###########################################################################
# Copyright (c), The AiiDA team. All rights reserved. #
# This file is part of the AiiDA code. #
# #
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core #
# For further information on the license, see the LICENSE.txt file #
# For further information please visit http://www.aiida.net #
###########################################################################
"""Module for the ProfileAccessManager that tracks process access to the profile."""
import contextlib
import os
from pathlib import Path
import typing

import psutil

from aiida.common.exceptions import LockedProfileError, LockingProfileError
from aiida.common.lang import type_check
from aiida.manage.configuration import Profile


class ProfileAccessManager:
    """Class to manage access to a profile.

    Any process that wants to request access for a given profile, should first call:

        ProfileAccessManager(profile).request_access()

    If this returns normally, the profile can be used safely. It will raise if it is locked, in which case the profile
    should not be used. If a process wants to request exclusive access to the profile, it should use ``lock``:

        with ProfileAccessManager(profile).lock():
            pass

    If the profile is already locked or is currently in use, an exception is raised.

    Access and locks of the profile will be recorded in a directory with files with a ``.pid`` and ``.lock`` extension,
    respectively. In principle then, at any one time, there can either be a number of pid files, or just a single lock
    file. If there is a mixture or there is more than one lock file, we are in an inconsistent state.
    """

    def __init__(self, profile: Profile):
        """Class that manages access and locks to the given profile.

        :param profile: the profile whose access to manage.
        """
        from aiida.manage.configuration.settings import ACCESS_CONTROL_DIR

        type_check(profile, Profile)
        self.profile = profile
        self.process = psutil.Process(os.getpid())
        self._dirpath_records = ACCESS_CONTROL_DIR / profile.name
        self._dirpath_records.mkdir(exist_ok=True)

    def request_access(self) -> None:
        """Request access to the profile.

        If the profile is locked, a ``LockedProfileError`` is raised. Otherwise a PID file is created for this process
        and the function returns ``None``. The PID file contains the command of the process.

        :raises ~aiida.common.exceptions.LockedProfileError: if the profile is locked.
        """
        error_message = (
            f'process {self.process.pid} cannot access profile `{self.profile.name}` '
            f'because it is being locked.'
        )
        self._raise_if_locked(error_message)

        filepath_pid = self._dirpath_records / f'{self.process.pid}.pid'
        filepath_tmp = self._dirpath_records / f'{self.process.pid}.tmp'

        try:
            # Write the content to a temporary file and then move it into place with an atomic operation.
            # This prevents the situation where another process requests a lock while this file is being
            # written: if that was to happen, when the locking process is checking for outdated records
            # it will read the incomplete command, won't be able to correctly compare it with its running
            # process, and will conclude the record is old and clean it up.
            filepath_tmp.write_text(str(self.process.cmdline()))
            os.rename(filepath_tmp, filepath_pid)

            # Check again in case a lock was created in the time between the first check and creating the
            # access record file.
            error_message = (
                f'profile `{self.profile.name}` was locked while process '
                f'{self.process.pid} was requesting access.'
            )
            self._raise_if_locked(error_message)

        except Exception as exc:
            filepath_tmp.unlink(missing_ok=True)
            filepath_pid.unlink(missing_ok=True)
            raise exc

    @contextlib.contextmanager
    def lock(self):
        """Request a lock on the profile for exclusive access.

        This context manager should be used if exclusive access to the profile is required. Access will be granted if
        the profile is currently not in use, nor locked by another process. During the context, the profile will be
        locked, which will be lifted automatically as soon as the context exits.

        :raises ~aiida.common.exceptions.LockingProfileError: if there are currently active processes using the profile.
        :raises ~aiida.common.exceptions.LockedProfileError: if there currently already is a lock on the profile.
        """
        error_message = (
            f'process {self.process.pid} cannot lock profile `{self.profile.name}` '
            f'because it is already locked.'
        )
        self._raise_if_locked(error_message)

        self._clear_stale_pid_files()

        error_message = (
            f'process {self.process.pid} cannot lock profile `{self.profile.name}` '
            f'because it is being accessed.'
        )
        self._raise_if_active(error_message)

        filepath = self._dirpath_records / f'{self.process.pid}.lock'
        filepath.touch()

        try:
            # Check if no other lock files were created in the meantime, which is possible if another
            # process was trying to obtain a lock at almost the same time.
            # By re-checking after creating the lock file we can ensure that race conditions will never
            # cause two different processes to both think that they acquired the lock. It is still possible
            # that two processes that are trying to lock will think that the other acquired the lock first
            # and then both will fail, but this is a much safer case.
            error_message = (
                f'while process {self.process.pid} attempted to lock profile `{self.profile.name}`, '
                f'other process blocked it first.'
            )
            self._raise_if_locked(error_message)

            error_message = (
                f'while process {self.process.pid} attempted to lock profile `{self.profile.name}`, '
                f'other process started using it.'
            )
            self._raise_if_active(error_message)

            yield

        finally:
            filepath.unlink(missing_ok=True)

    def is_locked(self) -> bool:
        """Return whether the profile is locked."""
        return self._get_tracking_files('.lock', exclude_self=False) != []

    def is_active(self) -> bool:
        """Return whether the profile is currently in use."""
        return self._get_tracking_files('.pid', exclude_self=False) != []

    def clear_locks(self) -> None:
        """Clear all locks on this profile.

        .. warning:: This should only be used if the profile is currently still incorrectly locked because the lock was
            not automatically released after the ``lock`` contextmanager exited its scope.
        """
        for lock_file in self._get_tracking_files('.lock'):
            lock_file.unlink()

    def _clear_stale_pid_files(self) -> None:
        """Clear any stale PID files."""
        for path in self._get_tracking_files('.pid'):
            try:
                process = psutil.Process(int(path.stem))
            except psutil.NoSuchProcess:
                # The process no longer exists, so simply remove the PID file.
                path.unlink()
            else:
                # If the process exists but its command is different from what is written in the PID file,
                # we assume the latter is stale and remove it.
                if path.read_text() != str(process.cmdline()):
                    path.unlink()

    def _get_tracking_files(self, ext_string: str, exclude_self: bool = False) -> typing.List[Path]:
        """Return a list of all files that track the accessing and locking of the profile.

        :param ext_string:
            To get the files that track locking use `.lock`, to get the files that track access use `.pid`.

        :param exclude_self:
            If true removes from the returned list any tracking to the current process.
        """
        path_iterator = self._dirpath_records.glob('*' + ext_string)

        if exclude_self:
            filepath_self = self._dirpath_records / (str(self.process.pid) + ext_string)
            list_of_files = [filepath for filepath in path_iterator if filepath != filepath_self]

        else:
            list_of_files = list(path_iterator)

        return list_of_files

    def _raise_if_locked(self, message_start):
        """Raise a ``LockedProfileError`` if the profile is locked.

        :param message_start: Text to use as the start of the exception message.
        :raises ~aiida.common.exceptions.LockedProfileError: if the profile is locked.
        """
        list_of_files = self._get_tracking_files('.lock', exclude_self=True)

        if len(list_of_files) > 0:
            error_msg = message_start + '\nThe following processes are blocking the profile:\n'
            error_msg += '\n'.join(f' - pid {path.stem}' for path in list_of_files)
            raise LockedProfileError(error_msg)

    def _raise_if_active(self, message_start):
        """Raise a ``LockingProfileError`` if the profile is being accessed.

        :param message_start: Text to use as the start of the exception message.
        :raises ~aiida.common.exceptions.LockingProfileError: if the profile is active.
        """
        list_of_files = self._get_tracking_files('.pid', exclude_self=True)

        if len(list_of_files) > 0:
            error_msg = message_start + '\nThe following processes are accessing the profile:\n'
            error_msg += '\n'.join(f' - pid {path.stem} (`{path.read_text()}`)' for path in list_of_files)
            raise LockingProfileError(error_msg)
9 changes: 9 additions & 0 deletions tests/backends/managers/__init__.py
@@ -0,0 +1,9 @@
# -*- coding: utf-8 -*-
###########################################################################
# Copyright (c), The AiiDA team. All rights reserved. #
# This file is part of the AiiDA code. #
# #
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core #
# For further information on the license, see the LICENSE.txt file #
# For further information please visit http://www.aiida.net #
###########################################################################