-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEW: Mechanism to lock profile access within AiiDA (v2) #5270
Changes from 14 commits
7e088a9
27bd603
3c73bec
1f64f80
5e281af
5d28088
d723cdf
440b323
9a7b589
70b1674
64b6c3d
e246708
0259d6d
72152e4
7e2c43b
5756e6d
d51165d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# -*- coding: utf-8 -*- | ||
########################################################################### | ||
# Copyright (c), The AiiDA team. All rights reserved. # | ||
# This file is part of the AiiDA code. # | ||
# # | ||
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core # | ||
# For further information on the license, see the LICENSE.txt file # | ||
# For further information please visit http://www.aiida.net # | ||
########################################################################### |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,222 @@ | ||||||
# -*- coding: utf-8 -*- | ||||||
########################################################################### | ||||||
# Copyright (c), The AiiDA team. All rights reserved. # | ||||||
# This file is part of the AiiDA code. # | ||||||
# # | ||||||
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core # | ||||||
# For further information on the license, see the LICENSE.txt file # | ||||||
# For further information please visit http://www.aiida.net # | ||||||
########################################################################### | ||||||
"""Module for the ProfileAccessManager that tracks process access to the profile.""" | ||||||
import contextlib | ||||||
import os | ||||||
from pathlib import Path | ||||||
import typing | ||||||
|
||||||
import psutil | ||||||
|
||||||
from aiida.common.exceptions import LockedProfileError, LockingProfileError | ||||||
from aiida.common.lang import type_check | ||||||
from aiida.manage.configuration import Profile | ||||||
|
||||||
|
||||||
class ProfileAccessManager: | ||||||
"""Class to manage access to a profile. | ||||||
|
||||||
Any process that wants to request access for a given profile, should first call: | ||||||
|
||||||
ProfileAccessManager(profile).request_access() | ||||||
|
||||||
If this returns normally, the profile can be used safely. It will raise if it is locked, in which case the profile | ||||||
should not be used. If a process wants to request exclusive access to the profile, it should use ``lock``: | ||||||
|
||||||
with ProfileAccessManager(profile).lock(): | ||||||
pass | ||||||
|
||||||
If the profile is already locked or is currently in use, an exception is raised. | ||||||
|
||||||
Access and locks of the profile will be recorded in a directory with files with a ``.pid`` and ``.lock`` extension, | ||||||
respectively. In principle then, at any one time, there can either be a number of pid files, or just a single lock | ||||||
file. If there is a mixture or there are more than one lock files, we are in an inconsistent state. | ||||||
""" | ||||||
|
||||||
def __init__(self, profile: Profile): | ||||||
"""Class that manages access and locks to the given profile. | ||||||
|
||||||
:param profile: the profile whose access to manage. | ||||||
""" | ||||||
from aiida.manage.configuration.settings import ACCESS_CONTROL_DIR | ||||||
|
||||||
type_check(profile, Profile) | ||||||
self.profile = profile | ||||||
self.process = psutil.Process(os.getpid()) | ||||||
self._dirpath_records = ACCESS_CONTROL_DIR / profile.name | ||||||
self._dirpath_records.mkdir(exist_ok=True) | ||||||
|
||||||
def request_access(self) -> None: | ||||||
"""Request access to the profile. | ||||||
|
||||||
If the profile is locked, a ``LockedProfileError`` is raised. Otherwise a PID file is created for this process | ||||||
and the function returns ``None``. The PID file contains the command of the process. | ||||||
|
||||||
:raises ~aiida.common.exceptions.LockedProfileError: if the profile is locked. | ||||||
""" | ||||||
error_message = ( | ||||||
f'process {self.process.pid} cannot access profile `{self.profile.name}`' | ||||||
f'because it is being locked.' | ||||||
) | ||||||
self._raise_if_locked(error_message) | ||||||
|
||||||
filepath_pid = self._dirpath_records / f'{self.process.pid}.pid' | ||||||
filepath_tmp = self._dirpath_records / f'{self.process.pid}.tmp' | ||||||
|
||||||
try: | ||||||
# Write the content to a temporary file and then move it into place with an atomic operation. | ||||||
# This prevents the situation where another process requests a lock while this file is being | ||||||
# written: if that was to happen, when the locking process is checking for outdated records | ||||||
# it will read the incomplete command, won't be able to correctly compare it with its running | ||||||
# process, and will conclude the record is old and clean it up. | ||||||
filepath_tmp.write_text(str(self.process.cmdline())) | ||||||
os.rename(filepath_tmp, filepath_pid) | ||||||
|
||||||
# Check again in case a lock was created in the time between the first check and creating the | ||||||
# access record file. | ||||||
error_message = ( | ||||||
f'profile `{self.profile.name}` was locked while process ' | ||||||
f'{self.process.pid} was requesting access.' | ||||||
) | ||||||
self._raise_if_locked(error_message) | ||||||
|
||||||
except Exception as exc: | ||||||
filepath_tmp.unlink(missing_ok=True) | ||||||
filepath_pid.unlink(missing_ok=True) | ||||||
raise exc | ||||||
|
||||||
@contextlib.contextmanager | ||||||
def lock(self): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The implementation of the locking logic here is the only concern I have with this PR. A platform independent, well-tested in practice, and overall well-maintained implementation of this logic can be found in: https://github.com/tox-dev/py-filelock/tree/main/src/filelock . The only thing the library does not do is unlinking the lock file after releasing it, but I would argue that we can either adapt to that behavior or simply wrap the context manager and perform that step ourselves. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The thing is that at this point I think we would be introducing a new library not only at normal cost (i.e. dependency maintenance), but also with the extra of having to coordinate it awkwardly for our use case (which adds another point of possible break), and with little to nothing to gain:
I don't know, maybe I'm underestimating the possible complications of locking with files, but I don't feel this is a good tradeoff. @sphuber you are quite familiar with the code by now, do you feel like we could gain something from trying to fit in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see both points. Normally, for such an important thing, I would also be tempted to use the external package, but I think you expressed the counter-arguments pretty well. We could indeed use it for our own There would still be a robustness advantage in replacing our own locking implementation with the TLDR: I think it would be preferable to use the package, but I think this is not possible due to our requirements of having non-exclusive locks and wanting to notify a user which processes are accessing it or locking it if a lock is denied. Only if we are willing to give up this user functionality, can we simplify the implementation and use the package. So in the end, this is the decision we have to take I believe. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would suggest move forward with the current implementation, ensure that is extremely well tested (that's one of those things that should have 100% coverage IMO), and then maybe we can spend 1-2hrs trying to refactor it with the libraries that I have mentioned. If it is too complicated, then we just stick with the current approach until we run into problems. |
||||||
"""Request a lock on the profile for exclusive access. | ||||||
|
||||||
This context manager should be used if exclusive access to the profile is required. Access will be granted if | ||||||
the profile is currently not in use, nor locked by another process. During the context, the profile will be | ||||||
locked, which will be lifted automatically as soon as the context exits. | ||||||
|
||||||
:raises ~aiida.common.exceptions.LockingProfileError: if there are currently active processes using the profile. | ||||||
:raises ~aiida.common.exceptions.LockedProfileError: if there currently already is a lock on the profile. | ||||||
""" | ||||||
error_message = ( | ||||||
f'process {self.process.pid} cannot lock profile `{self.profile.name}` ' | ||||||
f'because it is already locked.' | ||||||
) | ||||||
self._raise_if_locked(error_message) | ||||||
|
||||||
self._clear_stale_pid_files() | ||||||
|
||||||
error_message = ( | ||||||
f'process {self.process.pid} cannot lock profile `{self.profile.name}` ' | ||||||
f'because it is being accessed.' | ||||||
) | ||||||
self._raise_if_active(error_message) | ||||||
|
||||||
filepath = self._dirpath_records / f'{self.process.pid}.lock' | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why does the exclusive lock file to be PID specific? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we record the Moreover, having separate locking files gives an easy way to prevent racing conditions: if by the end of the process of acquiring the lock there is more than one lock file, something "went wrong" (more than one process were simultaneously trying to acquire the lock) and we remove at least the one that correspond to the process checking. This makes it impossible for two processes to each think they acquired a lock for exclusive access. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't follow this argument, isn't the whole point of using a single exclusive lock file to prevent race conditions? If it possible that two processes can obtain the same lock at the same time, then the file system does not support locking. If you need to have the PID, then just obtain the lock and then write the PID file into a distinct PID file. To my understanding, the file locking procedure should be an atomic operation, whereas checking for files, creating a file, and then checking for file existence again is not. There is also https://pypi.org/project/pid/ which might support that exact use case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@csadorf is right here. The whole point of a lock file is to prevent race conditions, so the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am a bit lost in this thread now. I don't know if maybe a "lock file" is a more technical term with very specific details of how it is implemented; I'm using files to track access to the AiiDA profile and provide a context in which I guarantee no other process was accessing it. My design indeed can't get completely rid of the racing condition, but it exchanges the situation of two competing processes both thinking they acquired the lock for a situation in which they both think the other one beat them to it and except. I think this should be good enough for our use case since it effectively prevents any corruption of the data and the user can just try again (I can't imagine a case where there are multiple processes trying to lock the profile by design). Using
Mmm maybe? The "stale detection" line sounds interesting, but honestly I can't even understand what exactly does this do from the documentation they have (which seems to be reduced to that small readme). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to be clear, I am also not trying to antagonize here. I am just trying to help by trying to see if the suggestion of @chrisjsewell and @csadorf could work. The best way to do this and to show that we are taking it seriously, is to actually try it and propose an alternate implementation. Making it concrete is the best way to visualize any problems and could help us finalize the decision with the design. Otherwise, if we stick with just discussions we never move forward. But I think we have done enough now and I think we should just stick with this design then. Unless the others really think that having the additional information of which PIDs are blocking is not worth the additional complexity of the design and forcing a custom implementation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have already suggested to move forward with the current implementation with extremely high test coverage and then see if we can recover the exact same behavior using the available libraries. I would personally be extremely surprised if that was not possible. I am a bit disappointed that my suggestion to adopt more concise semantics was ignored. @ramirezfranciscof You are using the term lock file, but are not technically locking anything and also state that you are merely tracking "access". At the same time, these access files are used to simulate exclusive and non-exclusive locking behavior. What I am seeing here is the need for
All of these use cases could certainly be implemented with aforementioned libraries. It is just open whether the implementation would be more or less complex. What appears obvious to me is that at least the exclusive locking would be more robust. Please move forward with this PR any way you see fit. I believe that especially when it comes to sensitive operations such as file locking, well-tested and platform independent libraries should be considered which is why I suggested the specific library that I have had good experience with. I am starting to get somewhat frustrated needing to defend that suggestion over and over again. In the end it does not matter to me whether we use the library or not, what I care about is that things work and work well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is what you keep saying, but then when we describe potential reasons why this is not possible, we don't get a specific response whether you agree with that analysis. I tried to fix this by making it more concrete, going out of my way to provide a potential implementation on a PR that is not mine, but still we get a repetition of platitudes that it should be possible.
Well that makes two of us. I am not asking you to defend it, I am asking you to look at our analysis of trying to implement your suggestion and the limitations that we think are there, and either provide a clear solution to those limitations or accept that they are there and so sign off on the design. Again, I am doing this in order to not discard your suggestion without proper consideration, out of respect to you and the time you took to review the concept of this PR, not to cause you grief. Just reiterating that "you think it should be possible with the library" is not helping. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I do not understand what you are expecting from me here. Do I have to implement it myself? I think I provided a clear analysis on what I interpret we are trying to implement here and made suggestion on how to improve the language and abstraction as well as provided guidance as to how I would implement it. When I inquired about specific obstacles the presented arguments were not convincing.
Again, not quite sure what is expected here despite providing an actual implementation. You have already received my approval to go ahead multiple times. I apologize for not providing a constructive review and will withdraw from the discussion. I am very sorry that I wasted everybody's time with my suggestion. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am sorry if anything I said gave the impression that this was my stance, it is not at all. I really appreciate you taking the time to give feedback on this, and I truly think it was very valuable even if in the end I might opt or push to not apply all of it. I assure I did not just dismissed it: I really tried looking into the I also tried my best to start using the proposed terminology throughout the discussion, is just that it is a bit hard for me to immediately understand how exactly the terms are intended and what is the extent of their influence (e.g. is "exclusive access" a qualifier when talking about locking? does it completely replace "locking"? can I still talk about lock files or how do I refer to the files I use for tracking this now?). Again, this is not to say it was on you to clarify all this right away, I'm just trying to explain why I may not have seem to fully adopt these semantics right away. Lastly I hope my comments on the I think @giovannipizzi may be right, this has already eroded all of us a bit and it might be better to just go ahead and merge it. If at some point someone wishes to revisit incorporating any of these libraries with some fresh eyes or improving the feature in any other way, I would be more than happy to help. |
||||||
filepath.touch() | ||||||
|
||||||
try: | ||||||
# Check if no other lock files were created in the meantime, which is possible if another | ||||||
# process was trying to obtain a lock at almost the same time. | ||||||
# By re-checking after creating the lock file we can ensure that racing conditions will never | ||||||
# cause two different processes to both think that they acquired the lock. It is still possible | ||||||
# that two processes that are trying to lock will think that the other acquired the lock first | ||||||
# and then both will fail, but this is a much safer case. | ||||||
error_message = ( | ||||||
f'while process {self.process.pid} attempted to lock profile `{self.profile.name}`, ' | ||||||
f'other process blocked it first.' | ||||||
) | ||||||
self._raise_if_locked(error_message) | ||||||
|
||||||
error_message = ( | ||||||
f'while process {self.process.pid} attempted to lock profile `{self.profile.name}`, ' | ||||||
f'other process started using it.' | ||||||
) | ||||||
self._raise_if_active(error_message) | ||||||
|
||||||
yield | ||||||
|
||||||
finally: | ||||||
filepath.unlink(missing_ok=True) | ||||||
|
||||||
def is_locked(self) -> bool: | ||||||
"""Return whether the profile is locked.""" | ||||||
return self._get_tracking_files('.lock', exclude_self=False) != [] | ||||||
|
||||||
def is_active(self) -> bool: | ||||||
"""Return whether the profile is currently in use.""" | ||||||
return self._get_tracking_files('.pid', exclude_self=False) != [] | ||||||
|
||||||
def clear_locks(self) -> None: | ||||||
"""Clear all locks on this profile. | ||||||
|
||||||
.. warning:: This should only be used if the profile is currently still incorrectly locked because the lock was | ||||||
not automatically released after the ``lock`` contextmanager exited its scope. | ||||||
""" | ||||||
for lock_file in self._get_tracking_files('.lock'): | ||||||
lock_file.unlink() | ||||||
|
||||||
def _clear_stale_pid_files(self) -> None: | ||||||
"""Clear any stale PID files.""" | ||||||
for path in self._get_tracking_files('.pid'): | ||||||
try: | ||||||
process = psutil.Process(int(path.stem)) | ||||||
except psutil.NoSuchProcess: | ||||||
# The process no longer exists, so simply remove the PID file. | ||||||
path.unlink() | ||||||
else: | ||||||
# If the process exists but its command is different from what is written in the PID file, | ||||||
# we assume the latter is stale and remove it. | ||||||
if path.read_text() != str(process.cmdline()): | ||||||
path.unlink() | ||||||
|
||||||
def _get_tracking_files(self, ext_string: str, exclude_self: bool = False) -> typing.List[Path]: | ||||||
"""Return a list of all files that track the accessing and locking of the profile. | ||||||
|
||||||
:param ext_string: | ||||||
To get the files that track locking use `.lock`, to get the files that track access use `.pid`. | ||||||
|
||||||
:param exclude_self: | ||||||
If true removes from the returned list any tracking to the current process. | ||||||
""" | ||||||
path_iterator = self._dirpath_records.glob('*' + ext_string) | ||||||
|
||||||
if exclude_self: | ||||||
filepath_self = self._dirpath_records / (str(self.process.pid) + ext_string) | ||||||
list_of_files = [filepath for filepath in path_iterator if filepath != filepath_self] | ||||||
|
||||||
else: | ||||||
list_of_files = list(path_iterator) | ||||||
|
||||||
return list_of_files | ||||||
|
||||||
def _raise_if_locked(self, message_start): | ||||||
"""This method will raise the exception given in `ExceptionClass` if the profile is locked. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed. |
||||||
|
||||||
:param message_start: Text to use as the start of the exception message. | ||||||
:raises ~aiida.common.exceptions.LockedProfileError: if the profile is locked. | ||||||
""" | ||||||
list_of_files = self._get_tracking_files('.lock', exclude_self=True) | ||||||
|
||||||
if len(list_of_files) > 0: | ||||||
error_msg = message_start + '\nThe following processes are blocking the profile:\n' | ||||||
error_msg += '\n'.join(f' - pid {path.stem}' for path in list_of_files) | ||||||
raise LockedProfileError(error_msg) | ||||||
|
||||||
def _raise_if_active(self, message_start): | ||||||
"""This method will raise the exception given in `ExceptionClass` if the profile is being accessed. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed. |
||||||
|
||||||
:param message_start: Text to use as the start of the exception message. | ||||||
:raises ~aiida.common.exceptions.LockingProfileError: if the profile is active. | ||||||
""" | ||||||
list_of_files = self._get_tracking_files('.pid', exclude_self=True) | ||||||
|
||||||
if len(list_of_files) > 0: | ||||||
error_msg = message_start + '\nThe following processes are accessing the profile:\n' | ||||||
error_msg += '\n'.join(f' - pid {path.stem} (`{path.read_text()}`)' for path in list_of_files) | ||||||
raise LockingProfileError(error_msg) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# -*- coding: utf-8 -*- | ||
########################################################################### | ||
# Copyright (c), The AiiDA team. All rights reserved. # | ||
# This file is part of the AiiDA code. # | ||
# # | ||
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core # | ||
# For further information on the license, see the LICENSE.txt file # | ||
# For further information please visit http://www.aiida.net # | ||
########################################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this file also not just belong in
aiida.manage.configuration
? It is there where config and profile loading/access is organized and where this is also being called. I think @chrisjsewell was getting rid of this entire module soon anyway.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to
aiida.manage.profile_access
, it seemed more fitting thatconfiguration
, but let me know if you disagree.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
Random note that, as it does pertain to this line, but https://www.python.org/dev/peps/pep-3120/ makes the default encoding for python source UTF-8, so these
# -*- coding: utf-8 -*-
lines are kinda a legacy thing