
NEW: Mechanism to lock profile access within AiiDA (v2) #5270

Merged (17 commits) on Jan 21, 2022

Conversation

ramirezfranciscof (Member):

From now on, AiiDA will keep track of all processes that request access
to the profile by saving their PIDs inside:

$AIIDA_PATH/access/profile_name/tracked/<process_id>.pid

Before returning control to the client, it will also check that there are
no files of the form:

$AIIDA_PATH/access/profile_name/locking/<process_id>.pid

as such a file would indicate that the profile is locked by a process that
requires exclusive access for the safety of its operations.

For a process to request such access, it will first have to check that there
are no active processes using the profile: it will look at all files in
the `tracked` folder and compare those to the currently running processes
in the system to check that none is actually active (deleting any
outdated tracking files in the process).

The design is as follows:

  • A ProcessData class was defined to store the information relevant to the
    processes.

    • It can be initialized either with a PID (of a process to be looked up
      among those currently running in the system) or with a filepath
      (where the data of a previous process was stored, typically in either
      the `locking` or the `tracked` folder). If neither is provided, it
      will load the info of the currently running process that is making
      the call.

    • The class also has a method to write the information to a given
      filepath (typically in the `locking` or the `tracked` folder).

  • An AccessManager class to control the access to a profile.

    • It can be initialized with the profile whose access it will control
      (by default it loads the one currently in use).

    • It has a couple of internal methods to distribute and modularize the
      different tasks, but the most important externally are:

      • profile_locking_context: a context manager that can be used to work
        with the profile locked. It will raise LockedProfileError if the
        profile is already locked, or LockingProfileError if the profile
        is being accessed by other processes.

      • record_process_access: a method to record access to the
        profile. It is now being called in load_backend_environment
        to make sure every process that loads the backend gets recorded.
        It will raise LockedProfileError if the profile is already
        locked.
@ramirezfranciscof (Member Author):

@giovannipizzi I tried to give the best explanation I could in the PR description; you can leave your feedback here, or let me know if you want to discuss via Zoom.

This is the base skeleton prototype: I tried to make it as robust and structurally organized as the final version should be, but it still needs things like adding all the tests, improving the error messages, and that kind of polish, which I can do after you tell me whether this approach is OK.

sphuber (Contributor) commented Dec 19, 2021

Thanks @ramirezfranciscof for the prototype. I have given it a first read, and since this is a draft, rather than doing a standard line-by-line review it would be better to start with a more high-level discussion of the interface and implementation.

I realize that some of the code you added is not yet being used but would potentially be used in the future, for example the whole code for acquiring a lock. However, I have the feeling that there are quite a lot of methods that are not really necessary and that make it difficult to understand how the class works and should be used. I am also not sure the ProcessData class is really necessary. The additional data you are storing (command and ctime) is not really used as far as I can tell, nor do I see why it should be. You use it in the equivalency check, but I don't think that should matter; only the pid does. It would be a lot simpler to just use the object returned by psutil, which has a pid property.

Taking your concept of the mechanism, I think the following should be the public API of the class:

class ProfileAccessManager:

    def __init__(self, profile):
        """Class that manages access and locks to the given profile.

        :param profile: the profile whose access to manage.
        """ 

    @contextlib.contextmanager
    def lock(self):
        """Request a lock on the profile for exclusive access.

        This context manager should be used if exclusive access to the profile is required. Access will be granted if
        the profile is currently not in use, nor locked by another process. During the context, the profile will be
        locked, which will be lifted automatically as soon as the context exits.

        :raises LockingProfileError: if there are currently active processes using the profile.
        :raises LockedProfileError: if there currently already is a lock on the profile.
        """

    def clear_locks(self):
        """Clear all locks on this profile.

        .. warning:: This should only be used if the profile is currently still incorrectly locked because the lock was
            not automatically released after the ``lock`` contextmanager exited its scope.

        """ 

    def request_access(self):
        """Request access to the profile.

        :raises LockedProfileError: if the profile is locked.
        """

All other methods should probably be protected, as they should not have to be called from the outside.

Now I am not saying that you should take this exactly, but I wanted to use this in order to raise some points in comparison to your design:

  1. Since the AccessManager operates on a profile, I would make that explicit by putting it in the name (ProfileAccessManager) and make the profile a required argument. Since this is sensitive code, it is dangerous if the wrong profile is used by accident so relying on the default seems undesirable
  2. The public interface should be as minimal as possible: lock to get a lock in context manager that is automatically released, clear_locks to unlock the profile if lock didn't cleanup properly after itself and request_access to request access in a non-locking way.
  3. Since there should only ever be one lock file, I don't see the point of having a separate folder. Just store that in the same folder as the access pids with something like f'{pid}.lock' so you know which process locked it.
  4. The method you apply to prevent race conditions in acquire_lock is not really correct. Simply checking twice, once before creating the lock and once after, is not the correct way of "catching" a race condition. You should prepare as much of the lock-acquisition code as possible without actually acquiring it. Then you do a quick check that should be as fast as possible, and then acquire the lock (which should also be as fast as possible, since all the preparation was performed before the check).

Let me know what you think. I would try to simplify the code as much as possible and remove anything that is not strictly necessary.
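To make the proposed public API concrete, here is a toy, self-contained stand-in. It follows the single-folder idea from point 3 (`{pid}.pid` for tracking, `{pid}.lock` for locks); the `basepath` argument and all internals are illustrative additions for the demo, not the real implementation:

```python
import contextlib
import os
import tempfile
from pathlib import Path


class LockedProfileError(Exception):
    pass


class LockingProfileError(Exception):
    pass


class ProfileAccessManager:
    """Toy stand-in for the sketched public API (no race-condition protection)."""

    def __init__(self, profile: str, basepath: Path):
        self.profile = profile
        self._records = basepath / profile
        self._records.mkdir(parents=True, exist_ok=True)

    def _lock_files(self):
        return list(self._records.glob('*.lock'))

    def request_access(self):
        """Record this process in a non-locking way; refuse if the profile is locked."""
        if self._lock_files():
            raise LockedProfileError(f'profile {self.profile} is locked')
        (self._records / f'{os.getpid()}.pid').touch()

    @contextlib.contextmanager
    def lock(self):
        """Hold an exclusive lock for the duration of the context."""
        if self._lock_files():
            raise LockedProfileError(f'profile {self.profile} is locked')
        others = [p for p in self._records.glob('*.pid') if p.stem != str(os.getpid())]
        if others:
            raise LockingProfileError(f'profile {self.profile} is in use')
        lock_file = self._records / f'{os.getpid()}.lock'
        lock_file.touch()
        try:
            yield
        finally:
            lock_file.unlink()  # lock lifted automatically when the context exits

    def clear_locks(self):
        """Force-remove all lock files (only for cleaning up stale locks)."""
        for path in self._lock_files():
            path.unlink()


manager = ProfileAccessManager('profile_name', Path(tempfile.mkdtemp()))
manager.request_access()                          # tracked as <pid>.pid
with manager.lock():                              # our own .pid file does not block us
    locked_inside = bool(manager._lock_files())   # lock file exists inside the context
locked_after = bool(manager._lock_files())        # released on exit
```

The point of the `contextmanager` shape is that the lock cannot outlive the context even if the body raises, which is what makes `clear_locks` an exceptional-use-only escape hatch.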

ramirezfranciscof (Member Author) commented Dec 20, 2021

Hey @sphuber , thanks for the feedback.

Also I am not sure if the ProcessData class is really necessary. The additional data you are storing (command and ctime) are not really used as far as I can tell, nor do I see why they should be. You use it in the equivalency, but I don't think that should matter, only the pid does. It would be a lot simpler to just use the object returned by psutil which has a pid property.

The purpose of the ProcessData class is to abstract away all the details of handling the process-related information from the operative logic of keeping track of access. I found it much cleaner this way, rather than spreading the whole logic of how to read, store, write, compare, etc. this information throughout the Manager. It seems like a rather clear, modularizable section and division of concerns; maybe the particular implementation still has some rough edges (for which more specific feedback is welcome), but I don't understand what the conceptual problem with this would be.

Re the internals: the pid alone is not sufficient because a system may start re-using pids after some time, and so the running process may not be the same one that recorded a request for access, despite having the same pid. Originally I was using the ctime to resolve this possible ambiguity, but @giovannipizzi said that could also be problematic (not sure I fully agree or understand, but I didn't find it super critical), and he suggested using the command instead (but keeping the ctime for reference, to give as info to the user when reporting errors). This is why the equality check uses those two, and we still store the ctime, which will be output once I polish the error messages.


  1. Since the AccessManager operates on a profile, I would make that explicit by putting it in the name (ProfileAccessManager) and make the profile a required argument. Since this is sensitive code, it is dangerous if the wrong profile is used by accident so relying on the default seems undesirable

Yes, that is a good idea; I agree that ProfileAccessManager can be more suitable.

Although, re the default: I am a bit baffled by this (and also, in a sense, by how you guys reacted to this in the last meeting). The access manager is not the sensitive or dangerous code; the worst it can do just by itself is lock the access in a recoverable way, where at most you might need to go and manually delete a file. The actual dangerous operation is the one performed by whatever process is trying to acquire a lock. For example the maintenance, which can actually be called directly by users and loads a default backend. The locking is trying to add a protective layer to that.


  2. The public interface should be as minimal as possible: lock to get a lock in a context manager that is automatically released, clear_locks to unlock the profile if lock didn't clean up properly after itself, and request_access to request access in a non-locking way.

I think we agree on this, that is basically what I meant in the commit message / OP:

 - It has a couple of internal methods to distribute and modularize the
   different tasks, but the most important externally are:

     - profile_locking_context: (...)

     - record_process_access: (...)

I don't have clear_locks yet, but the idea was to have that too, yes (as well as a way to call it directly from the CLI).


  3. Since there should only ever be one lock file, I don't see the point of having a separate folder. Just store that in the same folder as the access pids with something like f'{pid}.lock' so you know which process locked it.

Originally I did it with two folders because the number of access records that accumulate might be high (in principle they only need to be cleaned when you want to acquire a lock), and we may want to keep the checks for a lock (which need to be done every time you load the profile) super quick. I personally think it is cleaner that way, but if you have any concrete practical reason why it would actually be better to have everything in the same folder, I wouldn't be against it. However, I should also say that when discussing this with @giovannipizzi he placed significant emphasis on keeping these separated, so he might have stronger reasons.


  4. The method you apply to prevent race conditions in acquire_lock is not really correct. Simply checking twice, once before creating the lock and once after, is not the correct way of "catching" a race condition. You should prepare as much of the lock-acquisition code as possible without actually acquiring it. Then you do a quick check that should be as fast as possible, and then acquire the lock (which should also be as fast as possible, since all the preparation was performed before the check).

Mmm, I don't think I agree with this. You are trying to minimize the chance of the race condition clashing, whereas the way I am doing it actually eliminates it (at least for the "dangerous" clash where both processes think they hold the lock; see the end).

First, I wouldn't describe my approach as "checking twice", since the first check is not really necessary; I do it only to prevent unnecessary operations. The only relevant check is the last one: if I (1) first create the profile locking file and then (2) check that it is still the only one before returning control, there is no way for two processes to (think they managed to) lock the profile at the same time. The first process to reach (2) will have already created its own locking file, so even if that one moves on to the return while other processes are creating their own locking files, when those processes reach (2) they will see both their own file and the one from the first process to finish, and thus raise.

The most "conflicting" scenario is when two processes both arrive at the after-check, both see each other's file, and then both delete their files and raise, thinking another process acquired the lock while neither did. But this should be harmless: I'm basically exchanging the risk, however small, of two locking attempts thinking they succeeded (which could lead to database corruption) for the risk of two locking attempts raising (which can be solved by running again, or by checking why you have two simultaneous processes competing so fiercely for locks).

Is there something wrong in this reasoning? Am I missing any case?
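The create-first-then-verify sequence argued for above can be sketched as follows (all names are illustrative). The deliberate trade-off is visible in the code: two competitors may both back off and raise, but they can never both believe they hold the lock:

```python
import os
import tempfile
from pathlib import Path


class LockingProfileError(Exception):
    """Raised when another locking attempt is detected."""


def acquire_lock(locking_dir: Path) -> Path:
    locking_dir.mkdir(parents=True, exist_ok=True)
    my_lock = locking_dir / f'{os.getpid()}.pid'
    my_lock.touch()                                    # (1) create our locking file first
    others = [p for p in locking_dir.iterdir() if p != my_lock]
    if others:                                         # (2) verify ours is the only one
        my_lock.unlink()                               # back off rather than share a lock
        raise LockingProfileError('another process is competing for the lock')
    return my_lock


locking_dir = Path(tempfile.mkdtemp()) / 'locking'
acquire_lock(locking_dir).unlink()       # uncontended attempt succeeds, then release
(locking_dir / '99999.pid').touch()      # simulate a competing process's locking file
try:
    acquire_lock(locking_dir)
    backed_off = False
except LockingProfileError:
    backed_off = True
```

Because the file is created before the verification, by the time any process returns successfully its file was observably the only one, so at most one process can ever pass step (2).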

aiida/backends/managers/access.py (outdated review thread, resolved)
for filename in os.listdir(basepath):
    if not filename.startswith(self._temp_prefix):
        filepath = basepath / filename
        process_data = ProcessData(filepath=filepath)
Review comment (Member):

What happens if the content is corrupt? The safest is to assume there is a lock anyway, I think? In this case ProcessData will raise and the exception needs to be dealt with here. Maybe it's better to assume that the PID is part of the filename and infer it from there anyway, and set the rest of the information that would have been returned to None (and print "UNKNOWN" if we need to list e.g. the command or ctime of that process).

aiida/backends/managers/access.py (two more outdated review threads, resolved)
filepath = self._path_to_tracked / f'{pid}.pid'
try:
    os.remove(filepath)
except OSError as exc:
Review comment (Member):

Check the error number to make sure it's a "file not found"; if e.g. you get a "cannot delete / write-protected" error you should raise anyway. Check `exc.errno` against the values from the `errno` package.

Review comment (Member):

Or, more simply, catch FileNotFoundError as you do before.
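Both suggestions amount to the same guard: tolerate only the "file not found" case and let every other failure propagate. A minimal sketch (the helper name is illustrative):

```python
import os
import tempfile
from pathlib import Path


def remove_tracking_file(filepath):
    """Delete a stale tracking file, tolerating only its absence."""
    try:
        os.remove(filepath)
    except FileNotFoundError:
        pass  # already cleaned up by another process: fine
    # Any other OSError (e.g. EACCES on a write-protected directory) propagates.


missing = Path(tempfile.mkdtemp()) / '12345.pid'
remove_tracking_file(missing)  # no exception, even though the file never existed
```

Catching `FileNotFoundError` is equivalent to checking `exc.errno == errno.ENOENT` on a bare `OSError`, just more direct.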

process_obj = psutil.Process(pid)
process_cmd = process_obj.cmdline()
process_ctm = time.localtime(process_obj.create_time())
process_ctm = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(process_obj.create_time()))
Review comment (Member):

I would use the ISO format? (there's a method to get it directly)
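For reference, the `strftime` round-trip in the snippet above can be replaced by `datetime.isoformat()`, which yields an ISO 8601 string directly:

```python
import time
from datetime import datetime

timestamp = time.time()

# The strftime round-trip used in the PR:
legacy = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(timestamp))

# Getting ISO 8601 directly, as the review suggests:
iso = datetime.fromtimestamp(timestamp).isoformat(timespec='seconds')
```

The two strings differ only in the 'T' separator between date and time.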

self._pid = pid
process_obj = psutil.Process(pid)
process_cmd = process_obj.cmdline()
process_ctm = time.localtime(process_obj.create_time())
Review comment (Member):

This line is useless, as it's replaced by the next one? Also, I would use a more descriptive name (process_ctime).


def _read_from_file(self, filepath):
    """Reads the info of the process from a file (filepath)"""
    with open(filepath, 'r') as json_file:  # pylint: disable=unspecified-encoding
Review comment (Member):

Deal with an appropriate exception for "corrupt" content.

with open(filepath, 'w') as json_file:  # pylint: disable=unspecified-encoding
    json.dump(json_data, json_file)

def _read_from_file(self, filepath):
Review comment (Member):

Just move this code inside __init__? You don't want this to be called after __init__, I think.

self._data = json_data['data']

def __eq__(self, other):
    """Define equivalency"""
Review comment (Member):

A more extensive comment is needed here (also to address Sebastiaan's point).

giovannipizzi (Member) commented Dec 21, 2021

  • I agree with @sphuber that private methods should be made private (@ramirezfranciscof this simply means adding an underscore before the method name, e.g. get_processes_recorded_in -> _get_processes_recorded_in; unless they are used by another class, in which case they are not private), and on making the profile a required argument.
  • I agree with @ramirezfranciscof that reducing the time spent where there is a risk of a race condition, as suggested by @sphuber, is not enough (though it's a good thing to do in general for usability). I think the current approach is OK, but it needs very clear documentation as comments; otherwise it's not at all obvious what's going on, others will have the same questions, and there is even a risk of someone changing the code in the future and breaking it. Just to clarify: this should indeed prevent race conditions, at the risk of multiple locking processes locking each other out, and that's OK.

We'll also need some stress testing where we lock the profile multiple times, open many processes concurrently and lock at some point, etc.

Finally, regarding the folder: the lock file(s) should definitely be in a different folder than the tracked ones.
However, the filename is not unique but depends on the PID, so one still has to check all files in some folder to see whether there is at least one PID file (and that it does not start with the temp prefix, etc.). I think the current approach is OK; we could move it one level up, but I don't think it's really needed.

[EDIT/ADD]: also, the idea of using both cmd and ctime came from preliminary discussions with Francisco. I suggested cmd because it should be an immutable string (I think?) and so easy to compare (to check whether the same PID was reused by the OS for a different process, e.g. after a reboot). ctime has all the issues of comparing times and floats, so I'd rather not use it for comparisons (and only use it in warning messages to users, e.g. to say: there is a process using the profile that started XX hours ago). I suggest that Francisco also adds these utility functions/printed messages, so it's clear what cmd and ctime are used for, respectively.

process_datadicts = {}

for filename in os.listdir(basepath):
    if not filename.startswith(self._temp_prefix):
Review comment (Member):

I would actually also check that the filename ends with .pid (this should also be a class variable, reused everywhere and not hardcoded). This avoids problems if other files are present (e.g. files ending in ~ left by an editor, a .gitignore put there for some reason, ...).
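The suggested filtering (suffix check plus temp-prefix check, both read from class-level constants rather than hardcoded) could look like this; the constant names are hypothetical:

```python
# Hypothetical class-level constants, mirroring the review suggestion
PID_SUFFIX = '.pid'
TEMP_PREFIX = 'tmp-'

filenames = ['123.pid', '.gitignore', '456.pid~', f'{TEMP_PREFIX}789.pid', '321.pid']
valid = [
    name for name in filenames
    if name.endswith(PID_SUFFIX) and not name.startswith(TEMP_PREFIX)
]
# Editor backups ('456.pid~'), dotfiles, and temp files are all skipped
```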

sphuber (Contributor) commented Dec 21, 2021

Finally, regarding the folder: the lock file(s) should definitely be in a different folder than the tracked ones. However, the filename is not unique but depends on the PID, so one still has to check all files in some folder to see whether there is at least one PID file (and that it does not start with the temp prefix, etc.). I think the current approach is OK; we could move it one level up, but I don't think it's really needed.

Why does it absolutely have to be in different folders? Why can't you have {pid}.pid for the tracked processes and {pid}.lock for the lock files?

also, the idea of using both cmd and ctime came from preliminary discussions with Francisco. I suggested cmd because it should be an immutable string (I think?) so easy to compare (to check if the same PID was reused by the OS for a different process, e.g. after a reboot). ctime has all the issues of times and floats when comparing so I'd rather not use it (and only use it instead in warning messages to users, e.g. to say - there is a process using the profile started XX hours ago). I suggest that Francisco also adds these utility functions/prints messages, so it's clear what cmd and ctime are used for, respectively.

Adding the command doesn't solve the problem though: you could start the same process with the same PID if the OS reassigns it. Sure, you might reduce the chances a bit, but how often do we have PID turnaround anyway? Do other tools include additional information like this when working with PIDs? To me this seems like using SHA256 but also preemptively trying to protect against clashes: it is clearly a possibility, but you are not supposed to guard against it because it is super unlikely. Also, regarding the ctime: why not just use the ctime of the file?

giovannipizzi (Member) commented Dec 21, 2021

Why does it absolutely have to be in different folders? Why can't you have {pid}.pid for the tracked processes and {pid}.lock for the lock files?

I see; this would also work. However, currently you have to loop over all files in that folder, and there can potentially be many if users have been running hundreds of verdi shell/verdi run commands and haven't cleaned them yet (something that happens rarely, for performance reasons, e.g. only when you try to do a locking operation). It might add significant cost to any load_profile command, especially on slow disks, so I think it's better to keep them in a separate folder (unless you have strong reasons to believe having them in the same folder is better).

Adding the command doesn't solve the problem though, you could start the same process with the same PID if the OS reassigns it.

Indeed, I thought about this. But in that (very rare) case, the "same process" would be a verdi process! So it's actually OK that it's still considered to be there for our locking purposes.
Therefore I think the approach implemented here is quite robust and also essentially never gives false positives.

how often do we have PID turn around anyway?

I would say this is quite common. If you start AiiDA as one of the first things after a reboot, it will end up using a low-value integer PID. You reboot the computer, you start some other software, and those will get the same low-value integers, so it's not so rare. If these are daemons or long-running processes, the user will have issues using AiiDA unless they do some non-obvious operations (remove all PID files); letting users do this might actually be dangerous, because they will probably delete all of them, i.e. they might also delete those of verdi processes that are actually running.

Note: this is even more common than usual when running AiiDA inside docker (e.g. for AiiDAlab), since processes always start back from 1 when you restart the container; there is a high probability that they exchange order and overlap (even just running ls once will increase the PID counter).

The implementation takes care of all of this in a way that I see as robust: I still don't see a use case in which it might either fail its intended purpose (i.e. allow a locking process to run even though there is another verdi process running) or give a false positive (telling the user to clean the PID files manually for a process that is clearly gone or is now something else).

[correction: I see an edge case. A user starts a verdi shell with the default profile (say, old), reboots, changes the default profile to new, and starts another verdi shell with the new default profile, which ends up getting the very same PID. The command would still be something like /usr/local/Cellar/python@3.8/3.8.11/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python /Users/pizzi/.virtualenvs/aiida-dev/bin/verdi shell, so AiiDA would now say (when launching a locking process in old) that the old profile is being used, even though it's not true (the verdi shell is for profile new). But I guess there is no easy way around it, and not checking the command at all would have the same (super rare) issue, while the current implementation solves the much more common case of PID reuse.]

Also regarding the ctime, why not just use the ctime of the file?

This is a good idea. As I commented, we should really limit the content of the file, because otherwise we need to start thinking about all the cases of invalid content and how to deal with them. @ramirezfranciscof what do you think? The content of the PID file could simply be the string of the command (no JSON involved).
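A sketch of this simplification (the layout is hypothetical): the PID lives in the filename, the ctime comes from the file's metadata, and the file body is just the command string, so there is no JSON to validate:

```python
import os
import sys
import tempfile
from pathlib import Path

tracked = Path(tempfile.mkdtemp())
pid_file = tracked / f'{os.getpid()}.pid'
pid_file.write_text(' '.join(sys.argv))  # file body: only the command string

pid = int(pid_file.stem)                 # PID recovered from the filename
command = pid_file.read_text()           # command string, no parsing needed
ctime = pid_file.stat().st_ctime         # timestamp from the filesystem metadata
```

If the body is corrupt or empty, the PID and timestamp are still recoverable, which addresses the earlier review concern about corrupt content.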

giovannipizzi (Member):

Also regarding the ctime, why not just use the ctime of the file?

Note: probably @ramirezfranciscof didn't use the file ctime because, in his original implementation, this was part of the comparison to check whether the process was the same; it was therefore crucial that this was the ctime of the process, not of the creation of the file, which can happen milliseconds or even seconds later if the computer is slow. So it had to be the time from psutil. If we agree that ctime is now only used for information to the user, I think the file ctime is enough.

ramirezfranciscof and others added 2 commits on January 10, 2022:
It introduces the class `ProfileAccessManager`, used to keep track of and
control which system processes access the AiiDA profiles. It has the
following public methods:

 - `request_access`: to be called every time the profile is loaded
   (for example, inside of `load_backend_environment` in the module
   `aiida.backends.manager:BackendManager`). A file will be created
   with the process ID as its name so as to keep track of who is
   accessing (or has accessed) the profile.

 - `lock`: this context manager makes sure the process calling it is
   the only one accessing the profile. It does so by checking that
   there are no tracking files that correspond to currently running
   processes, and by creating a locking file that will prevent other
   processes from accessing the profile.

 - `is_active` / `is_locked`: for clients to easily check whether a profile
   is being used or locked by running processes (respectively). I would
   not recommend using these as guards before calling `request_access`
   or `lock`, since race conditions are still possible; it is better
   to use `try ... except ...` instead.

 - `clear_locks`: this removes any current lock by force-deleting the
   locking file. This will not stop any process that was locking the
   profile if it is still running: that process will still "think" it
   has exclusive access to that profile, which could result in data
   corruption. This method should be used with extreme care.

Co-authored-by: Sebastiaan Huber <[email protected]>
codecov bot commented Jan 10, 2022

Codecov Report

Merging #5270 (d51165d) into develop (8e52d18) will increase coverage by 0.04%.
The diff coverage is 98.92%.


@@             Coverage Diff             @@
##           develop    #5270      +/-   ##
===========================================
+ Coverage    82.07%   82.11%   +0.04%     
===========================================
  Files          533      534       +1     
  Lines        38307    38398      +91     
===========================================
+ Hits         31436    31526      +90     
- Misses        6871     6872       +1     
Flag Coverage Δ
django 77.16% <98.92%> (+0.03%) ⬆️
sqlalchemy 76.48% <98.92%> (+0.06%) ⬆️


Impacted Files Coverage Δ
aiida/common/__init__.py 100.00% <ø> (ø)
aiida/manage/profile_access.py 98.77% <98.77%> (ø)
aiida/common/exceptions.py 100.00% <100.00%> (ø)
aiida/manage/configuration/settings.py 96.16% <100.00%> (+0.41%) ⬆️
aiida/manage/manager.py 80.77% <100.00%> (+0.33%) ⬆️
aiida/engine/daemon/client.py 75.38% <0.00%> (-1.00%) ⬇️
aiida/transports/plugins/local.py 81.71% <0.00%> (+0.26%) ⬆️
aiida/cmdline/utils/decorators.py 77.42% <0.00%> (+1.62%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8e52d18...d51165d.

@ramirezfranciscof force-pushed the lock_profile branch 2 times, most recently from 1118415 to 3c7f70f on January 10, 2022 at 11:50.
csadorf (Contributor) commented Jan 10, 2022

@ramirezfranciscof Did you consider delegating some of the file-locking implementation to this or another library? I have had good experiences with https://pypi.org/project/filelock/.

@chrisjsewell
Member

I have had good experiences with pypi.org/project/filelock.

Ha yep, this is the package I pointed to on slack (in the reviewing channel)

@ramirezfranciscof
Member Author

@csadorf I checked the https://pypi.org/project/filelock/ but I don't think it works for our purposes.

Long story short: it is critical here that we track both access and locking. I have both an "access track" and a "lock" procedure, and the "lock" needs to check not only for the absence of other locks, but also that there is no ongoing access. Moreover, tracking access has the added difficulty that there is no common point of "closing" (we cannot tell when a given process stopped using a profile).

Example: If I open a verdi shell it should not lock the profile, I should be able to open a 2nd one. But I do need to keep track that I opened the first verdi shell: if I try to then lock the profile with the second shell, it has to say "I can't lock because there is another shell open".

As far as I could see, there was no support for something like this in that library.
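The two-tier scheme described above can be sketched with stdlib primitives. This is only a rough illustration of the idea, not AiiDA's actual implementation: the directory names follow the PR description, but the function name and error type are hypothetical.

```python
import os
from pathlib import Path


def request_access(access_dir: str) -> None:
    """Record non-exclusive access to a profile, unless a process holds the lock.

    Sketch only: refuses access if any file exists in ``locking/``, otherwise
    records the caller's PID under ``tracked/``. There is no "release" step,
    so stale tracked files must later be cleaned up by comparing their PIDs
    against the processes currently running on the system.
    """
    base = Path(access_dir)
    locking = base / 'locking'
    if locking.is_dir() and any(locking.glob('*.pid')):
        raise RuntimeError('profile is locked by another process')
    tracked = base / 'tracked'
    tracked.mkdir(parents=True, exist_ok=True)
    (tracked / f'{os.getpid()}.pid').write_text(str(os.getpid()))
```

A second shell calling `request_access` would succeed (both PIDs get tracked), while a process that first wants the exclusive lock would see the tracked files and refuse, which is exactly the behavior the example describes.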

@csadorf
Contributor

csadorf commented Jan 10, 2022

@ramirezfranciscof I might misunderstand this, but I think the example you provide could be realized with a combination of hard and soft locks, where the soft lock checks for the existence of the lock file but does not require exclusive access.

Either way, I am not saying that you should definitely use the library (I also have not reviewed the PR in detail (yet)), I just want to make sure that it was adequately considered.

@ramirezfranciscof ramirezfranciscof force-pushed the lock_profile branch 4 times, most recently from c48b9b6 to 3c73bec Compare January 11, 2022 09:55
@ramirezfranciscof
Member Author

@ramirezfranciscof I might misunderstand this, but I think the example you provide could be realized with a combination of hard and soft locks, where the soft lock checks for the existence of the lock file but does not require exclusive access.

I can't see how mixing lock types would help in this case. If the first process acquires any kind of lock, then the second one won't be able to do so until the first releases it. This is the same for both hard and soft locks, as far as I could tell from my tests of the library.

If you want, you can ping me on slack and I'm happy to arrange a zoom meeting so I can go over the parameters of the problem and we can brainstorm more concrete possible implementations. But from what I saw of this library and the tests I made, I can't think of any way to use filelock that works for this use case.

@sphuber
Contributor

sphuber commented Jan 17, 2022

@csadorf and @chrisjsewell could you please explain how you think the implementation can be simplified using filelock while keeping the functionality? It would be good that you can either present the alternative solution or confirm that the current implementation is good, so we can continue with the PR. This should probably go into the upcoming release so it is blocking.

Contributor

@csadorf csadorf left a comment

@sphuber I believe I made it abundantly clear that my comments are not to be regarded as blocking for this PR. I just wanted to make the authors and reviewers of this PR aware of an existing solution to this problem, to ensure that "re-inventing the wheel" is avoided and existing libraries are adequately considered. I was never formally requested for review of this draft PR, but have reviewed it anyway now.

I think it would be helpful to adopt the term "non-exclusive" access where appropriate. I think @ramirezfranciscof 's explanation would have been easier for me to follow using this terminology. Unfortunately, py-filelock does not currently support non-exclusive/shared locks, which would be required to implement this logic with the library.

To move forward I would suggest to:

  1. Adopt the terminology "exclusive" and "non-exclusive" access.
  2. Ensure that the code introduced with this PR is covered by unit tests.
  3. Revisit the specific implementation of acquiring the lock file via the filelock library.

raise exc

@contextlib.contextmanager
def lock(self):
Contributor

The implementation of the locking logic here is the only concern I have with this PR. A platform independent, well-tested in practice, and overall well-maintained implementation of this logic can be found in: https://github.com/tox-dev/py-filelock/tree/main/src/filelock . The only thing the library does not do is unlinking the lock file after releasing it, but I would argue that we can either adapt to that behavior or simply wrap the context manager and perform that step ourselves.
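The wrapping described here might look roughly like the sketch below. This assumes the py-filelock package; the unlink-after-release step in the `finally` block is precisely the part the library does not do itself, and removing a lock file that other processes may be polling is itself a known source of subtle races, so this is illustrative only.

```python
import contextlib
import os

from filelock import FileLock, Timeout  # third-party: py-filelock


@contextlib.contextmanager
def exclusive_profile_lock(lockfile: str):
    """Acquire an exclusive lock via filelock, then unlink the file on release.

    timeout=0 makes acquisition raise ``Timeout`` immediately instead of
    waiting, which matches the PR's raise-on-contention behavior.
    """
    lock = FileLock(lockfile, timeout=0)
    try:
        with lock:
            yield
    finally:
        # filelock deliberately leaves the file behind; remove it ourselves.
        with contextlib.suppress(OSError):
            os.unlink(lockfile)
```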

Member Author

The thing is that at this point I think we would be introducing a new library not only at the normal cost (i.e. dependency maintenance), but also with the extra cost of having to coordinate it awkwardly with our use case (which adds another possible point of failure), and with little to nothing to gain:

  • Our lock method still has to check the access files, which is basically 80% of its logic, so not a lot of simplification there. The internal call to their lock would basically only replace the file creation (which should not have a lot of platform-dependency issues).
  • The typical benefit of "outsourcing" the effort of maintaining the implementation is reduced, given that we would anyway need to maintain the other half that tracks "non-exclusive" access.
  • With respect to the difference in behavior (our lock currently raises if you try to lock a profile that is already locked, while their locking mechanism has the process wait for the other lock to be released), I think I actually prefer ours. The situation where this is relevant is when a race condition is triggered and two processes try to acquire the lock at almost the same time; I personally prefer our resolution of that situation.

I don't know, maybe I'm underestimating the possible complications of locking with files, but I don't feel this is a good tradeoff. @sphuber you are quite familiar with the code by now, do you feel like we could gain something from trying to fit in the py-filelock in this design?

Contributor

I see both points. Normally, for such an important thing, I would also be tempted to use the external package, but I think you expressed the counter-arguments pretty well. We could indeed use it for our own lock implementation, but for the request_access we would still have our own implementation.

There would still be a robustness advantage in replacing our own locking implementation with the filelock package in the lock method. If we simply use their FileLock, we don't have to worry about race conditions and robustness. As simple as it may seem, it is often quite tricky to be sure there are no subtle issues. The problem is that we cannot use the process ID in the filename, because the lock filename should be identical for all processes. But we can also not write the process ID in the lock file, which the docs of filelock clearly state not to do. If we want to keep the ability to tell which process already holds a lock when another tries to acquire it, we would have to start writing that to a separate file. But that has problems of its own.

TLDR: I think it would be preferable to use the package, but I think this is not possible due to our requirements of having non-exclusive locks and wanting to notify a user which processes are accessing it or locking it if a lock is denied. Only if we are willing to give up this user functionality, can we simplify the implementation and use the package. So in the end, this is the decision we have to take I believe.

Contributor

I would suggest moving forward with the current implementation, ensuring that it is extremely well tested (that's one of those things that should have 100% coverage IMO), and then maybe spending 1-2 hrs trying to refactor it with the libraries I have mentioned. If that turns out to be too complicated, then we just stick with the current approach until we run into problems.

)
self._raise_if_active(error_message)

filepath = self._dirpath_records / f'{self.process.pid}.lock'
Contributor

Why does the exclusive lock file need to be PID-specific?

Member Author

If we record the PID, we can inform the user which process holds the lock when raising errors about not being able to access a profile, and we can also perform some checks ourselves (i.e. decide to automatically check whether the process is still running and unlock if it is not). Putting it in the name was the easiest way.

Moreover, having separate lock files gives an easy way to prevent race conditions: if by the end of the lock-acquisition procedure there is more than one lock file, something "went wrong" (more than one process was simultaneously trying to acquire the lock) and we remove at least the one that corresponds to the checking process. This makes it impossible for two processes to each think they acquired a lock for exclusive access.
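The "write my lock file, then verify it is the only one" resolution described here can be sketched as follows. Names and the error type are illustrative, not the PR's actual API: on contention each racing process sees the other's file in the post-check, removes its own, and raises, so neither believes it holds the lock.

```python
import os
from pathlib import Path


def acquire_exclusive_lock(locking_dir: str) -> Path:
    """Write a PID-named lock file, then back off if another one appeared."""
    base = Path(locking_dir)
    base.mkdir(parents=True, exist_ok=True)
    my_lock = base / f'{os.getpid()}.lock'
    my_lock.write_text(str(os.getpid()))
    # Post-check: any other .lock file means a concurrent attempt happened.
    competitors = [path for path in base.glob('*.lock') if path != my_lock]
    if competitors:
        my_lock.unlink()  # give up our claim so nobody wins the race
        raise RuntimeError(f'lock contention with {[p.name for p in competitors]}')
    return my_lock
```

The trade-off discussed in the thread is visible here: two simultaneous callers can both fail (each seeing the other's file), but they can never both succeed, which is the property that protects the data.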

Contributor

I don't follow this argument; isn't the whole point of using a single exclusive lock file to prevent race conditions? If it is possible for two processes to obtain the same lock at the same time, then the file system does not support locking. If you need the PID, then just obtain the lock and write the PID into a distinct PID file.

To my understanding, the file locking procedure should be an atomic operation, whereas checking for files, creating a file, and then checking for file existence again is not.

There is also https://pypi.org/project/pid/ which might support that exact use case.
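The atomicity point made above can be illustrated with a stdlib primitive (a sketch under POSIX-like semantics, not the PR's implementation): an exclusive-create open either creates the file or fails as a single filesystem operation, so two processes can never both succeed.

```python
import os


def try_acquire(lockfile: str) -> bool:
    """Return True if we created the lock file, False if it already existed."""
    try:
        # O_CREAT | O_EXCL is atomic: the existence check and the creation
        # happen as one operation at the filesystem level, with no window
        # in which a second process could also "win".
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())  # record owner PID for diagnostics
    os.close(fd)
    return True
```

This contrasts with the check-create-recheck sequence: here there is no post-check because the acquisition itself cannot race.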

Contributor

Moreover, having separate locking files gives an easy way to prevent race conditions:

@csadorf is right here. The whole point of a lock file is to prevent race conditions, so the filelock library would naturally protect against this.

Member Author

I am a bit lost in this thread now. I don't know whether "lock file" is a more technical term with very specific implementation details; I am using files to track access to the AiiDA profile and to provide a context in which I can guarantee that no other process was accessing it.

My design indeed can't completely get rid of the race condition, but it exchanges the situation of two competing processes both thinking they acquired the lock for one in which they both think the other beat them to it and raise an exception. I think this should be good enough for our use case, since it effectively prevents any corruption of the data and the user can just try again (I can't imagine a case where multiple processes try to lock the profile by design). Using filelock would solve this, but I still don't think it outweighs the costs of including an extra dependency and the extra complexity of coordinating it with our other .pid tracking files (and of having to keep track of the ID of the locking process separately).

There is also https://pypi.org/project/pid/ which might support that exact use case.

Mmm, maybe? The "stale detection" line sounds interesting, but honestly I can't even understand what exactly it does from the documentation they have (which seems to be reduced to that small readme).

Contributor

Just to be clear, I am not trying to antagonize here either. I am just trying to help by seeing whether the suggestion of @chrisjsewell and @csadorf could work. The best way to do this, and to show that we are taking it seriously, is to actually try it and propose an alternate implementation. Making it concrete is the best way to expose any problems and could help us finalize the design decision. Otherwise, if we stick to just discussions, we never move forward. But I think we have done enough now and should just stick with this design, unless the others really think that the additional information of which PIDs are blocking is not worth the additional complexity of the design and forcing a custom implementation.

Contributor

I have already suggested to move forward with the current implementation with extremely high test coverage and then see if we can recover the exact same behavior using the available libraries. I would personally be extremely surprised if that was not possible.

I am a bit disappointed that my suggestion to adopt more concise semantics was ignored. @ramirezfranciscof You are using the term lock file, but are not technically locking anything and also state that you are merely tracking "access". At the same time, these access files are used to simulate exclusive and non-exclusive locking behavior.

What I am seeing here is the need for

  1. Locking the profile exclusively for specific operations (e.g. migrations),
  2. locking the profile non-exclusively to prevent exclusive locking (standard access),
  3. tracking which processes are obtaining any of these locks.

All of these use cases could certainly be implemented with aforementioned libraries. It is just open whether the implementation would be more or less complex. What appears obvious to me is that at least the exclusive locking would be more robust.

Please move forward with this PR any way you see fit. I believe that especially when it comes to sensitive operations such as file locking, well-tested and platform independent libraries should be considered which is why I suggested the specific library that I have had good experience with. I am starting to get somewhat frustrated needing to defend that suggestion over and over again. In the end it does not matter to me whether we use the library or not, what I care about is that things work and work well.

Contributor

All of these use cases could certainly be implemented with aforementioned libraries.

This is what you keep saying, but when we describe potential reasons why this is not possible, we don't get a specific response on whether you agree with that analysis. I tried to fix this by making it more concrete, going out of my way to provide a potential implementation on a PR that is not mine, but we still get a repetition of platitudes that it should be possible.

I am starting to get somewhat frustrated needing to defend that suggestion over and over again.

Well that makes two of us. I am not asking you to defend it, I am asking you to look at our analysis of trying to implement your suggestion and the limitations that we think are there, and either provide a clear solution to those limitations or accept that they are there and so sign off on the design. Again, I am doing this in order to not discard your suggestion without proper consideration, out of respect to you and the time you took to review the concept of this PR, not to cause you grief. Just reiterating that "you think it should be possible with the library" is not helping.

Contributor

All of these use cases could certainly be implemented with aforementioned libraries.

This is what you keep saying, but when we describe potential reasons why this is not possible, we don't get a specific response on whether you agree with that analysis. I tried to fix this by making it more concrete, going out of my way to provide a potential implementation on a PR that is not mine, but we still get a repetition of platitudes that it should be possible.

I do not understand what you are expecting from me here. Do I have to implement it myself? I think I provided a clear analysis of what I interpret we are trying to implement here, made suggestions on how to improve the language and abstraction, and provided guidance as to how I would implement it. When I inquired about specific obstacles, the arguments presented were not convincing.

I am starting to get somewhat frustrated needing to defend that suggestion over and over again.

Well that makes two of us. I am not asking you to defend it, I am asking you to look at our analysis of trying to implement your suggestion and the limitations that we think are there, and either provide a clear solution to those limitations or accept that they are there and so sign off on the design. Again, I am doing this in order to not discard your suggestion without proper consideration, out of respect to you and the time you took to review the concept of this PR, not to cause you grief. Just reiterating that "you think it should be possible with the library" is not helping.

Again, not quite sure what is expected here despite providing an actual implementation. You have already received my approval to go ahead multiple times.

I apologize for not providing a constructive review and will withdraw from the discussion. I am very sorry that I wasted everybody's time with my suggestion.

Member Author

I apologize for not providing a constructive review and will withdraw from the discussion. I am very sorry that I wasted everybody's time with my suggestion.

I am sorry if anything I said gave the impression that this was my stance; it is not at all. I really appreciate you taking the time to give feedback on this, and I truly think it was very valuable, even if in the end I might opt or push not to apply all of it. I assure you I did not just dismiss it: I really tried looking into the filelock library and playing around with it, but couldn't figure out a non-forceful way to leverage it for our specific conditions. I wanted to respond to all your points not as an attack on them, nor expecting you to defend the suggestions, but because they were perfectly valid points and thus merited an explanation of why, for this particular use case, I might prefer to decide against them.

I also tried my best to start using the proposed terminology throughout the discussion; it's just that it is a bit hard for me to immediately understand how exactly the terms are intended and what the extent of their influence is (e.g. is "exclusive access" a qualifier when talking about locking? does it completely replace "locking"? can I still talk about lock files, or how do I refer to the files I use for tracking this now?). Again, this is not to say it was on you to clarify all this right away; I'm just trying to explain why I may not have seemed to fully adopt these semantics right away.

Lastly I hope my comments on the pid library's documentation didn't contribute to this. It was meant to express my frustration at the library itself, not at your recommendation of it which was very pertinent. Apologies if it seemed that way.

I think @giovannipizzi may be right, this has already eroded all of us a bit and it might be better to just go ahead and merge it. If at some point someone wishes to revisit incorporating any of these libraries with some fresh eyes or improving the feature in any other way, I would be more than happy to help.

@sphuber
Contributor

sphuber commented Jan 20, 2022

Running this locally against a test profile, I get two failing tests:

=================================================================================================== FAILURES ====================================================================================================
______________________________________________________________________________________________ test_access_control ______________________________________________________________________________________________

profile_access_manager = <aiida.backends.managers.profile_access.ProfileAccessManager object at 0x7f87c698e490>

    def test_access_control(profile_access_manager):
        """Tests the request_access method indirectly.
    
        This test is performed in an integral way because the underlying methods used
        have all been tested elsewhere, and it is more relevant to indirectly verify
        that this method works in real life scenarios, rather than checking the specifics
        of its internal logical structure.
        """
        accessing_process = TestProcess()
        accessing_pid = accessing_process.start()
        assert profile_access_manager.is_active()
        accessing_process.stop()
    
        process_file = str(accessing_pid) + '.pid'
        tracking_files = [filepath.name for filepath in profile_access_manager._get_tracking_files('.pid')]
>       assert process_file in tracking_files
E       AssertionError: assert '22375.pid' in ['22352.pid']

tests/backends/managers/test_profile_access.py:263: AssertionError
___________________________________________________________________________________________________ test_lock ___________________________________________________________________________________________________

profile_access_manager = <aiida.backends.managers.profile_access.ProfileAccessManager object at 0x7f87c698ee20>, monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f87c698eb80>

    def test_lock(profile_access_manager, monkeypatch):
        """Tests the locking mechanism.
    
        This test is performed in an integral way because the underlying methods used
        have all been tested elsewhere, and it is more relevant to indirectly verify
        that this method works in real life scenarios, rather than checking the specifics
        of its internal logical structure.
        """
        locking_proc = TestProcess()
        locking_pid = locking_proc.start()
        monkeypatch.setattr(profile_access_manager, 'process', psutil.Process(locking_pid))
    
        # It will not lock if there is a process accessing.
        access_proc = TestProcess()
        access_pid = access_proc.start()
        with pytest.raises(LockingProfileError) as exc:
            with profile_access_manager.lock():
                pass
        assert str(locking_pid) in str(exc.value)
>       assert str(access_pid) in str(exc.value)
E       assert '22445' in "process 22431 cannot lock profile `test_sqla` because it is being accessed.\nThe following processes are accessing th...da_dev/bin/python', '/home/sph/.virtualenvs/aiida_dev/bin/pytest', 'tests/backends/managers/test_profile_access.py']`)"
E        +  where '22445' = str(22445)
E        +  and   "process 22431 cannot lock profile `test_sqla` because it is being accessed.\nThe following processes are accessing th...da_dev/bin/python', '/home/sph/.virtualenvs/aiida_dev/bin/pytest', 'tests/backends/managers/test_profile_access.py']`)" = str(LockingProfileError("process 22431 cannot lock profile `test_sqla` because it is being accessed.\nThe following proces...a_dev/bin/python', '/home/sph/.virtualenvs/aiida_dev/bin/pytest', 'tests/backends/managers/test_profile_access.py']`)"))
E        +    where LockingProfileError("process 22431 cannot lock profile `test_sqla` because it is being accessed.\nThe following proces...a_dev/bin/python', '/home/sph/.virtualenvs/aiida_dev/bin/pytest', 'tests/backends/managers/test_profile_access.py']`)") = <ExceptionInfo LockingProfileError("process 22431 cannot lock profile `test_sqla` because it is being accessed.\nThe f.../python', '/home/sph/.virtualenvs/aiida_dev/bin/pytest', 'tests/backends/managers/test_profile_access.py']`)") tblen=4>.value

tests/backends/managers/test_profile_access.py:314: AssertionError
=============================================================================================== warnings summary ================================================================================================
tests/backends/managers/test_profile_access.py:179
  /home/sph/code/aiida/env/dev/aiida-core/tests/backends/managers/test_profile_access.py:179: PytestCollectionWarning: cannot collect test class 'TestProcess' because it has a __init__ constructor (from: tests/backends/managers/test_profile_access.py)
    class TestProcess():

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================================================ short test summary info ============================================================================================
FAILED tests/backends/managers/test_profile_access.py::test_access_control - AssertionError: assert '22375.pid' in ['22352.pid']
FAILED tests/backends/managers/test_profile_access.py::test_lock - assert '22445' in "process 22431 cannot lock profile `test_sqla` because it is being accessed.\nThe following processes are accessing th......
==================================================================================== 2 failed, 7 passed, 1 warning in 6.88s =====================================================================================

I made sure that before running the tests there are no residual pid or lock files in the access directory of the test profile. The errors seem reproducible (I ran them several times). After each run of the tests, the access directory contains a pid file. Not sure which test creates this and fails to clean it up.

@sphuber
Contributor

sphuber commented Jan 20, 2022

Figured out the problem. Your tests rely on the fact that the test profile is also marked as the default profile: when they launch a new external process, they do not explicitly specify the profile but rely on the default. This is fragile; the tests should explicitly specify the profile (the same as the test profile) when launching an external process.

@csadorf csadorf dismissed their stale review January 20, 2022 17:14

withdrawn

@giovannipizzi
Member

Hi all, just to avoid this going into an infinite loop with everybody disappointed:

  • thanks everybody for all the feedback! :-) It's nice to see you have been investing your time in improving this, and thanks to considering so carefully each other's suggestions.
  • I think at this point there is consensus on moving forward as it is; let me maybe clarify one point. This locking is not something that happens very often. It will only happen, in practice, when a user calls a maintenance command "by hand" via verdi. I doubt that there will be many cases where two processes really try to lock the same profile at the same time in practice; this is just an additional safety measure (at some point we even agreed that this could have happened in 2.1, if time was lacking - so we would have had a "very buggy" locking interface at that point, i.e. one which was not working at all!).
  • I agree we need to properly test it - if we are happy with the tests we can go ahead, and if really there is a bug in the future we'll fix it possibly switching to a library.

I'm suggesting this because I now think that, weighing the human energy invested in this against the risk of a bug and its consequences, it is now time to merge (otherwise we just spend more time and people get more and more frustrated - I think we still have other things to focus on before the release). And also because my analysis of the code (at least in my first review; I think the code has changed quite a lot since) convinced me that the behaviour was correct in all the edge cases I could think of.

Again, thanks again to everybody!

@ramirezfranciscof
Member Author

Figured out the problem. Your tests are relying on the fact that the test profile is also marked as the default profile. Because when they launch a new external process, they do not explicitly specify the profile but rely on the default. This is fragile and the tests should explicitly specify the profile (being the same as the test profile) when launching an external process.

Hey, thanks for testing and finding the problem! I could reproduce it locally and I think the last commit fixed it; let me know.

Also, as a side note, after merging the develop branch again I'm having problems with pip installing (I suspect since the merge of the PEP 621 PR):

(aiida) root@aiida:~/codes/aiida-core# pip install -e .[all]
ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: /root/codes/aiida-core
(A "pyproject.toml" file was found, but editable mode currently requires a setup.py based build.)

The PR in question mentions that:

to use pip install -e ., this requires pip v21, but then it "just works"

Does this mean then I can no longer use editable mode until I update to pip v21 or is there another syntax to do it?

@chrisjsewell
Member

chrisjsewell commented Jan 20, 2022

Does this mean then I can no longer use editable mode until I update to pip v21 or is there another syntax to do it?

I would certainly suggest updating pip (should be as simple as pip install --upgrade pip), but you can also look at https://flit.readthedocs.io/en/latest/cmdline.html#flit-install and the --symlink option. Additionally, there is no longer an all extra, but there is an option for this in flit install. Finally, I would note that, unlike pip, you need to specify --python.

@giovannipizzi
Member

One thing that just crossed my mind: does anybody see a problem if AiiDA is used from two machines sharing the same filesystem? (I guess there might be many other problems we are not aware of in this case, e.g. with detecting whether the daemon is working, but I'm just pointing out the issue - if detecting whether the daemon is running has the same problem, then I would say we shouldn't worry about it in this PR.) E.g. one installs AiiDA on a login node of a supercomputer that connects you to one of many possible login nodes. Then the PID written to the file might not exist on the machine you connect to (because the process lives on a different machine). Again, this might also be problematic when checking whether the daemon is running, so probably this is not a use case we support?

Contributor

@sphuber sphuber left a comment

Thanks @ramirezfranciscof . I have to say that I haven't had the chance to look at the tests in depth yet, but given the current situation, let's just get this thing merged. I just noticed some minor mistakes in the docstrings and have a suggestion for the naming of the test class to avoid annoying warnings. Finally, I think the new module should simply go in aiida.manage.configuration, since aiida.backends.managers might not be long for this world.

###########################################################################


class TestProcess():
Contributor

Suggest renaming this, since with the current name pytest thinks it is an actual class of tests and will emit a warning

Suggested change
class TestProcess():
class MockProcess():

Member Author

Ah, I hadn't noticed this warning. Thanks! It is changed now.

raise LockedProfileError(error_msg)

def _raise_if_active(self, message_start):
"""This method will raise the exception given in `ExceptionClass` if the profile is being accessed.
Contributor

Suggested change
"""This method will raise the exception given in `ExceptionClass` if the profile is being accessed.
"""Raise a ``LockingProfileError`` if the profile is being accessed.

Member Author

Fixed.

return list_of_files

def _raise_if_locked(self, message_start):
"""This method will raise the exception given in `ExceptionClass` if the profile is locked.
Contributor

Suggested change
"""This method will raise the exception given in `ExceptionClass` if the profile is locked.
"""Raise a ``LockedProfileError`` if the profile is locked.

Member Author

Fixed.
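For context on what these guards do, here is a minimal sketch of the lock check, modelled on the `locking/<process_id>.pid` layout from the PR description. The names `raise_if_locked` and the exact exception message are illustrative, not AiiDA's actual code; a real implementation would also verify that each recorded PID still belongs to a running process.

```python
from pathlib import Path


class LockedProfileError(Exception):
    """Raised when another process holds an exclusive lock on the profile."""


def raise_if_locked(locking_dir: Path, message_start: str) -> None:
    """Raise ``LockedProfileError`` if any ``<pid>.pid`` lock file exists.

    Sketch only: before raising, the real guard would cross-check the
    recorded PIDs against the processes currently running on the system.
    """
    lock_files = sorted(locking_dir.glob('*.pid'))
    if lock_files:
        pids = ', '.join(path.stem for path in lock_files)
        raise LockedProfileError(f'{message_start}: locked by PID(s) {pids}')
```

With an empty locking folder the call returns silently; as soon as a `1234.pid` marker appears, any caller is refused with an error naming the locking PID.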

@@ -0,0 +1,222 @@
# -*- coding: utf-8 -*-
Contributor

Shouldn't this file also just go in aiida.manage.configuration? That is where config and profile loading/access is organized and where this is also being called. I think @chrisjsewell was getting rid of this entire module soon anyway.

Member Author

I moved it to aiida.manage.profile_access; it seemed more fitting than configuration, but let me know if you disagree.

Contributor

I like it 👍

Member

Sounds good to me.

Random note, since it does pertain to this line: https://www.python.org/dev/peps/pep-3120/ makes UTF-8 the default encoding for Python source, so these `# -*- coding: utf-8 -*-` lines are kind of a legacy thing.

@ramirezfranciscof
Member Author

One thing that just crossed my mind. Does anybody see a problem if AiiDA is used from two machines sharing the same filesystem? (I guess there might be many other problems we are not aware of in this case, e.g. with detecting if the daemon is running, but I am just pointing out the issue; if detecting whether the daemon is running has the same problem, then I would say we shouldn't worry about it in this PR.) E.g. if one installs AiiDA on a login node of a supercomputer, where you get connected to one of many possible login nodes, then the PID written to file might not exist on the machine you connect to (because the process is on a different machine). Again, this might also be problematic when checking if the daemon is running, so probably this is not a use case we support?

Yeah, indeed it does not currently support this use case, although I don't see how it would be possible to account for such a distributed setup without migrating the profile loading to a context manager. That will probably be highly disruptive, but necessary if we want good control over access to AiiDA instances. Maybe we can already start adding these features without making them required, just to see how they feel to use. Something like this:

with loaded_profile('profile_name') as aiida_profile:
    qb = aiida_profile.get_querybuilder()
    dict_node = orm.Dict({'example num': 1})
    dict_node.store(aiida_profile)
    same_node = aiida_profile.load_node(dict_node.pk)
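The PID bookkeeping introduced in this PR maps naturally onto such a context manager's enter/exit. A minimal sketch of only the tracking half, using the `tracked/<pid>.pid` layout from the PR description (the name `track_profile_access` is hypothetical, and the real mechanism stores process metadata in the file and also checks the `locking/` folder before entering):

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def track_profile_access(access_dir: Path):
    """Register the current PID under ``tracked/`` for the duration of the block.

    Sketch only: no locking check is performed and the marker file is empty,
    unlike the real implementation described in this PR.
    """
    tracked = access_dir / 'tracked'
    tracked.mkdir(parents=True, exist_ok=True)
    pid_file = tracked / f'{os.getpid()}.pid'
    pid_file.touch()  # announce ourselves to would-be lockers
    try:
        yield pid_file
    finally:
        pid_file.unlink(missing_ok=True)  # stop being counted as active
```

While the `with` block is open, a process requesting exclusive access would find our marker and refuse to lock; once the block exits, the marker disappears and the profile can be locked again.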

@chrisjsewell
Member

chrisjsewell commented Jan 21, 2022

Something like this:

@ramirezfranciscof this is literally what the whole of #5172 and #5145 (and all the other PRs I've been doing) is trying to achieve 😉 i.e. you can already now do:

In [1]: from aiida.tools.archive.abstract import get_format
In [2]: archive_format = get_format()
In [3]: with archive_format.open("2d-export-new.aiida", "r") as reader:
   ...:     qb = reader.querybuilder()
   ...:     node = reader.get(Dict, pk=10)
   ...:     print(qb.append(ProcessNode, tag="tag").append(Code, with_outgoing="tag").distinct().count())
   ...: 
10817

@ramirezfranciscof
Member Author

@chrisjsewell yes, there seems to be a lot going on in those refactorings 😅, but I guess essentially this is what we are talking about:

When the Manager is closed, this also calls close() on the backend, which shuts down the session and engine connection to the database.

Looking forward to that 👍🏽

@chrisjsewell
Member

Looking forward to that 👍🏽

yep getting there cheers 😅

@sphuber sphuber dismissed stale reviews from giovannipizzi and chrisjsewell January 21, 2022 17:09

Requests addressed
