Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: lock file #2011

Closed
wants to merge 1 commit into from
Closed

feat: lock file #2011

wants to merge 1 commit into from

Conversation

sangaman
Copy link
Collaborator

@sangaman sangaman commented Dec 2, 2020

This introduces a lock file for xud that is created in the xud data directory any time xud starts up. Anytime a lock file already exists, an error message is logged and xud exits. This prevents multiple xud processes from trying to use the same node key or directory at the same time.

The lock files are unique to each network, so running multiple processes with different networks (e.g. simnet and mainnet) is allowed. They are deleted when xud shuts down.

Closes #1989.

@sangaman sangaman added enhancement New feature or request code quality Improving code structure, organization, and clarity labels Dec 2, 2020
@sangaman sangaman requested review from a user, michael1011 and raladev December 2, 2020 06:33
@sangaman sangaman self-assigned this Dec 2, 2020
ghost
ghost previously approved these changes Dec 2, 2020
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

Copy link
Contributor

@raladev raladev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • does not work for xud-docker env

Steps:

  1. bash xud.sh -b lock
  2. initialize the wallet and unlock it
  3. docker rm -f xud_simnet_1
  4. bash xud.sh -b lock

Actual result:
.lock file still exists
Screenshot from 2020-12-02 16-13-44
Screenshot from 2020-12-02 16-13-49
Screenshot from 2020-12-02 16-13-46

michael1011
michael1011 previously approved these changes Dec 2, 2020
@sangaman
Copy link
Collaborator Author

sangaman commented Dec 2, 2020

* [ ]  does not work for xud-docker env

Steps:

1. `bash xud.sh -b lock`

2. initialize the wallet and unlock it

3. docker rm -f xud_simnet_1

4. `bash xud.sh -b lock`

Actual result:
.lock file still exists
Screenshot from 2020-12-02 16-13-44
Screenshot from 2020-12-02 16-13-49
Screenshot from 2020-12-02 16-13-46

Hm this is likely due to the sudden/ungraceful way of stopping the container, not giving it a chance to delete the lock file. Does "downing" the container give the same behavior? I can try testing that myself. I wonder if there's anything else to do from the xud perspective to delete the lock file with ungraceful shutdowns, but if the process is just killed abruptly then I'm pretty sure nothing can be done. I could maybe put in a prompt on the docker side to delete the lock file if one is encountered.

@michael1011
Copy link
Contributor

I wonder if there's anything else to do from the xud perspective to delete the lock file with ungraceful shutdowns, but if the process is just killed abruptly then I'm pretty sure nothing can be done.

How does Bitcoin Core handle this? The also have lockfiles for the database files and somehow managed to deal with them gracefully even when you force kill the Bitcoin Core process. But researching that is low prio.

I could maybe put in a prompt on the docker side to delete the lock file if one is encountered.

In xud-docker we could simply check whether an xud container is running and if not, automatically remove the lockfile.

@kilrau
Copy link
Contributor

kilrau commented Dec 2, 2020

This could also happen on other ungraceful shutdowns and I believe we should handle it, otherwise we'll have to explain to our users how to delete lockfiles :/ I never had lockfile issues with any other software, even though I killed processes/containers, so there must be a trick to it.

Probably a stupid question, but can a lockfile depend on a process? https://stackoverflow.com/questions/1599459/optimal-lock-file-method/ and https://perl.plover.com/yak/flock/ recommend using flock() and the process actually tries to obtain the lock on the file, not judging by pure existence of the lock file as far as I understand it.

@sangaman
Copy link
Collaborator Author

sangaman commented Dec 2, 2020

I'm used to deleting lock files with other applications that crash suddenly (due to power loss, etc). I believe I've had it happen with bitcoind too, and there are plenty of results for people getting cannot obtain a lock on data directory with bitcoin core. Conceptually I can't think of any you could delete a file inside a process that is suddenly killed.

Making it depend on a process could work potentially, but I don't know how we would play nice with running multiple processes on one machine which is a valid use case if they're using different networks or perhaps even different data directories.

@sangaman
Copy link
Collaborator Author

sangaman commented Dec 2, 2020

I can confirm that down or stop xud in xud-docker handle the lock file gracefully. So I think from docker's perspective it makes sense to prompt the user if it encounters a lock file. I'll look into that separately.

I'll fix this PR to update the xud custom patch.

This introduces a lock file for xud that is created in the xud data
directory any time xud starts up. Anytime a lock file already exists,
an error message is logged and xud exits. This prevents multiple
xud processes from trying to use the same node key or directory at the
same time.

The lock files are unique to each network, so running multiple processes
with different networks (e.g. simnet and mainnet) is allowed. They are
deleted when xud shuts down.

Closes #1989.
@sangaman sangaman dismissed stale reviews from michael1011 and ghost via 671e049 December 2, 2020 21:03
@kilrau kilrau self-requested a review December 3, 2020 10:00
@kilrau
Copy link
Contributor

kilrau commented Dec 3, 2020

Thanks again for throwing this up @sangaman , we definitely want and need an advisory lock like this. I still have the following concerns though:

I'm used to deleting lock files with other applications that crash suddenly (due to power loss, etc).

I am not tbh, and I killed my bitcoind/lnd multiple times already including sudden power loss. I am just not convinced we are handling things right if a simple container kill, leaves the .lock file on disk and then xud as well-behaved process refuses to start on this data dir because the .lock file exists. I am not challenging the .lock file approach in general, just how it's currently handled in this PR by deleting the file on graceful shutdown and judging .lock file exists = data dir locked.

I definitely would want to avoid explaining to users how to delete a .lock file vs. protection in a case where multiple xud's try to run on the same data dir.

So I did a little test with my bitcoind:

  • t0: bitcoind running (note the creation date of the .lock file - there were definitely multiple restarts & host reboots in between this date and today):
kilrau@beast:/media/SSD/bitcoind$ ls -la
total 47560
drwxr-xr-x 6 d    d        4096 Dec  3 10:23 .
drwxr-xr-x 5 d    d        4096 Nov 10 08:59 ..
-rw------- 1 root root        0 Jul 20 09:54 .lock
  • t1: stop bitcoind gracefully, .lock file still exists
kilrau@beast:/media/SSD/bitcoind$ ls -la
total 79612
drwxr-xr-x 6 d    d        4096 Dec  3 10:34 .
drwxr-xr-x 5 d    d        4096 Nov 10 08:59 ..
-rw------- 1 root root        0 Jul 20 09:54 .lock
  • t2: starting bitcoind again, .lock file still there, creation date unchanged:
kilrau@beast:/media/SSD/bitcoind$ ls -la
total 78600
drwxr-xr-x 6 d    d        4096 Dec  3 10:37 .
drwxr-xr-x 5 d    d        4096 Nov 10 08:59 ..
-rw------- 1 root root        0 Jul 20 09:54 .lock

So it looks like bitcoind does not delete the .lock file when shutting down, and when coming up judging the lock somehow differently than by pure existence of the file. Reading through bitcoin/bitcoin#19167 (comment) it looks like bitcoind is using fcntl which offers advanced file locks and locks are placed on the .lock file itself without ever changing/deleting it. Since the author proposed a switch to flock() and that's what I previously found too, can we look into using it?

@michael1011
Copy link
Contributor

michael1011 commented Dec 3, 2020

One thing to keep in mind though when talking about low level file system calls @kilrau:

We are using Node. Which means we need to have native C++ bindings to be able to call those things which does not only add a dependency but also might not be compatible with every OS and file system. The first Node flock library I found when googling for such things was incompatible with BSD for example

@kilrau
Copy link
Contributor

kilrau commented Dec 14, 2020

Maybe fcntl is the better choice then. But working on this PR is definitely P3 and we should only pick it up after MPP.

@michael1011
Copy link
Contributor

Maybe fcntl is the better choice then

May I remind you: we cannot use native file system calls without a native C++ (or Rust or whatever) dependency which might not be compatible with all file systems, operating systems and architectures

@kilrau
Copy link
Contributor

kilrau commented Dec 14, 2020

Ok. Reading between the lines, you'd vouch for sticking with the current lockfile exists = locked approach?

@michael1011
Copy link
Contributor

I don't know what to suggest tbh. Both approaches have their up and downsides and we should consider carefully

@kilrau kilrau marked this pull request as draft December 15, 2020 08:15
@kilrau
Copy link
Contributor

kilrau commented Dec 15, 2020

I don't know what to suggest tbh. Both approaches have their up and downsides and we should consider carefully

Agreed, it's not clear what the best way is right now and at least I have concerns of potentially making things worse with this PR (few will benefit from what the lockfile prevents, more might run into troubles not being able to start xud because lockfile exists).

I converted this PR to draft, we can pick it up again once a good solution evolves.

@kilrau
Copy link
Contributor

kilrau commented Dec 21, 2020

Just dropping this here as an observation:

~$ sudo apt update
Reading package lists... Done
E: Could not get lock /var/lib/apt/lists/lock. It is held by process 2328 (packagekitd)
N: Be aware that removing the lock file is not a solution and may break your system.
E: Unable to lock directory /var/lib/apt/lists/

@michael1011
Copy link
Contributor

One thing that just came to my mind: what if we write the PID of the XUD process in the lockfile and check if the PID is still running if the lockfile exists?

@kilrau
Copy link
Contributor

kilrau commented Dec 23, 2020

Hmm yeah that could work. Also should work on windows as per your link

@sangaman sangaman closed this Jan 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code quality Improving code structure, organization, and clarity enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use .lock file for data directory
4 participants