feat: lock file #2011

sangaman · 2020-12-02T06:33:49Z

This introduces a lock file for xud that is created in the xud data directory any time xud starts up. Anytime a lock file already exists, an error message is logged and xud exits. This prevents multiple xud processes from trying to use the same node key or directory at the same time.

The lock files are unique to each network, so running multiple processes with different networks (e.g. simnet and mainnet) is allowed. They are deleted when xud shuts down.

Closes #1989.

ghost

utACK

raladev

does not work for xud-docker env

Steps:

bash xud.sh -b lock
initialize the wallet and unlock it
docker rm -f xud_simnet_1
bash xud.sh -b lock

Actual result:
.lock file still exists
Screenshot from 2020-12-02 16-13-44
Screenshot from 2020-12-02 16-13-49
Screenshot from 2020-12-02 16-13-46

sangaman · 2020-12-02T14:51:18Z

* [ ]  does not work for xud-docker env
Steps:
1. `bash xud.sh -b lock`

2. initialize the wallet and unlock it

3. docker rm -f xud_simnet_1

4. `bash xud.sh -b lock`
Actual result:
.lock file still exists
Screenshot from 2020-12-02 16-13-44
Screenshot from 2020-12-02 16-13-49
Screenshot from 2020-12-02 16-13-46

Hm this is likely due to the sudden/ungraceful way of stopping the container, not giving it a chance to delete the lock file. Does "downing" the container give the same behavior? I can try testing that myself. I wonder if there's anything else to do from the xud perspective to delete the lock file with ungraceful shutdowns, but if the process is just killed abruptly then I'm pretty sure nothing can be done. I could maybe put in a prompt on the docker side to delete the lock file if one is encountered.

michael1011 · 2020-12-02T15:11:03Z

I wonder if there's anything else to do from the xud perspective to delete the lock file with ungraceful shutdowns, but if the process is just killed abruptly then I'm pretty sure nothing can be done.

How does Bitcoin Core handle this? The also have lockfiles for the database files and somehow managed to deal with them gracefully even when you force kill the Bitcoin Core process. But researching that is low prio.

I could maybe put in a prompt on the docker side to delete the lock file if one is encountered.

In xud-docker we could simply check whether an xud container is running and if not, automatically remove the lockfile.

kilrau · 2020-12-02T15:44:05Z

This could also happen on other ungraceful shutdowns and I believe we should handle it, otherwise we'll have to explain to our users how to delete lockfiles :/ I never had lockfile issues with any other software, even though I killed processes/containers, so there must be a trick to it.

Probably a stupid question, but can a lockfile depend on a process? https://stackoverflow.com/questions/1599459/optimal-lock-file-method/ and https://perl.plover.com/yak/flock/ recommend using flock() and the process actually tries to obtain the lock on the file, not judging by pure existence of the lock file as far as I understand it.

sangaman · 2020-12-02T20:26:35Z

I'm used to deleting lock files with other applications that crash suddenly (due to power loss, etc). I believe I've had it happen with bitcoind too, and there are plenty of results for people getting cannot obtain a lock on data directory with bitcoin core. Conceptually I can't think of any you could delete a file inside a process that is suddenly killed.

Making it depend on a process could work potentially, but I don't know how we would play nice with running multiple processes on one machine which is a valid use case if they're using different networks or perhaps even different data directories.

sangaman · 2020-12-02T20:58:14Z

I can confirm that down or stop xud in xud-docker handle the lock file gracefully. So I think from docker's perspective it makes sense to prompt the user if it encounters a lock file. I'll look into that separately.

I'll fix this PR to update the xud custom patch.

This introduces a lock file for xud that is created in the xud data directory any time xud starts up. Anytime a lock file already exists, an error message is logged and xud exits. This prevents multiple xud processes from trying to use the same node key or directory at the same time. The lock files are unique to each network, so running multiple processes with different networks (e.g. simnet and mainnet) is allowed. They are deleted when xud shuts down. Closes #1989.

kilrau · 2020-12-03T10:50:28Z

Thanks again for throwing this up @sangaman , we definitely want and need an advisory lock like this. I still have the following concerns though:

I'm used to deleting lock files with other applications that crash suddenly (due to power loss, etc).

I am not tbh, and I killed my bitcoind/lnd multiple times already including sudden power loss. I am just not convinced we are handling things right if a simple container kill, leaves the .lock file on disk and then xud as well-behaved process refuses to start on this data dir because the .lock file exists. I am not challenging the .lock file approach in general, just how it's currently handled in this PR by deleting the file on graceful shutdown and judging .lock file exists = data dir locked.

I definitely would want to avoid explaining to users how to delete a .lock file vs. protection in a case where multiple xud's try to run on the same data dir.

So I did a little test with my bitcoind:

t0: bitcoind running (note the creation date of the .lock file - there were definitely multiple restarts & host reboots in between this date and today):

kilrau@beast:/media/SSD/bitcoind$ ls -la
total 47560
drwxr-xr-x 6 d    d        4096 Dec  3 10:23 .
drwxr-xr-x 5 d    d        4096 Nov 10 08:59 ..
-rw------- 1 root root        0 Jul 20 09:54 .lock

t1: stop bitcoind gracefully, .lock file still exists

kilrau@beast:/media/SSD/bitcoind$ ls -la
total 79612
drwxr-xr-x 6 d    d        4096 Dec  3 10:34 .
drwxr-xr-x 5 d    d        4096 Nov 10 08:59 ..
-rw------- 1 root root        0 Jul 20 09:54 .lock

t2: starting bitcoind again, .lock file still there, creation date unchanged:

kilrau@beast:/media/SSD/bitcoind$ ls -la
total 78600
drwxr-xr-x 6 d    d        4096 Dec  3 10:37 .
drwxr-xr-x 5 d    d        4096 Nov 10 08:59 ..
-rw------- 1 root root        0 Jul 20 09:54 .lock

So it looks like bitcoind does not delete the .lock file when shutting down, and when coming up judging the lock somehow differently than by pure existence of the file. Reading through bitcoin/bitcoin#19167 (comment) it looks like bitcoind is using fcntl which offers advanced file locks and locks are placed on the .lock file itself without ever changing/deleting it. Since the author proposed a switch to flock() and that's what I previously found too, can we look into using it?

michael1011 · 2020-12-03T12:07:47Z

One thing to keep in mind though when talking about low level file system calls @kilrau:

We are using Node. Which means we need to have native C++ bindings to be able to call those things which does not only add a dependency but also might not be compatible with every OS and file system. The first Node flock library I found when googling for such things was incompatible with BSD for example

kilrau · 2020-12-14T10:38:27Z

Maybe fcntl is the better choice then. But working on this PR is definitely P3 and we should only pick it up after MPP.

michael1011 · 2020-12-14T23:02:44Z

Maybe fcntl is the better choice then

May I remind you: we cannot use native file system calls without a native C++ (or Rust or whatever) dependency which might not be compatible with all file systems, operating systems and architectures

kilrau · 2020-12-14T23:12:30Z

Ok. Reading between the lines, you'd vouch for sticking with the current lockfile exists = locked approach?

michael1011 · 2020-12-14T23:17:12Z

I don't know what to suggest tbh. Both approaches have their up and downsides and we should consider carefully

kilrau · 2020-12-15T08:28:05Z

I don't know what to suggest tbh. Both approaches have their up and downsides and we should consider carefully

Agreed, it's not clear what the best way is right now and at least I have concerns of potentially making things worse with this PR (few will benefit from what the lockfile prevents, more might run into troubles not being able to start xud because lockfile exists).

I converted this PR to draft, we can pick it up again once a good solution evolves.

kilrau · 2020-12-21T12:47:56Z

Just dropping this here as an observation:

~$ sudo apt update
Reading package lists... Done
E: Could not get lock /var/lib/apt/lists/lock. It is held by process 2328 (packagekitd)
N: Be aware that removing the lock file is not a solution and may break your system.
E: Unable to lock directory /var/lib/apt/lists/

michael1011 · 2020-12-22T21:31:36Z

One thing that just came to my mind: what if we write the PID of the XUD process in the lockfile and check if the PID is still running if the lockfile exists?

kilrau · 2020-12-23T15:19:47Z

Hmm yeah that could work. Also should work on windows as per your link

sangaman added enhancement New feature or request code quality Improving code structure, organization, and clarity labels Dec 2, 2020

sangaman requested review from a user, michael1011 and raladev December 2, 2020 06:33

sangaman self-assigned this Dec 2, 2020

sangaman force-pushed the lock-file branch from aa69874 to 008f899 Compare December 2, 2020 07:46

ghost previously approved these changes Dec 2, 2020

View reviewed changes

raladev requested changes Dec 2, 2020

View reviewed changes

michael1011 previously approved these changes Dec 2, 2020

View reviewed changes

sangaman dismissed stale reviews from michael1011 and ghost via 671e049 December 2, 2020 21:03

sangaman force-pushed the lock-file branch from 008f899 to 671e049 Compare December 2, 2020 21:03

kilrau self-requested a review December 3, 2020 10:00

kilrau marked this pull request as draft December 15, 2020 08:15

sangaman closed this Jan 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: lock file #2011

feat: lock file #2011

sangaman commented Dec 2, 2020

ghost left a comment

raladev left a comment

sangaman commented Dec 2, 2020

michael1011 commented Dec 2, 2020

kilrau commented Dec 2, 2020

sangaman commented Dec 2, 2020

sangaman commented Dec 2, 2020

kilrau commented Dec 3, 2020

michael1011 commented Dec 3, 2020 •

edited

Loading

kilrau commented Dec 14, 2020 •

edited

Loading

michael1011 commented Dec 14, 2020

kilrau commented Dec 14, 2020

michael1011 commented Dec 14, 2020

kilrau commented Dec 15, 2020

kilrau commented Dec 21, 2020

michael1011 commented Dec 22, 2020

kilrau commented Dec 23, 2020

feat: lock file #2011

feat: lock file #2011

Conversation

sangaman commented Dec 2, 2020

ghost left a comment

Choose a reason for hiding this comment

raladev left a comment

Choose a reason for hiding this comment

sangaman commented Dec 2, 2020

michael1011 commented Dec 2, 2020

kilrau commented Dec 2, 2020

sangaman commented Dec 2, 2020

sangaman commented Dec 2, 2020

kilrau commented Dec 3, 2020

michael1011 commented Dec 3, 2020 • edited Loading

kilrau commented Dec 14, 2020 • edited Loading

michael1011 commented Dec 14, 2020

kilrau commented Dec 14, 2020

michael1011 commented Dec 14, 2020

kilrau commented Dec 15, 2020

kilrau commented Dec 21, 2020

michael1011 commented Dec 22, 2020

kilrau commented Dec 23, 2020

michael1011 commented Dec 3, 2020 •

edited

Loading

kilrau commented Dec 14, 2020 •

edited

Loading