Improve multi-threaded locking #2348
Conversation
I think this is ready now. I think I covered all the locking cases in the last commit but would be happy to add more.
src/libostree/ostree-repo.c
Outdated
  self->lock.exclusive--;
else
  self->lock.shared--;
g_ptr_array_remove_index (stack, stack->len - 1);
Hmm wait, now that we have the two counters, do we actually need to store the stack? Since the valid state transitions can only be:
shared (++) -> exclusive (++)
i.e. we can never have any shared after exclusive, so why maintain the stack at all versus just the two counters?
I briefly came to the same conclusion, but I think it's necessary to know what state to drop when there are multiple threads. In fact, what's here has a bug and I'm going to beef up the multithreaded test to check this.
Here's the scenario if I can get this ascii timeline to work out:
Thread 1         Thread 2       Flock state
========         ========       ===========
push exclusive                  exclusive
                 push shared    exclusive
pop                             shared
With the current implementation, the flock would remain exclusive after the pop because the counters would say exclusive == 2 and shared == 0 before the pop. In order to correctly return from exclusive to shared when thread 1 pops, you need to correctly track how many exclusive and shared locks are held instead of just incrementing exclusive every time once the lock is exclusive. Then it would be exclusive == 1 and shared == 1, but you still wouldn't know which state to drop unless it was stored with the thread that is doing the dropping.
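To make that concrete, here's a minimal sketch (my illustration, not the actual patch) of the counter-based bookkeeping this eventually converges on: two counters, plus a pop that is told which type it is releasing (as the later commits do). The struct, enum, and flock() handling are stand-ins for ostree's internal lock state, and error handling is elided.

#include <glib.h>
#include <sys/file.h>

typedef enum { REPO_LOCK_SHARED, REPO_LOCK_EXCLUSIVE } RepoLockType;

/* Hypothetical stand-in for the lock state kept in OstreeRepo. */
typedef struct {
  int fd;            /* the shared lock file descriptor */
  guint shared;      /* outstanding shared pushes */
  guint exclusive;   /* outstanding exclusive pushes */
} RepoLockState;

static void
repo_lock_pop (RepoLockState *lock, RepoLockType type)
{
  if (type == REPO_LOCK_EXCLUSIVE)
    {
      g_assert_cmpuint (lock->exclusive, >, 0);
      lock->exclusive--;
      /* Last exclusive holder gone but shared holders remain: downgrade. */
      if (lock->exclusive == 0 && lock->shared > 0)
        flock (lock->fd, LOCK_SH);
    }
  else
    {
      g_assert_cmpuint (lock->shared, >, 0);
      lock->shared--;
    }
  /* No holders left at all: release the flock entirely. */
  if (lock->exclusive == 0 && lock->shared == 0)
    flock (lock->fd, LOCK_UN);
}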
I just pushed 3f78672 which tests this scenario and will now fail.
That failed as expected:
[2021-04-27T21:19:26.049Z] ERROR: tests/test-repo - Bail out! OSTree:ERROR:tests/test-repo.c:459:test_repo_lock_multi_thread: assertion failed (error == NULL): Locking repo shared failed: Resource temporarily unavailable (g-io-error-quark, 27)
And now with c7de18a tests/test-repo passes. Unless you still think that's incorrect, I'm going to clean up the series.
Yes, in that case you could add some unlocked variant that's only available internally. That case actually violates the semantics since the transaction only has a shared lock and the expectation is that you have an exclusive lock when deleting.
Hmm. The tombstone object is very special though - clients just test whether or not it exists. Here a separate client with a shared lock querying the commit (that doesn't exist) running simultaneously with another client writing that commit is racy, but in a safe way. We're not talking about losing data here; conceptually it's deleting a file as part of the act of creating data.
My mental model here is still thinking of shared = read, exclusive = write.
My mental model here is still thinking of shared = read, exclusive = write.
It took me a while to not think of them that way, but I've also touched this code by far the most. Maybe we should rename the lock names before making them public? I think I only named them that way initially since those were the flock names. The semantics it's really trying to encode are to not delete things that others might be using. I think you could name those ACCESS and DELETE. Probably there are better names, though.
There are lots of terms of art in https://en.wikipedia.org/wiki/Isolation_(database_systems). I think shared/exclusive is better than read/write, but they're also in many cases mostly synonymous, and it's just easier to think of read/write in many cases. That said, for example in Rust there's a sentiment that &/&mut are misnamed and it should have been &/&excl or so (short for "exclusive"), because & in some cases still allows interior mutability.
So we got into this discussion around the complexity of the state tracking, which was simplified by having the caller track which lock they acquired. That's better, but we were also having a discussion around the semantics of locking that I'm not sure is resolved. You said you wanted to change ostree_repo_delete_object() to acquire an exclusive lock, which makes some sense.
I do still think the right fix is either:
- For removing tombstone objects to not require an exclusive lock
- Change the pull code to call an _unlocked() variant
Or honestly, it's going to be an ABI hazard to change the locking semantics of any functions at all, and that may apply in this case.
Actually _delete_object() is also inherently very special and low level - you can easily just remove a file object referenced by a commit, etc. So we could instead add ostree_repo_delete_object_locked() or so...but under what scenarios should that be used?
Were you actually planning to use this for something?
The semantics and hazards are exactly why I started working on this again. In the case of _delete_object(), there are several ways to make it work. In addition to the ways you mentioned above, I also have a local branch that adds a _delete_object_if_exists() which simply does a _have_object() before calling into the real _delete_object() to avoid taking the lock unless you really need to (roughly the sketch below). It's still racy, but for the case we're describing here it's not really different. Also, just documenting that _delete_object() is low level and that you should take an exclusive lock before using it would be fine, too.
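For illustration only, here's roughly what that _delete_object_if_exists() helper could look like using the public has-object/delete-object calls; the wrapper's name and shape come from my local experiment and aren't part of this PR.

#include <ostree.h>

/* Hypothetical wrapper: skip the delete (and hence the exclusive lock it
 * would take) when the object isn't there.  Still racy between the check
 * and the delete, but no worse than the case discussed above. */
static gboolean
delete_object_if_exists (OstreeRepo *repo, OstreeObjectType objtype,
                         const char *checksum, GCancellable *cancellable,
                         GError **error)
{
  gboolean have_object = FALSE;
  if (!ostree_repo_has_object (repo, objtype, checksum, &have_object,
                               cancellable, error))
    return FALSE;
  if (!have_object)
    return TRUE;
  return ostree_repo_delete_object (repo, objtype, checksum, cancellable, error);
}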
Anyways, I wasn't trying to focus on this specific case. I don't have plans to use _delete_object() directly. I was trying to use it as a concrete example of the general issue I'm worried about. The semantics of the repo locking without this PR are that threads operate on separate lock file descriptors. If you have multiple threads operating on the same OstreeRepo, then sooner or later you're likely to have a deadlocked application when someone adds an innocuous lock call in ostree's internals.
I find this to not be a theoretical issue since ostree itself becomes multithreaded in pull, one of the most common operations someone would do on an ostree repo. So I believe this hazard is not just a corner case of someone trying to run a multithreaded ostree program, but actually pretty likely to occur. All that has to happen is someone adds a locking call somewhere in an API that pull uses. My focus on _delete_object() was that it's a realistic (or even desired) candidate.
With this PR the hazard is reduced a lot, and it actually gives the application (or even libostree itself) options. Threads will share the lock file descriptor and coordinate on locking. This is the same as if you wrote a multithreaded flock() program and opened the file descriptor before spawning threads. If you want the threads to act independently (e.g. a pull in one thread and a prune in another), you just open a separate OstreeRepo in each thread (see the sketch below). I think this is a lot better and probably just needs some clearer documentation of the semantics.
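A rough sketch of that "independent threads" pattern (my example, not code from this PR): each worker opens its own OstreeRepo, so it gets its own lock file descriptor.

#include <ostree.h>

/* Each worker opens a separate OstreeRepo so its locking is independent of
 * the other threads, mirroring the flock() analogy above. */
static gpointer
prune_worker (gpointer data)
{
  const char *repo_path = data;
  g_autoptr(GError) error = NULL;
  g_autoptr(GFile) path = g_file_new_for_path (repo_path);
  g_autoptr(OstreeRepo) repo = ostree_repo_new (path);

  if (!ostree_repo_open (repo, NULL, &error))
    {
      g_warning ("opening %s: %s", repo_path, error->message);
      return NULL;
    }

  /* ... prune/pull/etc. using this thread's own repo and its own lock ... */
  return NULL;
}

A caller would spawn it with something like g_thread_new ("prune", prune_worker, (gpointer) "/srv/repo"); sharing the caller's OstreeRepo instead is what opts a thread into the cooperative locking described above.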
Take 2. As described above, the semi-complex tracking of a stack of lock states was due to needing to know which state to drop when there are multiple threads sharing the lock. In reality it's probably not that much complexity for the caller. Since the locking is recursive, you can push and pop in very localized ways; in C that can look like the sketch below, and I'm sure other languages would have similar constructs. Hmm, one thing I just noticed is that the giant comment describing the locking implementation is not accurate anymore.
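A minimal sketch of such a localized scope (assuming the g_autoptr-based helper from this series; the surrounding function name and the lock constant are illustrative):

#include <ostree.h>

static gboolean
do_read_only_work (OstreeRepo *repo, GCancellable *cancellable, GError **error)
{
  /* Push a shared lock for just this scope; it pops automatically when
   * `lock` goes out of scope. */
  g_autoptr(OstreeRepoAutoLock) lock =
    ostree_repo_auto_lock_push (repo, OSTREE_REPO_LOCK_SHARED, cancellable, error);
  if (!lock)
    return FALSE;

  /* ... work that needs the repo contents to stay put ... */
  return TRUE;
}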
LGTM. A few minor comments. I haven’t looked at the test additions in test-repo.c.
src/libostree/ostree-repo.c
Outdated
@@ -660,7 +718,11 @@ ostree_repo_auto_lock_push (OstreeRepo *self,
{
  if (!ostree_repo_lock_push (self, lock_type, cancellable, error))
    return NULL;
  return (OstreeRepoAutoLock *)self;

  OstreeRepoAutoLock *auto_lock = g_new0 (OstreeRepoAutoLock, 1);
Could the need to heap-allocate be avoided by making the OstreeRepoAutoLock struct public (but opaque), and having callers stack-allocate it?
Or, if that’s too intrusive, you could use g_alloca() here and require that OstreeRepoAutoLock is always used with g_autoptr() in a single scope.
I did want to stack allocate but couldn't figure out a way that would be as nice to use as g_autoptr(). Currently you can do:
g_autoptr(OstreeRepoAutoLock) lock = ostree_repo_auto_lock_push(repo, ...);
if (!lock)
  return FALSE;
With the stack allocation, I think it would be:
g_auto(OstreeRepoAutoLock) lock = { 0, };
if (!ostree_repo_auto_lock_push(repo, &lock, ...))
  return FALSE;
I guess that's not so bad. You could probably provide an initializer macro, too. I could go either way there.
Although I can't think of a reason why you'd want to use it outside of a single scope, requiring it seems a bit intrusive.
I guess that's not so bad. You could probably provide an initializer macro, too. I could go either way there.
Yeah, or you could change ostree_repo_auto_lock_push() to return an OstreeRepoAutoLock structure by value. Should be fine, since it’s only two pointers.
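For concreteness, a by-value shape could look something like the sketch below; the struct members and names are my guesses for illustration, not the actual definition.

#include <ostree.h>

/* Hypothetical by-value variant of the auto-lock helper. */
typedef struct {
  OstreeRepo *repo;              /* NULL if the push failed */
  OstreeRepoLockType lock_type;
} RepoAutoLockValue;

static RepoAutoLockValue
auto_lock_push_by_value (OstreeRepo *self, OstreeRepoLockType lock_type,
                         GCancellable *cancellable, GError **error)
{
  RepoAutoLockValue lock = { NULL, lock_type };
  if (ostree_repo_lock_push (self, lock_type, cancellable, error))
    lock.repo = self;
  return lock;
}

Failure then has to be detected through the error out argument or a member check rather than a NULL return, which is the awkwardness discussed next.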
That's how I was thinking about it initially, but then how do you detect errors? Particularly if the structure is opaque. Unlike autoptr where you can use NULL, here you'd have to check if one of the members was in an error state since you'd actually have to return a structure. Even if it's not opaque it would be kind of ugly to use it that way.
You could check the error out argument. But true, it is quite idiosyncratic. 🤷
I implemented a stack allocated version in dbnicholson@bd0fb98. I'll pull that in here if that's the preferred API.
I pushed some fixups to address @pwithnall's review. I left the API
I've already said a bunch here and probably need to get out of the way to let you chew on this, but just one last thought. I think if the API is changed so that pop requires the type (the 2nd commit), then we can decide later to change the semantics from "all threads lock independently" to "all threads sharing an OstreeRepo lock cooperatively" without breaking the API. Basically, I'm pushing for the latter because I feel like you'd expect the locking to cooperate wherever you share the OstreeRepo structure, but I can see cases where you want the threads to act independently even if they're accessing the same structure.
@cgwalters any thoughts?
Let's commit to getting this in for the next release. We can do any further fixes as followups.
/ok-to-test
Can you rebase on latest
With or without squashing?
I went without squashing for now.
We aren't using a merge bot that understands
Doing anything even somewhat sophisticated requires this; turns out our own `ostree prune` CLI wants this, e.g. ostreedev#2337 Closes: ostreedev#2286
This simplifies the lock state management considerably since the previously pushed type doesn't need to be tracked. Instead, 2 counters are kept to track how many times each lock type has been pushed. When the number of exclusive locks drops to 0, the lock transitions back to shared.
This will allow usage of `GMutexLocker`. This should be available on many older distros: * RHEL 7 - 2.56.1 * RHEL 8 - 2.56.4 * Debian 9 stretch (oldstable) - 2.50.3 * Debian 10 buster (stable) - 2.58.3 * Ubuntu 16.04 xenial - 2.48.2 * Ubuntu 18.04 bionic - 2.56.4
Previously each thread maintained its own lock file descriptor regardless of whether the thread was using the same `OstreeRepo` as another thread. This was very safe but it made certain multithreaded procedures difficult. For example, if a main thread took an exclusive lock and then spawned worker threads, it would deadlock if one of the worker threads tried to acquire the lock. This moves the file descriptor from thread local storage to the `OstreeRepo` structure so that threads using the same `OstreeRepo` can share the lock. A mutex guards against threads altering the lock state concurrently. Fixes: ostreedev#2344
Use `g_error` and `g_assert*` rather than `g_return*` when checking the locking preconditions so that failures result in the program terminating. Since this code is protecting filesystem data, we'd rather crash than delete or corrupt data unexpectedly. `g_error` is used when the error is due to the caller requesting an invalid transition like attempting to pop a lock type that hasn't been taken. It also provides a semi-useful message about what happened.
If there's a locking issue in this test, then it's likely not going to resolve after a few seconds of serializing access. Lower the default 30 second lock timeout to 5 seconds to prevent the test from hanging unnecessarily.
The semantics of multiple process locking are covered by test-concurrency.py, but the semantics of the repository locking from a single process aren't handled there. This checks how the repository locking is handled from a single thread with one OstreeRepo, a single thread with multiple OstreeRepos, and multiple threads sharing an OstreeRepo.
Done. Some of the fixups were conflicting, so I don't think a merge bot would have been able to resolve it anyways. I usually try to wait on squashing because some people want to only see the differences from what they've reviewed before.
As described in #2344, the current repo locking is multithread-safe but doesn't actually allow for coordination between threads using the same OstreeRepo. This refactors the locking so it's maintained per-OstreeRepo while retaining the thread safety and recursive features.
I'm going to add some more tests, but I wanted to get some review on the actual changes. This is on top of #2341 because I wanted to run the tests added there (and likely build on them).
Fixes: #2344