Improve multi-threaded locking #2348
Conversation
I think this is ready now. I think I covered all the locking cases in the last commit but would be happy to add more.
src/libostree/ostree-repo.c
Outdated
  self->lock.exclusive--;
else
  self->lock.shared--;
g_ptr_array_remove_index (stack, stack->len - 1);
Hmm wait, now that we have the two counters, do we actually need to store the stack? Since the valid state transitions can only be:
shared (++) -> exclusive (++)
i.e. we can never have any shared after exclusive, so why maintain the stack at all versus just the two counters?
I briefly came to the same conclusion, but I think it's necessary to know what state to drop when there are multiple threads. In fact, what's here has a bug and I'm going to beef up the multithreaded test to check this.
Here's the scenario if I can get this ascii timeline to work out:
Thread 1         Thread 2       Flock state
========         ========       ===========
push exclusive                  exclusive
                 push shared    exclusive
pop                             shared
With the current implementation, the flock would remain exclusive after the pop because the counters would say exclusive == 2 and shared == 0 before the pop. In order to correctly return from exclusive to shared when thread 1 pops, you need to correctly track how many exclusive and shared locks are held instead of just incrementing exclusive every time once the lock is exclusive. Then it would be exclusive == 1 and shared == 1, but you still wouldn't know which state to drop unless it was stored with the thread that is doing the dropping.
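To make that concrete, here's a minimal sketch (my illustration, not the actual patch) of the counter-based bookkeeping this eventually converges on: two counters, plus a pop that is told which type it is releasing (as the later commits do). The struct, enum, and flock() handling are stand-ins for ostree's internal lock state, and error handling is elided.

#include <glib.h>
#include <sys/file.h>

typedef enum { REPO_LOCK_SHARED, REPO_LOCK_EXCLUSIVE } RepoLockType;

/* Hypothetical stand-in for the lock state kept in OstreeRepo. */
typedef struct {
  int fd;            /* the shared lock file descriptor */
  guint shared;      /* outstanding shared pushes */
  guint exclusive;   /* outstanding exclusive pushes */
} RepoLockState;

static void
repo_lock_pop (RepoLockState *lock, RepoLockType type)
{
  if (type == REPO_LOCK_EXCLUSIVE)
    {
      g_assert_cmpuint (lock->exclusive, >, 0);
      lock->exclusive--;
      /* Last exclusive holder gone but shared holders remain: downgrade. */
      if (lock->exclusive == 0 && lock->shared > 0)
        flock (lock->fd, LOCK_SH);
    }
  else
    {
      g_assert_cmpuint (lock->shared, >, 0);
      lock->shared--;
    }
  /* No holders left at all: release the flock entirely. */
  if (lock->exclusive == 0 && lock->shared == 0)
    flock (lock->fd, LOCK_UN);
}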
I just pushed 3f78672 which tests this scenario and will now fail.
That failed as expected:
[2021-04-27T21:19:26.049Z] ERROR: tests/test-repo - Bail out! OSTree:ERROR:tests/test-repo.c:459:test_repo_lock_multi_thread: assertion failed (error == NULL): Locking repo shared failed: Resource temporarily unavailable (g-io-error-quark, 27)
And now with c7de18a tests/test-repo passes. Unless you still think that's incorrect, I'm going to clean up the series.
Yes, in that case you could add some unlocked variant that's only available internally. That case actually violates the semantics since the transaction only has a shared lock and the expectation is that you have an exclusive lock when deleting.
Hmm. The tombstone object is very special though - clients just test whether or not it exists. Here a separate client with a shared lock querying the commit (that doesn't exist) running simultaneously with another client writing that commit is racy, but in a safe way. We're not talking about losing data here; conceptually it's deleting a file as part of the act of creating data.
My mental model here is still thinking of shared = read, exclusive = write.
My mental model here is still thinking of shared = read, exclusive = write.
It took me a while to not think of them that way, but I've also touched this code by far the most. Maybe we should rename the lock names before making them public? I think I only named them that way initially since those were the flock names. The semantics it's really trying to encode are to not delete things that others might be using. I think you could name those ACCESS and DELETE. Probably there are better names, though.
There are lots of terms of art in https://en.wikipedia.org/wiki/Isolation_(database_systems). I think shared/exclusive is better than read/write, but they're also in many cases mostly synonymous, and it's just easier to think of read/write in many cases. That said, for example in Rust there's a sentiment that &/&mut are misnamed and it should have been &/&excl or so (short for "exclusive"), because & in some cases still allows interior mutability.
So we got into this discussion around the complexity of the state tracking, which was simplified by having the caller track which lock they acquired. That's better, but we were also having a discussion around the semantics of locking that I'm not sure is resolved. You said you wanted to change ostree_repo_delete_object() to acquire an exclusive lock, which makes some sense.
I do still think the right fix is either:
- For removing tombstone objects to not require an exclusive lock
- Change the pull code to call an _unlocked() variant
Or honestly, it's going to be an ABI hazard to change the locking semantics of any functions at all, and that may apply in this case.
Actually _delete_object() is also inherently very special and low level - you can easily just remove a file object referenced by a commit, etc. So we could instead add ostree_repo_delete_object_locked() or so...but under what scenarios should that be used?
Were you actually planning to use this for something?
The semantics and hazards are exactly why I started working on this again. In the case of _delete_object(), there are several ways to make it work. In addition to the ways you mentioned above, I also have a local branch that adds a _delete_object_if_exists() which simply does a _have_object() before calling into the real _delete_object() to avoid taking the lock unless you really need to (roughly the sketch below). It's still racy, but for the case we're describing here it's not really different. Also, just documenting that _delete_object() is low level and that you should take an exclusive lock before using it would be fine, too.
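For illustration only, here's roughly what that _delete_object_if_exists() helper could look like using the public has-object/delete-object calls; the wrapper's name and shape come from my local experiment and aren't part of this PR.

#include <ostree.h>

/* Hypothetical wrapper: skip the delete (and hence the exclusive lock it
 * would take) when the object isn't there.  Still racy between the check
 * and the delete, but no worse than the case discussed above. */
static gboolean
delete_object_if_exists (OstreeRepo *repo, OstreeObjectType objtype,
                         const char *checksum, GCancellable *cancellable,
                         GError **error)
{
  gboolean have_object = FALSE;
  if (!ostree_repo_has_object (repo, objtype, checksum, &have_object,
                               cancellable, error))
    return FALSE;
  if (!have_object)
    return TRUE;
  return ostree_repo_delete_object (repo, objtype, checksum, cancellable, error);
}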
Anyways, I wasn't trying to focus on this specific case. I don't have plans to use _delete_object() directly. I was trying to use it as a concrete example of the general issue I'm worried about. The semantics of the repo locking without this PR are that threads operate on separate lock file descriptors. If you have multiple threads operating on the same OstreeRepo, then sooner or later you're likely to have a deadlocked application when someone adds an innocuous lock call in ostree's internals.
I find this to not be a theoretical issue since ostree itself becomes multithreaded in pull, one of the most common operations someone would do on an ostree repo. So I believe this hazard is not just a corner case of someone trying to run a multithreaded ostree program, but actually pretty likely to occur. All that has to happen is someone adds a locking call somewhere in an API that pull uses. My focus on _delete_object() was that it's a realistic (or even desired) candidate.
With this PR the hazard is reduced a lot, and it actually gives the application (or even libostree itself) options. Threads will share the lock file descriptor and coordinate on locking. This is the same as if you wrote a multithreaded flock() program and opened the file descriptor before spawning threads. If you want the threads to act independently (e.g. a pull in one thread and a prune in another), you just open a separate OstreeRepo in each thread (see the sketch below). I think this is a lot better and probably just needs some clearer documentation of the semantics.
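A rough sketch of that "independent threads" pattern (my example, not code from this PR): each worker opens its own OstreeRepo, so it gets its own lock file descriptor.

#include <ostree.h>

/* Each worker opens a separate OstreeRepo so its locking is independent of
 * the other threads, mirroring the flock() analogy above. */
static gpointer
prune_worker (gpointer data)
{
  const char *repo_path = data;
  g_autoptr(GError) error = NULL;
  g_autoptr(GFile) path = g_file_new_for_path (repo_path);
  g_autoptr(OstreeRepo) repo = ostree_repo_new (path);

  if (!ostree_repo_open (repo, NULL, &error))
    {
      g_warning ("opening %s: %s", repo_path, error->message);
      return NULL;
    }

  /* ... prune/pull/etc. using this thread's own repo and its own lock ... */
  return NULL;
}

A caller would spawn it with something like g_thread_new ("prune", prune_worker, (gpointer) "/srv/repo"); sharing the caller's OstreeRepo instead is what opts a thread into the cooperative locking described above.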
Take 2. As described above, the semi-complex tracking of a stack of lock states was due to needing to know which state to drop when there are multiple threads sharing the lock. In reality it's probably not that much complexity for the caller. Since the locking is recursive, you can push and pop in very localized ways; in C that can look like the sketch below, and I'm sure other languages would have similar constructs. Hmm, one thing I just noticed is that the giant comment describing the locking implementation is not accurate anymore.
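A minimal sketch of such a localized scope (assuming the g_autoptr-based helper from this series; the surrounding function name and the lock constant are illustrative):

#include <ostree.h>

static gboolean
do_read_only_work (OstreeRepo *repo, GCancellable *cancellable, GError **error)
{
  /* Push a shared lock for just this scope; it pops automatically when
   * `lock` goes out of scope. */
  g_autoptr(OstreeRepoAutoLock) lock =
    ostree_repo_auto_lock_push (repo, OSTREE_REPO_LOCK_SHARED, cancellable, error);
  if (!lock)
    return FALSE;

  /* ... work that needs the repo contents to stay put ... */
  return TRUE;
}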
LGTM. A few minor comments. I haven’t looked at the test additions in test-repo.c.
src/libostree/ostree-repo.c
Outdated
@@ -660,7 +718,11 @@ ostree_repo_auto_lock_push (OstreeRepo *self,
{
  if (!ostree_repo_lock_push (self, lock_type, cancellable, error))
    return NULL;
  return (OstreeRepoAutoLock *)self;

  OstreeRepoAutoLock *auto_lock = g_new0 (OstreeRepoAutoLock, 1);
Could the need to heap-allocate be avoided by making the OstreeRepoAutoLock struct public (but opaque), and having callers stack-allocate it?
Or, if that’s too intrusive, you could use g_alloca() here and require that OstreeRepoAutoLock is always used with g_autoptr() in a single scope.
I did want to stack allocate but couldn't figure out a way that would be as nice to use as g_autoptr(). Currently you can do:
g_autoptr(OstreeRepoAutoLock) lock = ostree_repo_auto_lock_push(repo, ...);
if (!lock)
  return FALSE;
With the stack allocation, I think it would be:
g_auto(OstreeRepoAutoLock) lock = { 0, };
if (!ostree_repo_auto_lock_push(repo, &lock, ...))
  return FALSE;
I guess that's not so bad. You could probably provide an initializer macro, too. I could go either way there.
Although I can't think of a reason why you'd want to use it outside of a single scope, requiring it seems a bit intrusive.
I guess that's not so bad. You could probably provide an initializer macro, too. I could go either way there.
Yeah, or you could change ostree_repo_auto_lock_push() to return an OstreeRepoAutoLock structure by value. Should be fine, since it’s only two pointers.
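For concreteness, a by-value shape could look something like the sketch below; the struct members and names are my guesses for illustration, not the actual definition.

#include <ostree.h>

/* Hypothetical by-value variant of the auto-lock helper. */
typedef struct {
  OstreeRepo *repo;              /* NULL if the push failed */
  OstreeRepoLockType lock_type;
} RepoAutoLockValue;

static RepoAutoLockValue
auto_lock_push_by_value (OstreeRepo *self, OstreeRepoLockType lock_type,
                         GCancellable *cancellable, GError **error)
{
  RepoAutoLockValue lock = { NULL, lock_type };
  if (ostree_repo_lock_push (self, lock_type, cancellable, error))
    lock.repo = self;
  return lock;
}

Failure then has to be detected through the error out argument or a member check rather than a NULL return, which is the awkwardness discussed next.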
That's how I was thinking about it initially, but then how do you detect errors? Particularly if the structure is opaque. Unlike autoptr where you can use NULL, here you'd have to check if one of the members was in an error state since you'd actually have to return a structure. Even if it's not opaque it would be kind of ugly to use it that way.
You could check the error out argument. But true, it is quite idiosyncratic. 🤷
I implemented a stack allocated version in dbnicholson@bd0fb98. I'll pull that in here if that's the preferred API.
I pushed some fixups to address @pwithnall's review. I left the API
I've already said a bunch here and probably need to get out of the way to let you chew on this, but just one last thought. I think if the API is changed so that pop requires the type (the 2nd commit), then we can decide later to change the semantics from "all threads lock independently" to "all threads sharing an OstreeRepo lock cooperatively" without breaking the API. Basically, I'm pushing for the latter because I feel like you'd expect the locking to cooperate wherever you share the OstreeRepo structure, but I can see cases where you want the threads to act independently even if they're accessing the same structure.
@cgwalters any thoughts?
Let's commit to getting this in for the next release. We can do any further fixes as followups.
/ok-to-test
Can you rebase on latest
With or without squashing?
I went without squashing for now.
We aren't using a merge bot that understands
Doing anything even somewhat sophisticated requires this; turns out our own `ostree prune` CLI wants this, e.g. ostreedev#2337 Closes: ostreedev#2286
This simplifies the lock state management considerably since the previously pushed type doesn't need to be tracked. Instead, 2 counters are kept to track how many times each lock type has been pushed. When the number of exclusive locks drops to 0, the lock transitions back to shared.
This will allow usage of `GMutexLocker`. This should be available on many older distros: * RHEL 7 - 2.56.1 * RHEL 8 - 2.56.4 * Debian 9 stretch (oldstable) - 2.50.3 * Debian 10 buster (stable) - 2.58.3 * Ubuntu 16.04 xenial - 2.48.2 * Ubuntu 18.04 bionic - 2.56.4
Previously each thread maintained its own lock file descriptor regardless of whether the thread was using the same `OstreeRepo` as another thread. This was very safe but it made certain multithreaded procedures difficult. For example, if a main thread took an exclusive lock and then spawned worker threads, it would deadlock if one of the worker threads tried to acquire the lock. This moves the file descriptor from thread local storage to the `OstreeRepo` structure so that threads using the same `OstreeRepo` can share the lock. A mutex guards against threads altering the lock state concurrently. Fixes: ostreedev#2344
Use `g_error` and `g_assert*` rather than `g_return*` when checking the locking preconditions so that failures result in the program terminating. Since this code is protecting filesystem data, we'd rather crash than delete or corrupt data unexpectedly. `g_error` is used when the error is due to the caller requesting an invalid transition like attempting to pop a lock type that hasn't been taken. It also provides a semi-useful message about what happened.
If there's a locking issue in this test, then it's likely not going to resolve after a few seconds of serializing access. Lower the default 30 second lock timeout to 5 seconds to prevent the test from hanging unnecessarily.
The semantics of multiple process locking are covered by test-concurrency.py, but the semantics of the repository locking from a single process aren't handled there. This checks how the repository locking is handled from a single thread with one OstreeRepo, a single thread with multiple OstreeRepos, and multiple threads sharing an OstreeRepo.
Done. Some of the fixups were conflicting, so I don't think a merge bot would have been able to resolve it anyways. I usually try to wait on squashing because some people want to only see the differences from what they've reviewed before.
As described in #2344, the current repo locking is multithread-safe but doesn't actually allow for coordination between threads using the same OstreeRepo. This refactors the locking so it's maintained per-OstreeRepo while retaining the thread safety and recursive features.
I'm going to add some more tests, but I wanted to get some review on the actual changes. This is on top of #2341 because I wanted to run the tests added there (and likely build on them).
Fixes: #2344