Improve multi-threaded locking #2348

Merged (7 commits) on Jun 5, 2021

Conversation

dbnicholson (Member)

As described in #2344, the current repo locking is multithread-safe but doesn't actually allow for coordination between threads using the same OstreeRepo. This refactors the locking so it's maintained per-OstreeRepo while retaining the thread safety and recursive features.

I'm going to add some more tests, but I wanted to get some review on the actual changes. This is on top of #2341 because I wanted to run the tests added there (and likely build on them).

Fixes: #2344

@dbnicholson (Member Author)

I think this is ready now. I covered all the locking cases in the last commit but would be happy to add more.

Review thread on src/libostree/ostree-repo.c:
    self->lock.exclusive--;
  else
    self->lock.shared--;
  g_ptr_array_remove_index (stack, stack->len - 1);
Member

Hmm wait, now that we have the two counters, do we actually need to store the stack? Since the valid state transitions can only be:

shared (++) -> exclusive (++)

i.e. we can never have any shared after exclusive, so why maintain the stack at all versus just the two counters?

Member Author

I briefly came to the same conclusion, but I think it's necessary to know what state to drop when there are multiple threads. In fact, what's here has a bug and I'm going to beef up the multithreaded test to check this.

Here's the scenario if I can get this ascii timeline to work out:

Thread 1          Thread 2        Flock state
========          ========        ===========
push exclusive                    exclusive
                  push shared     exclusive
pop                               shared

With the current implementation, the flock would remain exclusive after the pop because the counters beforehand would say exclusive == 2 and shared == 0. In order to correctly return from exclusive to shared when thread 1 pops, you need to track how many exclusive and shared locks are actually held instead of incrementing exclusive for every push once the lock is exclusive. Then the counters would be exclusive == 1 and shared == 1, but you still wouldn't know which state to drop unless it was stored with the thread that is doing the dropping.
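
To make this concrete, here is a rough sketch of the kind of per-repo state being discussed (illustrative names, not the actual fields in ostree-repo.c):

#include <glib.h>
#include <sys/file.h>

/* Illustrative per-repo lock state: one counter per lock type. */
typedef struct
{
  guint shared;     /* shared pushes currently held */
  guint exclusive;  /* exclusive pushes currently held */
} RepoLockState;

/* The flock state is derived from the counters: any outstanding exclusive
 * push keeps the file at LOCK_EX, otherwise it can drop back to LOCK_SH. */
static int
lock_state_flock_op (const RepoLockState *state)
{
  return state->exclusive > 0 ? LOCK_EX : LOCK_SH;
}

/* Both counters can be non-zero at once (thread 1 exclusive, thread 2
 * shared), so a pop has to say which type it is dropping; the counters
 * alone can't tell you. */
static void
lock_state_pop (RepoLockState *state, gboolean is_exclusive)
{
  if (is_exclusive)
    state->exclusive--;
  else
    state->shared--;
}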

Member Author

I just pushed 3f78672 which tests this scenario and will now fail.

Member Author

That failed as expected:

[2021-04-27T21:19:26.049Z] ERROR: tests/test-repo - Bail out! OSTree:ERROR:tests/test-repo.c:459:test_repo_lock_multi_thread: assertion failed (error == NULL): Locking repo shared failed: Resource temporarily unavailable (g-io-error-quark, 27)

Member Author

And now with c7de18a tests/test-repo passes. Unless you still think that's incorrect, I'm going to clean up the series.

Member

> Yes, in that case you could add some unlocked variant that's only available internally. That case actually violates the semantics since the transaction only has a shared lock and the expectation is that you have an exclusive lock when deleting.

Hmm. The tombstone object is very special though - clients just test whether or not it exists. Here a separate client with a shared lock querying the commit (that doesn't exist) running simultaneously with another client writing that commit is racy, but in a safe way. We're not talking about losing data here; conceptually it's deleting a file as part of the act of creating data.
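
For example, the client-side check is just an existence test with existing API (a sketch; variable names are illustrative):

gboolean have_tombstone = FALSE;
if (!ostree_repo_has_object (repo, OSTREE_OBJECT_TYPE_TOMBSTONE_COMMIT,
                             commit_checksum, &have_tombstone,
                             cancellable, error))
  return FALSE;
/* have_tombstone only says whether the object exists right now; racing
 * with a writer changes the answer but never loses data. */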

My mental model here is still thinking of shared = read, exclusive = write.

Member Author

> My mental model here is still thinking of shared = read, exclusive = write.

It took me a while to not think of them that way, but I've also touched this code by far the most. Maybe we should rename the lock names before making them public? I think I only named them that way initially since those were the flock names. The semantics it's really trying to encode are to not delete things that others might be using. I think you could name those ACCESS and DELETE. Probably there are better names, though.

@cgwalters (Member), May 26, 2021

There's lots of terms of art in https://en.wikipedia.org/wiki/Isolation_(database_systems)

I think shared/exclusive is better than read/write but they're also in many cases mostly synonymous, and it's just easier to think of read/write in many cases. That said, for example, in Rust there's a sentiment that &/&mut are misnamed and it should have been &/&excl or so (short for "exclusive"), because & in some cases still allows interior mutability.

Member

So we got into this discussion around the complexity of the state tracking, which was simplified by having the caller track which lock they acquired. That's better, but we were also having a discussion around the semantics of locking that I'm not sure is resolved. You said you wanted to change ostree_repo_delete_object() to acquire an exclusive lock, which makes some sense.

I do still think the right fix is either:

  • Making removal of tombstone objects not require an exclusive lock
  • Changing the pull code to call an _unlocked() variant

Or honestly, it's going to be an ABI hazard to change the locking semantics of any functions at all, and that may apply in this case.

Actually _delete_object() is also inherently very special and low level - you can easily just remove a file object referenced by a commit, etc. So we could instead add ostree_repo_delete_object_locked() or so...but under what scenarios should that be used?

Were you actually planning to use this for something?

Member Author

The semantics and hazards are exactly why I started working on this again. In the case of _delete_object(), there are several ways to make it work. In addition to the ways you mentioned above, I also have a local branch that adds a _delete_object_if_exists() which simply does a _have_object() before calling into the real _delete_object(), to avoid taking the lock unless you really need to. It's still racy, but for the case we're describing here it's not really different. Also, just documenting that _delete_object() is low level and that you should take an exclusive lock before using it would be fine, too.
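
Roughly, the idea is something like this (a sketch, not the actual branch; ostree_repo_has_object() and ostree_repo_delete_object() are existing API, the wrapper name is hypothetical):

static gboolean
delete_object_if_exists (OstreeRepo        *repo,
                         OstreeObjectType   objtype,
                         const char        *checksum,
                         GCancellable      *cancellable,
                         GError           **error)
{
  gboolean have_object = FALSE;

  if (!ostree_repo_has_object (repo, objtype, checksum, &have_object,
                               cancellable, error))
    return FALSE;

  /* Only go through the deletion (and its locking) when there is actually
   * something to delete. Still racy, as noted above, but it avoids taking
   * the lock in the common "nothing to do" case. */
  if (!have_object)
    return TRUE;

  return ostree_repo_delete_object (repo, objtype, checksum,
                                    cancellable, error);
}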

Anyways, I wasn't trying to focus on this specific case. I don't have plans to use _delete_object() directly. I was trying to use it as a concrete example of the general issue I'm worried about. The semantics of the repo locking without this PR are that threads operate on separate lock file descriptors. If you have multiple threads operating on the same OstreeRepo, then sooner or later you're likely to have a deadlocked application when someone adds an innocuous lock call in ostree's internals.

I don't find this to be a theoretical issue, since ostree itself becomes multithreaded during pull, one of the most common operations on an ostree repo. So I believe this hazard is not just a corner case of someone trying to run a multithreaded ostree program, but actually pretty likely to occur. All that has to happen is for someone to add a locking call somewhere in an API that pull uses. My focus on _delete_object() was that it's a realistic (or even desirable) candidate.

With this PR the hazard is greatly reduced, and it actually gives the application (or even libostree itself) options. Threads will share the lock file descriptor and coordinate on locking. This is the same as if you wrote a multithreaded flock() program and opened the file descriptor before spawning threads. If you want the threads to act independently (e.g. a pull in one thread and a prune in another), you just open a separate OstreeRepo in each thread. I think this is a lot better and probably just needs some clearer documentation of the semantics.
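
For the independent case, the pattern is just to open a separate OstreeRepo per thread, e.g. (a sketch; error handling trimmed and names illustrative):

#include <ostree.h>

static gpointer
prune_worker (gpointer data)
{
  const char *repo_path = data;
  g_autoptr(GFile) path = g_file_new_for_path (repo_path);
  g_autoptr(OstreeRepo) repo = ostree_repo_new (path);
  g_autoptr(GError) error = NULL;

  if (!ostree_repo_open (repo, NULL, &error))
    {
      g_warning ("Failed to open repo: %s", error->message);
      return NULL;
    }

  /* This thread's repo locking is now independent of any other thread that
   * opened its own OstreeRepo; threads sharing one OstreeRepo coordinate on
   * a single lock file descriptor instead. */
  return NULL;
}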

@dbnicholson (Member Author)

Take 2. As described above, the semi-complex tracking of a stack of lock states is due to ostree_repo_pop_lock not taking the lock type to pop. Instead, change the API to make it take the lock type to drop and move any of that complexity to the caller. The PR first adjusts the existing locking to do that and then moves it from thread local storage to the OstreeRepo structure with a mutex guarding it. After that I think it's really not very different from the existing locks like the transaction lock. The mutex makes it thread safe, but any clever multithreaded usage of an OstreeRepo is coordinated by the caller. The only requirement is to push and pop symmetrically.

In reality it's probably not that much complexity for the caller. Since the locking is recursive, you can push and pop in very localized ways. In C, ostree_repo_auto_lock_push will already take care of this for you. In Python I would write a simple context manager for it:

import contextlib

import gi
gi.require_version("OSTree", "1.0")
from gi.repository import OSTree


@contextlib.contextmanager
def lock_repo(repo, lock_type):
    repo.lock_push(lock_type)
    try:
        yield
    finally:
        repo.lock_pop(lock_type)

with lock_repo(repo, OSTree.RepoLockType.EXCLUSIVE):
    with lock_repo(repo, OSTree.RepoLockType.SHARED):
        do_something(repo)

I'm sure other languages would have similar constructs.
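
For reference, the equivalent pattern in C with the auto-lock helper from this PR would be roughly (a sketch; error handling abbreviated):

static gboolean
do_something_locked (OstreeRepo    *repo,
                     GCancellable  *cancellable,
                     GError       **error)
{
  g_autoptr(OstreeRepoAutoLock) lock =
    ostree_repo_auto_lock_push (repo, OSTREE_REPO_LOCK_SHARED, cancellable, error);
  if (!lock)
    return FALSE;

  /* The shared lock is held here and popped automatically when `lock`
   * goes out of scope. */
  return TRUE;
}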

Hmm, one thing I just noticed is that the giant comment describing the locking implementation is not accurate anymore.

@pwithnall (Member) left a comment

LGTM. A few minor comments. I haven’t looked at the test additions in test-repo.c.

Review thread on src/libostree/ostree-repo.c:

@@ -660,7 +718,11 @@ ostree_repo_auto_lock_push (OstreeRepo *self,
 {
   if (!ostree_repo_lock_push (self, lock_type, cancellable, error))
     return NULL;
-  return (OstreeRepoAutoLock *)self;
+
+  OstreeRepoAutoLock *auto_lock = g_new0 (OstreeRepoAutoLock, 1);
Member

Could the need to heap-allocate be avoided by making the OstreeRepoAutoLock struct public (but opaque), and having callers stack-allocate it?

Or, if that’s too intrusive, you could use g_alloca() here and require that OstreeRepoAutoLock is always used with g_autoptr() in a single scope.

Member Author

I did want to stack allocate but couldn't figure out a way that would be as nice to use as g_autoptr(). Currently you can do:

g_autoptr(OstreeRepoAutoLock) lock = ostree_repo_auto_lock_push(repo, ...);
if (!lock)
  return FALSE;

With the stack allocation, I think it would be:

g_auto(OstreeRepoAutoLock) lock = { 0, };
if (!ostree_repo_auto_lock_push(repo, &lock, ...))
  return FALSE;

I guess that's not so bad. You could probably provide an initializer macro, too. I could go either way there.

Although I can't think of a reason why you'd want to use it outside of a single scope, requiring it seems a bit intrusive.
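
If we went that way, the initializer macro could be as simple as something like this (hypothetical; it just names the zero initializer shown above):

/* Hypothetical; g_auto() would also need a G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC
 * for OstreeRepoAutoLock. */
#define OSTREE_REPO_AUTO_LOCK_INIT { 0, }

/* Usage with the hypothetical stack-allocated push variant:
 *
 *   g_auto(OstreeRepoAutoLock) lock = OSTREE_REPO_AUTO_LOCK_INIT;
 *   if (!ostree_repo_auto_lock_push (repo, &lock, cancellable, error))
 *     return FALSE;
 */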

Member

> I guess that's not so bad. You could probably provide an initializer macro, too. I could go either way there.

Yeah, or you could change ostree_repo_auto_lock_push() to return an OstreeRepoAutoLock structure by value. Should be fine, since it’s only two pointers.

Member Author

That's how I was thinking about it initially, but then how do you detect errors? Particularly if the structure is opaque. Unlike autoptr where you can use NULL, here you'd have to check if one of the members was in an error state since you'd actually have to return a structure. Even if it's not opaque it would be kind of ugly to use it that way.

Member

You could check the error out argument. But true, it is quite idiosyncratic. 🤷

Member Author

I implemented a stack-allocated version in dbnicholson@bd0fb98. I'll pull that in here if that's the preferred API.

@dbnicholson (Member Author)

I pushed some fixups to address @pwithnall's review. I left the API ostree_repo_auto_lock_push the way it is, allocating an OstreeRepoAutoLock structure (but switched to g_slice_*). I'm preparing a patch to switch to stack allocation if that's preferred.

@dbnicholson (Member Author)

I've already said a bunch here and probably need to get out of the way to let you chew on this, but just one last thought.

I think if the API is changed so that pop requires the type (the 2nd commit), then we can decide later to change the semantics from "all threads lock independently" to "all threads sharing an OstreeRepo lock cooperatively" without breaking the API. Basically, I'm pushing for the latter because I feel like you'd expect the locking to cooperate wherever you share the OstreeRepo structure, but I can see cases where you want the threads to act independently even if they're accessing the same structure.

@dbnicholson (Member Author)

@cgwalters any thoughts?

@cgwalters (Member) left a comment

Let's commit to getting this in for the next release. We can do any further fixes as followups.

@cgwalters (Member)

/ok-to-test
/retest

@cgwalters (Member)

Can you rebase on latest main to pull in the CI fixes?

@dbnicholson (Member Author)

> Can you rebase on latest main to pull in the CI fixes?

With or without squashing?

@dbnicholson (Member Author)

> Can you rebase on latest main to pull in the CI fixes?
>
> With or without squashing?

I went without squashing for now.

@cgwalters (Member)

We aren't using a merge bot that understands fixup! right now unfortunately, so please squash those.

cgwalters and others added 5 commits June 5, 2021 09:00
Doing anything even somewhat sophisticated requires this;
turns out our own `ostree prune` CLI wants this, e.g.
ostreedev#2337

Closes: ostreedev#2286

This simplifies the lock state management considerably since the
previously pushed type doesn't need to be tracked. Instead, 2 counters
are kept to track how many times each lock type has been pushed. When
the number of exclusive locks drops to 0, the lock transitions back to
shared.

This will allow usage of `GMutexLocker`. This should be available on
many older distros:

* RHEL 7 - 2.56.1
* RHEL 8 - 2.56.4
* Debian 9 stretch (oldstable) - 2.50.3
* Debian 10 buster (stable) - 2.58.3
* Ubuntu 16.04 xenial - 2.48.2
* Ubuntu 18.04 bionic - 2.56.4

Previously each thread maintained its own lock file descriptor
regardless of whether the thread was using the same `OstreeRepo` as
another thread. This was very safe but it made certain multithreaded
procedures difficult. For example, if a main thread took an exclusive
lock and then spawned worker threads, it would deadlock if one of the
worker threads tried to acquire the lock.

This moves the file descriptor from thread local storage to the
`OstreeRepo` structure so that threads using the same `OstreeRepo` can
share the lock. A mutex guards against threads altering the lock state
concurrently.

Fixes: ostreedev#2344

Use `g_error` and `g_assert*` rather than `g_return*` when checking the
locking preconditions so that failures result in the program
terminating. Since this code is protecting filesystem data, we'd rather
crash than delete or corrupt data unexpectedly.

`g_error` is used when the error is due to the caller requesting an
invalid transition like attempting to pop a lock type that hasn't been
taken. It also provides a semi-useful message about what happened.

If there's a locking issue in this test, then it's likely not going to
resolve after a few seconds of serializing access. Lower the default 30
second lock timeout to 5 seconds to prevent the test from hanging
unnecessarily.

The semantics of multiple process locking are covered by
test-concurrency.py, but the semantics of the repository locking from a
single process aren't handled there.

This checks how the repository locking is handled from a single thread
with one OstreeRepo, a single thread with multiple OstreeRepos, and
multiple threads sharing an OstreeRepo.
@dbnicholson (Member Author)

> We aren't using a merge bot that understands fixup! right now unfortunately, so please squash those.

Done. Some of the fixups were conflicting, so I don't think a merge bot would have been able to resolve them anyway. I usually try to hold off on squashing because some people want to see only the differences from what they've already reviewed.
