Improve multi-threaded repo locking #2344

Closed
dbnicholson opened this issue Apr 15, 2021 · 3 comments · Fixed by #2348
Comments

@dbnicholson
Member

Currently the repo locking is entirely thread local, which is very safe but makes it difficult to use in a multi-threaded application. The canonical issue would be if you took an exclusive lock and then spawned some threads that tried to take a shared lock. This would deadlock because the threads could never acquire the lock even though the application already holds an exclusive lock on the repo.

This isn't theoretical. Back when I was working on #1292 I wanted to make ostree_repo_prune_object take an exclusive lock. However, it's actually used deep within the pull code to try to clean up a tombstone commit (IIRC). The main thread acquires a shared repo lock when it starts a transaction and then the pull code spawns a bunch of threads. When a pull thread hits this path it deadlocks because it has no connection to the main thread's lock.

If we're to make the repo locking public, it's pretty likely that someone would take an exclusive lock and then run the pull code. I know I've done that many times in our tooling when doing some maintenance task.

@dbnicholson
Member Author

dbnicholson commented Apr 15, 2021

I'm sure there's probably a data structure that's more appropriate for this, but this is the idea I came up with.

The actual lock file descriptor is stored in OstreeRepo with a mutex guarding it. When threads want to change the lock state, they acquire the mutex and then change the flock. The reason I didn't do this initially was the recursive nature of the locking. For a single thread, it's easy to have a single linear stack that represents the locking recursion.

For multiple threads that's not possible since they can act in any order. The way I thought to handle this is to maintain a stack per thread in thread-local storage, as now, so that when an individual thread pops a lock state it knows what type of lock state it's dropping. In the shared mutex-guarded code, there would just be 2 counters for how many shared and exclusive states are on the stack. This maintains the latching aspect of it. I.e., the lock stays exclusive until all exclusive states have been popped. When all shared and exclusive states have been popped, the lock is unlocked.
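A rough sketch of that scheme, using plain pthreads and flock rather than ostree's actual code (all names here are illustrative, `lock_fd` stands in for the repo's lock fd, and calling flock() while holding the mutex is a simplification a real implementation would want to avoid):

```c
#include <pthread.h>
#include <sys/file.h>

typedef enum { LOCK_SHARED, LOCK_EXCLUSIVE } LockState;

#define MAX_DEPTH 16

/* Per-thread recursion stack: each thread remembers which states it pushed. */
static __thread LockState tl_stack[MAX_DEPTH];
static __thread int tl_depth = 0;

/* Shared state, guarded by the mutex. */
static pthread_mutex_t lock_mutex = PTHREAD_MUTEX_INITIALIZER;
static int shared_count = 0;
static int exclusive_count = 0;
static int lock_fd = -1; /* one flock fd per repo, not per thread; -1 here so
                            the sketch runs without a real repo file */

static void push_lock(LockState state) {
    pthread_mutex_lock(&lock_mutex);
    if (state == LOCK_EXCLUSIVE) {
        exclusive_count++;
        /* Latch: any outstanding exclusive state makes the flock exclusive. */
        if (lock_fd >= 0) flock(lock_fd, LOCK_EX);
    } else {
        shared_count++;
        /* Only take the shared flock if nothing holds it exclusive. */
        if (exclusive_count == 0 && lock_fd >= 0) flock(lock_fd, LOCK_SH);
    }
    tl_stack[tl_depth++] = state;
    pthread_mutex_unlock(&lock_mutex);
}

static void pop_lock(void) {
    LockState state = tl_stack[--tl_depth]; /* thread-local: no mutex needed */
    pthread_mutex_lock(&lock_mutex);
    if (state == LOCK_EXCLUSIVE) {
        exclusive_count--;
        /* Stay exclusive until the last exclusive state is popped, then
           downgrade to shared if shared states remain. */
        if (exclusive_count == 0 && shared_count > 0 && lock_fd >= 0)
            flock(lock_fd, LOCK_SH);
    } else {
        shared_count--;
    }
    if (shared_count == 0 && exclusive_count == 0 && lock_fd >= 0)
        flock(lock_fd, LOCK_UN);
    pthread_mutex_unlock(&lock_mutex);
}
```

The two counters are what preserve the latching behavior across threads: the flock only drops back to shared once every thread's exclusive states have been popped.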

@cgwalters
Member

OK, from your original commit:

The lock also attempts to be thread safe by storing the lock state in
thread local storage with GPrivate. This means that each thread will
have an independent lock for each repository it opens. There are some
drawbacks to that, but it seemed impossible to manage the lock state
coherently in the face of multithreaded access.

But...the patch Alexander had here originally https://bugzilla.gnome.org/show_bug.cgi?id=759442#c2 was way simpler and, by virtue of being file based (like the sysroot lock), also works across processes, which is I think what we actually want.

Then you proposed this: #813 (comment)

And we merged it but...it seems to me we really want a simple read/write lock, i.e. what you get in-process with a GRWLock and out of process with flock() (as Alex's original patch did). Everything except prune grabs a read lock (even though they're doing writes; due to how ostree works those writes are nondestructive, we aren't deleting anything that could be read later).

In other words, I think the flaw originally here was thinking of the "read" lock as truly restricting to read operations, whereas I think we should interpret it to mean "can write new data, just not delete existing data".
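For reference, a minimal standalone demonstration of the flock() semantics being discussed. Each separately-opened file descriptor counts as an independent lock holder, which is also why opening one fd per thread gives per-thread locking:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 0 if flock() behaves as expected: many shared holders may
   coexist, an exclusive lock is refused while any shared holder remains,
   and it succeeds once they all unlock. */
int flock_demo(void) {
    char path[] = "/tmp/flock-demo-XXXXXX";
    int fd1 = mkstemp(path);
    int fd2 = open(path, O_RDONLY);
    int fd3 = open(path, O_RDONLY);
    if (fd1 < 0 || fd2 < 0 || fd3 < 0)
        return -1;

    /* Two holders can take the shared ("read") lock at once... */
    if (flock(fd1, LOCK_SH) != 0) return -1;
    if (flock(fd2, LOCK_SH) != 0) return -1;

    /* ...but a non-blocking exclusive lock fails while they hold it. */
    if (flock(fd3, LOCK_EX | LOCK_NB) != -1 || errno != EWOULDBLOCK)
        return -1;

    /* Once the shared holders drop, the exclusive lock succeeds. */
    flock(fd1, LOCK_UN);
    flock(fd2, LOCK_UN);
    if (flock(fd3, LOCK_EX | LOCK_NB) != 0) return -1;

    close(fd1); close(fd2); close(fd3);
    unlink(path);
    return 0;
}
```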

@dbnicholson
Member Author

But...the patch Alexander had here originally https://bugzilla.gnome.org/show_bug.cgi?id=759442#c2 was way simpler and, by virtue of being file based (like the sysroot lock), also works across processes, which is I think what we actually want.

Then you proposed this: #813 (comment)

And we merged it but...it seems to me we really want a simple read/write lock, i.e. what you get in-process with a GRWLock and out of process with flock() (as Alex's original patch did). Everything except prune grabs a read lock (even though they're doing writes; due to how ostree works those writes are nondestructive, we aren't deleting anything that could be read later).

The lock I implemented is flock at the core. All the complexity comes from managing it within a process or thread. In particular, I wanted something that was going to be easy to use both inside and outside ostree. I wanted to be able to liberally take the lock within the ostree code and not have to reason about whether something else had already taken the lock. This is why it's recursive. You can take the lock deep within the ostree code even if it was taken somewhere else in the ostree code or even outside the ostree code by the application. This is basically the source of the complexity. I initially was thinking about something like what Alex wrote until I realized that it would be hard to use correctly, even within ostree itself.

Likewise, since we want to use 2 states, there needs to be some tracking of what state the lock is in. This is particularly important when unwinding the recursion and figuring out what state to go back to. This is the other source of the complexity.

Now let's consider some of the locking types out there.

  • flock - Provides inter-process locking with 2 states. You can even make this provide inter-thread locking by opening a separate file descriptor per thread (this is what currently happens in ostree). However, it's not recursive, so on its own you end up having to use it in a global way like the sysroot lock or very carefully in a local way by reasoning about the current state of the lock.

  • GMutex - Provides inter-thread locking with a single state. This is missing several of the features I wanted - no inter-process locking, doesn't have 2 states, not recursive.

  • GRWLock - Like GMutex but has 2 states. No inter-process locking or recursion, though.

  • GRecMutex - Like GMutex but recursive. No inter-process locking and only single state, though.

So, what I want is the combination of flock and GRecMutex. That's how I ended up with what's there. However, while it is MT-safe by having a per-thread flock file descriptor, I actually want to have more coordination between the threads. I'd like the lock to be per-OstreeRepo but maintain the inter-process locking, 2 states and recursion.

I hadn't given the GMutex family of locks much thought at the time since I knew a mutex wouldn't be sufficient. However, now I think I'll read through them for inspiration. I'm actually really curious to see how GRecMutex is implemented.
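For what it's worth, a GRecMutex-style recursive mutex can be sketched on top of a plain mutex by tracking the owning thread and a recursion depth (a simplified illustration, not GLib's actual implementation, and it glosses over the memory-ordering details a real one handles):

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_t owner;  /* valid only while depth > 0 */
    int depth;        /* how many times the owner has locked */
} RecMutex;

void rec_mutex_init(RecMutex *m) {
    pthread_mutex_init(&m->mutex, NULL);
    m->depth = 0;
}

void rec_mutex_lock(RecMutex *m) {
    pthread_t self = pthread_self();
    /* If we already own it, just bump the depth; only the owner writes
       these fields while depth > 0, so this check is safe for the owner. */
    if (m->depth > 0 && pthread_equal(m->owner, self)) {
        m->depth++;
        return;
    }
    pthread_mutex_lock(&m->mutex);
    m->owner = self;
    m->depth = 1;
}

void rec_mutex_unlock(RecMutex *m) {
    assert(m->depth > 0 && pthread_equal(m->owner, pthread_self()));
    if (--m->depth == 0)
        pthread_mutex_unlock(&m->mutex);
}
```

The owner/depth bookkeeping here plays the same role as the per-thread stack plus counters described above: the underlying lock is only released once the recursion fully unwinds.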

In other words, I think the flaw originally here was thinking of the "read" lock as truly restricting to read operations, whereas I think we should interpret it to mean "can write new data, just not delete existing data".

The code always refers to it by the flock names - shared and exclusive. As did Alex's patch. But I agree there has been some confusion about what those mean. However, I don't think it matters what the states are called, just that you have more than 1 state and you know what they refer to. In ostree we could refer to this as a creator/destroyer lock or something.

dbnicholson added a commit to dbnicholson/ostree that referenced this issue Apr 16, 2021
Previously each thread maintained its own lock file descriptor
regardless of whether the thread was using the same `OstreeRepo` as
another thread. This was very safe but it made certain multithreaded
procedures difficult. For example, if a main thread took an exclusive
lock and then spawned worker threads, it would deadlock if one of the
worker threads tried to acquire the lock.

This moves the file descriptor from thread local storage to the
`OstreeRepo` structure so that threads using the same `OstreeRepo` can
share the lock. Some bookkeeping is needed to keep track of the lock
state when recursing and a mutex guards against threads altering the
state concurrently.

The required GLib is bumped to 2.44 to allow usage of `GMutexLocker`.

Fixes: ostreedev#2344