Split RW and RO Layer locks per method #473

Closed (wants to merge 1 commit)
Conversation

@saschagrunert (Member) commented Nov 28, 2019

The new storage options ReadOnlyLayers indicates that the
layer store should be locked for read only access. This allows use cases
like listing container images while pushing an image to a remote
location in parallel.

Closes #420
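
For illustration, a minimal sketch of how a consumer might have used the proposed option. The ReadOnlyLayers field (shown commented out) is this PR's proposal and was never merged; the rest is the existing c/storage API:

```go
package main

import (
	"fmt"

	"github.com/containers/storage"
)

func main() {
	options := storage.StoreOptions{
		RunRoot:         "/run/containers/storage",
		GraphRoot:       "/var/lib/containers/storage",
		GraphDriverName: "overlay",
		// ReadOnlyLayers: true, // hypothetical field proposed (but never merged) by this PR
	}

	store, err := storage.GetStore(options)
	if err != nil {
		panic(err)
	}

	// With the layer store locked read-only, listing images would not have
	// to wait for a concurrent `podman push` holding the writer lock.
	images, err := store.Images()
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d images\n", len(images))
}
```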

@saschagrunert changed the title from "WIP: Add ReadOnly storage option" to "Add ReadOnly storage option" on Nov 29, 2019
@saschagrunert changed the title from "Add ReadOnly storage option" to "Add ReadOnlyLayers storage option" on Nov 29, 2019
@saschagrunert (Member, Author)

This is now ready for a first review. I tried to implement it with as little overhead as possible.

@rhatdan (Member) commented Nov 29, 2019

Tests are failing?

@saschagrunert (Member, Author)

Tests are failing?

Yes, thanks for the hint.

My main concern right now is that we give the responsibility for "locking the right way" to the consumer of the API. But since the option is off by default and not serialized into the config, it should be fine...

@rhatdan (Member) commented Nov 29, 2019

LGTM
Need @nalind @vrothberg to review, since they have played with locks a lot more than I have.
@mheon PTAL

@vrothberg (Member)

Aren't we having a meeting for this topic on Friday?

@saschagrunert (Member, Author)

Aren't we having a meeting for this topic on Friday?

The intention of the meeting was more about locking enhancements per layer, see: #421

@vrothberg (Member) left a comment

I am not yet convinced about a dedicated option, mostly because I am scared of the increasing complexity of locking and the locking paths. Also, reading the commit message:

"This allows use cases like listing container images while pushing an image to a remote location in parallel."

I don't completely understand why we need a dedicated option; I'd expect read-only functions (e.g., listing containers) to always acquire reader locks.

Considering #420 (comment) and the comments in this issue, I begin to think that new{RO,}LayerStore shouldn't lock at all - it doesn't add up to me to have this critical section conditionally as writer or reader since both execute the same code.

Maybe I just need a fresh coffee :^) In any case, @nalind is the expert of this code, so I want him to weigh in.

@saschagrunert changed the title from "Add ReadOnlyLayers storage option" to "WIPAdd ReadOnlyLayers storage option" on Dec 9, 2019
@saschagrunert changed the title from "WIPAdd ReadOnlyLayers storage option" to "WIP: Add ReadOnlyLayers storage option" on Dec 9, 2019
@saschagrunert changed the title from "WIP: Add ReadOnlyLayers storage option" to "Add ReadOnlyLayers storage option" on Dec 11, 2019
@saschagrunert (Member, Author)

Ready for a new round of review, @vrothberg @nalind @rhatdan PTAL

@vrothberg (Member)

Ready for a new round of review, @vrothberg @nalind @rhatdan PTAL

Can you explain what the code is intended to do now (and ideally update the commit message)? Why can't we drop the locks (see #473 (comment)) that are causing part of the problem?

@saschagrunert (Member, Author)

Why can't we drop the locks (see #473 (comment)) that are causing part of the problem?

We can, but it changes the behavior of the cleanup on init. I don't think that this is harmful, and I created a dedicated PR for it since it is not directly related to this topic: #493

@rhatdan (Member) commented Jan 15, 2020

@saschagrunert @vrothberg @nalind Bringing this back up for discussion. Where are we on this now? Should we go forward with this?

@rhatdan (Member) commented Jan 31, 2020

@saschagrunert @vrothberg PING...

@vrothberg (Member)

We should reevaluate how far #493 brought us.

@saschagrunert (Member, Author)

Yes, some details have changed, but the main approach would be the same. I'll ping you here when this is ready for review.

@saschagrunert (Member, Author)

containers/podman#4660 is now working, PTAL @rhatdan @vrothberg @nalind

@rhatdan (Member) left a comment

LGTM

@rhatdan (Member) commented Feb 7, 2020

@vrothberg @nalind PTAL

@vrothberg (Member) left a comment

I really miss a detailed explanation of why we need to go down this path rather than optimizing the reading code paths. Does setting ForceReadOnlyLayers prevent us from writing? If I set ForceReadOnlyLayers=true, I should not be able to write data. Can we give this guarantee?

It somehow feels like a symptomatic fix for a deeper issue in c/storage. Given that we have a database, I expect read accesses to proceed unless the specific data is currently being written to. Following this analogy, we are now marking the database as RO entirely because for some reason it doesn't perform well. So if I want to write to it at a later point, I have to create a new object which allows for writes without data corruption.

@rhatdan (Member) commented Feb 7, 2020

My understanding is that right now, if two users of the storage want to use it at the same time, the second one in needs to wait until the first is done. In the new design, if both users want to READ, then they are both allowed in, and only writers would block. So a writer cannot write while a reader exists, and readers cannot read while a writer holds the lock.

I guess there is a chance that writers would get frozen out if a continuous stream of readers comes in.
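
The shared-reader/exclusive-writer model described above is easiest to see in process-local code. A minimal sketch using Go's sync.RWMutex (only an analogy; the real c/storage lock is a file lock shared between processes):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// One RWMutex standing in for the store-wide layer lock: any number of
// readers may hold it at the same time, but a writer needs exclusive access.
var layerLock sync.RWMutex

func listImages(id int, wg *sync.WaitGroup) {
	defer wg.Done()
	layerLock.RLock() // shared: several "podman images"-style readers at once
	defer layerLock.RUnlock()
	fmt.Printf("reader %d: listing images\n", id)
	time.Sleep(100 * time.Millisecond)
}

func commitLayer(wg *sync.WaitGroup) {
	defer wg.Done()
	layerLock.Lock() // exclusive: waits until all readers are gone
	defer layerLock.Unlock()
	fmt.Println("writer: committing a layer")
	time.Sleep(100 * time.Millisecond)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go listImages(i, &wg)
	}
	wg.Add(1)
	go commitLayer(&wg)
	wg.Wait()
	// Note: sync.RWMutex blocks new readers once a writer is waiting, so
	// writers cannot starve here; the file-based lock discussed in this
	// thread makes no such guarantee, which is the concern above.
}
```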

@vrothberg (Member)

My understanding is that right now, if two users of the storage want to use it at the same time, the second one in needs to wait until the first is done. In the new design, if both users want to READ, then they are both allowed in, and only writers would block. So a writer cannot write while a reader exists, and readers cannot read while a writer holds the lock.

The main issue IMHO is that we decide on READ or WRITE access when creating the store and try to enforce that. So we are not fixing/optimizing the code paths but changing the nature of the locks altogether. This only works for short-lived store objects (see https://github.com/containers/libpod/pull/4660/files) and is not generally useful for CRI-O (or the podman system service), for instance, where the store is not created for each request but is a long-lived object.

Assume I want to access the store as a reader: I have to create a new object, then do my work, and any subsequent write access would require creating yet another object. I would prefer to make c/storage smart enough to figure out when we need to acquire the locks as a reader or writer (and further localize the bottlenecks). It really feels like a symptomatic patch while the underlying bottlenecks in c/storage remain.

@rhatdan (Member) commented Feb 7, 2020

OK, so Nalin and I talked about this further, and the problem is that each Runtime takes a read/write lock when it starts, even if it does not need one. So when you are pushing an image, you only hold a read lock, but when podman images (or buildah images) starts, it wants to create a read/write lock initially and therefore waits until podman push is done. Wouldn't a better solution be to take a read-only lock and then, when a storage method that requires read/write is called, have the method check whether the lock is read/write: if so, it takes it; if not, it reopens or recreates the lock as a read/write lock and then locks it?

@vrothberg (Member)

Wouldn't a better solution be to take a read-only lock and then, when a storage method that requires read/write is called, have the method check whether the lock is read/write: if so, it takes it; if not, it reopens or recreates the lock as a read/write lock and then locks it?

I preferred this ^ approach, yes. It's transparent to consumers of c/storage and seems to tackle the root cause.
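
For illustration, a rough in-process sketch of the idea: each method takes the kind of lock it actually needs, so callers never have to declare up front whether the store is read-only or read-write. All names here are invented; with c/storage's inter-process file lock, an "upgrade" would additionally mean reopening the lockfile read-write:

```go
package main

import "sync"

// A rough in-process model of the "take the right lock per method" idea:
// read-only methods acquire a shared lock, and mutating methods acquire the
// exclusive lock themselves, so callers never decide up front whether the
// store is "read-only" or "read-write".
type layerStore struct {
	mu     sync.RWMutex
	layers map[string]string // layer ID -> parent ID (toy metadata)
}

// Get only needs the shared (reader) lock.
func (s *layerStore) Get(id string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	parent, ok := s.layers[id]
	return parent, ok
}

// Put needs the exclusive (writer) lock. Note that a true in-place upgrade
// (RLock -> Lock without releasing) is not possible with sync.RWMutex or
// with file locks: the read lock must be dropped first, and anything read
// under it has to be re-validated once the write lock is held.
func (s *layerStore) Put(id, parent string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.layers[id] = parent
}

func main() {
	s := &layerStore{layers: map[string]string{}}
	s.Put("layer1", "")
	if parent, ok := s.Get("layer1"); ok {
		_ = parent
	}
}
```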

@rhatdan (Member) commented Oct 30, 2020

@kolyshkin I would love to have you review this PR.

@saschagrunert (Member, Author)

I'm kind of in a trial and error mode now with the Podman PR containers/podman#7603. I need the integration/system tests there to see if it works, because the issue is not reproducible on my machine.

I'll keep you posted, this can take some iterations.

@saschagrunert (Member, Author)

The Podman PR as well as this one is now green, but I have to verify that my changes logically make sense…
containers/podman#7603

@rhatdan (Member) commented Nov 3, 2020

@saschagrunert Just to move the boundary a little further, could you try a PR against CRI-O to see how that goes?

saschagrunert added a commit to saschagrunert/cri-o that referenced this pull request Nov 3, 2020
@saschagrunert (Member, Author)

@saschagrunert Just to move the boundary a little further, could you try a PR against CRI-O to see how that goes?

Yep, opened the PR there: cri-o/cri-o#4339

@rhatdan (Member) commented Nov 3, 2020

@vrothberg @nalind @TomSweeneyRedHat @mtrmac @mrunalp This PR is now passing tests in Podman and CRI-O, I believe it is time to give it another review.

@akostadinov commented Dec 18, 2020

@saschagrunert, is your work intended to fix issues like containers/podman#8667?

Accessing layers in RO or RW mode is now decided in the dedicated method
which increases the flexibility in terms of locking.

This allows use cases like listing container images while pushing an
image to a remote location in parallel.

Signed-off-by: Sascha Grunert <[email protected]>

@saschagrunert (Member, Author) left a comment

@kolyshkin I fixed the review points you mentioned. PTAL

@akostadinov

@saschagrunert, sorry for double posting, but I think it is important to know whether your PR is supposed to fix issues like containers/podman#8667

If not, then could you clarify what use cases should be considered supported?

@mtrmac (Collaborator) left a comment

Overall, as a stranger to the codebase, I’d appreciate fairly detailed comments about the design: What do the various locks protect, what is authoritative data vs. caches, and so on and so on.

  • The concurrent maintenance of two stores seems risky to me, see elsewhere. Wouldn’t it make sense to only have a one-directional “upgrade”, where the store either has a read-only layer store, or replaces it with a read-write, but never creates a read-write one again?
  • Having two stores means two caches = two reads of the data. Is that really worth it?
  • If there has to be a separate ShutdownRO and the caller is expected to call it to benefit from it (and the code breaks if the caller calls RO after updates!), this is not completely transparent anyway. How are the library callers going to use it, then? If the code that creates/shuts down the storage.Store must know whether the store is RO or RW, it would seem much cleaner to me to have that code create the storage.Store in a RO or RW mode (and then c/storage itself can be in control of its invariants, e.g. refuse to do any write operations on a RO store). (Yes, I have noticed that this was considered about a year ago; as I said at the start, I’m a stranger to the code base and my opinion should not be authoritative — I can’t really understand the CRI-O argument in there, for example.)

@@ -1470,7 +1455,8 @@ func (r *layerStore) Modified() (bool, error) {
 }

 func (r *layerStore) IsReadWrite() bool {
-	return r.lockfile.IsReadWrite()
+	// the mountsLockfile is only set for read-write enabled stores.

Collaborator:

(This is preexisting = non-blocking:) This looks like an important invariant; it should be documented with the structure field.

@@ -524,27 +530,6 @@ func (s *store) newLayerStore(rundir string, layerdir string, driver drivers.Dri
 	return &rlstore, nil
 }

-func newROLayerStore(rundir string, layerdir string, driver drivers.Driver) (ROLayerStore, error) {
-	lockfile, err := GetROLockfile(filepath.Join(layerdir, "layers.lock"))

Collaborator:

newLayerStore as a replacement for this will always create a RW lockfile. Is that desirable? It changes semantics of r.lockfile.RecursiveLock == r.RecursiveLock, which seems worrisome.

OTOH

if locker, ok := lockfiles[cleanPath]; ok {
	if ro && locker.IsReadWrite() {
		return nil, errors.Errorf("lock %q is not a read-only lock", cleanPath)
	}
	if !ro && !locker.IsReadWrite() {
		return nil, errors.Errorf("lock %q is not a read-write lock", cleanPath)
	}
	return locker, nil
}
does not support having both kinds of lock for a single path.

@@ -1470,7 +1455,8 @@ func (r *layerStore) Modified() (bool, error) {
 }

 func (r *layerStore) IsReadWrite() bool {
-	return r.lockfile.IsReadWrite()
+	// the mountsLockfile is only set for read-write enabled stores.
+	return r.lockfile.IsReadWrite() && r.mountsLockfile != nil

Collaborator:

r.lockfile.IsReadWrite is always true now, AFAICS.

func (s *store) LayerStore() (LayerStore, error) {
	s.graphLock.Lock()
	defer s.graphLock.Unlock()

Collaborator:

Previously, this lock protected all of LayerStore, including s.getGraphDriver; now the lock is scoped to resetIfNeeded; why is that sufficient? (It might be, but it’s not at all obvious to me — the lock’s scope is not documented.)

@@ -491,7 +491,10 @@ func (r *layerStore) saveMounts() error {
 	return r.loadMounts()
 }

-func (s *store) newLayerStore(rundir string, layerdir string, driver drivers.Driver) (LayerStore, error) {
+// newLayerStore creates a new layer storage based on the provided directories
+// and selected driver. The `withMountsLock` bool indicates if we want to

Collaborator:

Is it really withMountsLock or readWrite? The latter seems semantically more useful, and might e.g. allow skipping the MkdirAll calls below.

@@ -582,6 +593,7 @@ type store struct {
 	autoNsMaxSize uint32
 	graphDriver drivers.Driver
 	layerStore LayerStore

Collaborator:

Name this rwLayerStore? Possibly document the fields, even if they are “a private cache for $method”.

@@ -582,6 +593,7 @@ type store struct {
 	autoNsMaxSize uint32
 	graphDriver drivers.Driver
 	layerStore LayerStore
+	roLayerStore ROLayerStore
 	roLayerStores []ROLayerStore

Collaborator:

Suppose the user calls:

  • s.Names(foo) // Populates roLayerStore, reads from disk
  • s.PutLayer(…) // Populates layerStore, reads + writes to disk
  • s.Names(foo)

What happens now? AFAICS:

  • s.graphLock.TouchedSince is false (no .Shutdown*), so the cached roLayerStore is used
  • The RO and RW layer stores share lock objects (
    if locker, ok := lockfiles[cleanPath]; ok {
    	if ro && locker.IsReadWrite() {
    		return nil, errors.Errorf("lock %q is not a read-only lock", cleanPath)
    	}
    	if !ro && !locker.IsReadWrite() {
    		return nil, errors.Errorf("lock %q is not a read-write lock", cleanPath)
    	}
    	return locker, nil
    }
    ), so regardless of any .Touch() calls on locks, .Modified() will be false
  • The RO layer store will continue reading from its cached data, regardless of modifications by the RW store.

(This is only by reading code I’m unfamiliar with, so I may well be missing something or making a stupid mistake.)
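
To make the described hazard concrete, a toy model (not the real c/storage code) of two store views sharing a single in-memory lockfile object, so the read-only view never notices its sibling's write:

```go
package main

import "fmt"

// A toy model of the hazard above. The "on-disk" lock file is a shared
// counter; each in-memory lockfile object remembers the last value it saw.
// Touch() bumps the counter (a write happened); Modified() tells a store
// whether someone bumped it since this object last looked.
type lockfile struct {
	disk *int // stands in for the lock file on disk
	seen int  // last counter value this object observed
}

func (l *lockfile) Touch() {
	*l.disk += 1
	l.seen = *l.disk
}

func (l *lockfile) Modified() bool {
	modified := l.seen != *l.disk
	l.seen = *l.disk
	return modified
}

type layerView struct {
	lock  *lockfile
	cache string // pretend layer metadata loaded from disk
}

func (v *layerView) load(data string) { v.cache = data }

func main() {
	disk := 0
	shared := &lockfile{disk: &disk} // RO and RW views share ONE lockfile object

	ro := &layerView{lock: shared}
	rw := &layerView{lock: shared}
	ro.load("old layers")

	// The RW view writes a layer and touches the lock.
	rw.load("old layers + new layer")
	rw.lock.Touch()

	// The RO view checks whether it needs to reload, but because it shares
	// the lockfile object with the RW view, Modified() reports false and
	// the stale cache is served.
	if ro.lock.Modified() {
		ro.load("old layers + new layer")
	}
	fmt.Println(ro.cache) // prints "old layers": stale
}
```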

@saschagrunert (Member, Author)

@saschagrunert, sorry for double posting, but I think it is important to know whether your PR is supposed to fix issues like containers/podman#8667

If not, then could you clarify what use cases should be considered supported?

The main use case was to be able to list container images while another push operation is in progress. I think containers/podman#8667 could be solved by locking per layer instead of the whole storage. But as folks already mentioned, this would mean an architectural rewrite of containers/storage, and I'm not sure if we will get to that point anytime soon.

@saschagrunert (Member, Author)

  • The concurrent maintenance of two stores seems risky to me, see elsewhere. Wouldn’t it make sense to only have a one-directional “upgrade”, where the store either has a read-only layer store, or replaces it with a read-write, but never creates a read-write one again?

I think if we put API compatibility aside for now, then we could create a store which is read-only locked by default. Just some operations should then be able to lock certain parts of the store. For example, storing a layer should only RW-lock that part of the store. Other operations (read-only or read-write ones which do not affect the WIP layers) should still be possible.

API compatibility is the main burden here. c/storage is designed to clean up the layers after storage shutdown, which will always lock the complete layer store, for example.

  • Having two stores means two caches = two reads of the data. Is that really worth it?

No, I don't think so, and I agree with your concern. The only reason to implement it like this is to stay compatible with current consumers of this project.

  • If there has to be a separate ShutdownRO and the caller is expected to call it to benefit from it (and the code breaks if the caller calls RO after updates!), this is not completely transparent anyway. How are the library callers going to use it, then?

Manually: library consumers have to decide on their own whether they want to shut down with cleanup or without. For example, a podman ps would not need a layer cleanup. I know that is a major drawback of the proposal.


All in all I think we will have a hard time keeping the current API stable while improving the locking. We could probably start with a parallel implementation (which breaks the API) and integrate the enhancements mentioned by @mtrmac.

If we speak only about refactoring the layer store, then the main strategy would be:

  1. Redesign a locker which is able to lock a single layer
  2. Use only one layer store for c/storage which uses the new locking (read-only by default)
  3. Decide which operations will automatically and partially RW-lock the layer store
  4. Clean up the store based on its usage (i.e., track with a flag whether the store needs cleanup).

What do you think?

@vrothberg (Member)

Redesign a locker which is able to lock a single layer

Fine-grained locking is very difficult to get right and a common source of concurrency bugs. Having a per-layer lock, for instance, is subject to ABBA deadlocks (assuming a process is allowed to lock multiple layers at once).
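
A minimal example of the ABBA pattern with two hypothetical per-layer locks; running it trips Go's deadlock detector:

```go
package main

import (
	"sync"
	"time"
)

// A classic ABBA deadlock with per-layer locks: two goroutines each need
// two layers but grab them in opposite order. Running this ends in Go's
// "all goroutines are asleep - deadlock!" fatal error. (Layer names and
// the two operations are invented for illustration.)
func main() {
	var layerA, layerB sync.Mutex
	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // e.g. committing a container: parent first, then child
		defer wg.Done()
		layerA.Lock()
		time.Sleep(100 * time.Millisecond) // make the interleaving deterministic
		layerB.Lock()                      // blocks forever: B is held below
		layerB.Unlock()
		layerA.Unlock()
	}()

	go func() { // e.g. deleting an image: child first, then parent
		defer wg.Done()
		layerB.Lock()
		time.Sleep(100 * time.Millisecond)
		layerA.Lock() // blocks forever: A is held above
		layerA.Unlock()
		layerB.Unlock()
	}()

	wg.Wait()
}
```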

One thing we must consider as well is backward compatibility. If we fundamentally change how c/storage works, older and newer versions cannot run on the same machine.

I prefer to keep the coarse-grained locks unless there is no alternative.

There may still be more room for optimization (see #420 (comment)), which does not entirely rule out lock monopolization.

@mtrmac (Collaborator) commented Jan 26, 2021

I think if we put API compatibility aside for now, then we could create a store which is read-only locked by default. Just some operations should then be able to lock certain parts of the store. For example, storing a layer should only RW-lock that part of the store. Other operations (read-only or read-write ones which do not affect the WIP layers) should still be possible.

What happens with the current consistency fix-ups that write data on load (see shouldSave in layerStore.Load and imageStore.Load at least)? Assuming we don’t somehow modify the code to transparently read inconsistent data everywhere (which might be possible, but seems difficult to make reliable), any operation can obtain a write lock even if the caller “just wants to read something”. That suggests to me that all of this probably could be made API-transparent: keep the existing API, and just lock only for reading if necessary…

… except that, well, that’s what the code is already doing: store.go calls store.RLock() for read-only operations.

So… what’s the actual failure path this PR is trying to address? That layerStore.Load() uses r.lockfile.Lock(), which is read-write, and that always blocks on a concurrent push? Couldn’t that be a fairly local fix, where layerStore.Load() tries to load with a RO lock only, and on consistency failure it gives that lock up, and tries again with RW lock?
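
A sketch of that "fairly local fix": attempt the load under a reader lock and retry under the writer lock only when the on-disk data needs repair. The names and structure are invented for illustration and are not the actual layerStore.Load():

```go
package main

import "sync"

// Try loading under a reader lock; only if the on-disk data needs fixing
// up (the shouldSave case) give that lock up and redo the load under the
// writer lock.
type layerStore struct {
	mu     sync.RWMutex
	layers []string
}

// loadLocked reads the on-disk state; it reports needsSave when the data
// is inconsistent and must be rewritten (which requires the writer lock).
func (r *layerStore) loadLocked() (needsSave bool, err error) {
	// ... parse layers.json, detect missing or incomplete entries ...
	return false, nil
}

func (r *layerStore) Load() error {
	r.mu.RLock()
	needsSave, err := r.loadLocked()
	r.mu.RUnlock()
	if err != nil || !needsSave {
		return err // common case: reading was enough, a concurrent push never blocked us
	}

	// Rare case: the data needs repair, so retry under the writer lock.
	// The state may have changed while no lock was held, so loadLocked()
	// runs again from scratch.
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, err := r.loadLocked(); err != nil {
		return err
	}
	// ... rewrite the repaired data while still holding the writer lock ...
	return nil
}

func main() {
	_ = (&layerStore{}).Load()
}
```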

@vrothberg (Member)

I opened containers/image#1167 which is tackling the issue with a different perspective.

Ultimately, solving the issue in storage is preferable, but as shown here, it's very difficult. Requiring callers to worry about which execution paths read or write data is likely to cause future regressions, as it's hard to get right.

containers/image#1167 makes pushing a bit slower, but it solves the problem of being network-bound until the storage is unlocked.

Successfully merging this pull request may close these issues: Allow GetStore() in read-only mode.