
Allow GetStore() in read-only mode #420

Closed
saschagrunert opened this issue Aug 22, 2019 · 20 comments

@saschagrunert
Member

The idea is to retrieve an instance of the store in read-only mode, which by design does not allow any modifications at all.

This would reduce the locking on the store where it is not necessary, for example when listing images.

Right now, it is not possible to execute two read-only operations like this:

> podman push localhost:5000/my-immense-huge-image
[takes time]

and in a second shell:

> podman images

The target of the implementation would be that consumers of the API can decide whether they want a read-only store instance or not.

@vrothberg
Member

Right now, it is not possible to execute two read-only operations like this:

That should work - at least for containers/storage. In this specific example, when listing images, the image-storage is used read-only (see https://github.com/containers/storage/blob/master/store.go#L2810). The locks are then acquired read-only to allow for multi-reader, single-writer scenarios.

Did you strace which lock we're blocked on in the example above?

@vrothberg
Member

I can reproduce it locally as well. Maybe we're missing a reader lock in one code path in that scenario :( Great catch btw, @saschagrunert!

@saschagrunert
Member Author

saschagrunert commented Aug 22, 2019

Right now, it is not possible to execute two read-only operations like this:

That should work - at least for containers/storage. In this specific example, when listing images, the image-storage is used read-only (see https://github.com/containers/storage/blob/master/store.go#L2810). The locks are then acquired read-only to allow for multi-reader, single-writer scenarios.

Did you strace which lock we're blocked on in the example above?

Yeah, in the above scenario it blocks there in containers/storage:

lockfile.Lock()

This traces up to libpod's configureStore():
https://github.com/containers/libpod/blob/master/libpod/runtime.go#L1394

The internal load() function of the storage already requires read access and therefore locks:

storage/store.go, lines 670 to 673 at 69bcd96:

rls, err := s.LayerStore()
if err != nil {
return err
}

This also affects every consumer of the GetStore() API (I found such behavior in CRI-O, Buildah, and Podman right now), which means that scenarios like concurrent pushes are not possible either.

@vrothberg
Member

Yeah, in the above scenario it blocks there in containers/storage:

If we did a newROLayerStore(...), we would acquire an RO store and use reader locks, which should unblock the podman images process.

The internal load() function of the storage already requires read access and therefore locks:

The impact of this specific lock should be minimal. It's a short critical section needed to initialize the storage on disk. I don't think we can get rid of it.

@vrothberg
Member

@nalind PTAL

@vrothberg
Member

If we did a newROLayerStore(...), we would acquire an RO store and use reader locks, which should unblock the podman images process.

I'll take that back. The layers.lock may be held during the entire push. (I need to have a closer look.)

@vrothberg
Member

I am under the impression that the pushing process monopolizes the lock that @saschagrunert pointed to (i.e., the layers.lock in newLayerStore(...)), which prevents the listing process from acquiring it. It's very short-lived but still monopolized.

Changing it to an RLock() makes it run in parallel. @nalind, is there any way we can avoid the writer lock here? LayerStores are oftentimes used as ROLayerStores, but newLayerStore() seems to be called too often, which I assume leads to the lock monopolization (especially due to the parallel copy goroutines).

@saschagrunert
Member Author

I am under the impression that the pushing process monopolizes the lock that @saschagrunert pointed to (i.e., the layers.lock in newLayerStore(...)), which prevents the listing process from acquiring it. It's very short-lived but still monopolized.

What do you mean by short-lived? If I push, it blocks during the whole image push (which can be pretty long). During a pull, it only blocks on "Storing signatures", which can also take a long time if the target image is large.

@vrothberg
Member

What do you mean by short-lived?

Looking at lockfile.Lock(), the lock is released at the return of newLayerStore. That's why it's short-lived; the critical section is small.

If I push, it blocks during the whole image push (which can be pretty long).

I don't understand how this lock could be held the whole time, as it's released at the return of this function (see above) and acquired multiple times. The lock monopolization, however, would make it seem like it's being held for the entire push.

During a pull, it only blocks on "Storing signatures", which can also take a long time if the target image is large.

Right. The blobs are first downloaded to /tmp (by c/image) and then copied to the storage during the commit phase ("Storing signatures" etc.).

@vrothberg
Member

During a push of docker.io/library/gcc:latest, strace shows 16 openats on the layers.lock. Since the copy operation runs in parallel goroutines, the writer locks within that process space will (at least in my theory) acquire that lock in sequence and monopolize newLayerStore().

And I get 75 when listing images 👀

@saschagrunert
Member Author

I took more time to investigate how the storage locking behaves and could not find any incorrect behavior.

For example, when pushing an image and listing images in parallel, the push read-locks the storage for every layer, which is correct. The main issue I can see is that we need write-lock access to the storage on Load() and Shutdown(), which blocks read-only operations (like listing images), too.

If we hack around a bit and replace both locks with an RLock() only, then the parallel push and list-images scenario mentioned above works seamlessly:

diff --git a/vendor/github.com/containers/storage/layers.go b/vendor/github.com/containers/storage/layers.go
index d746ba06..46b0f099 100644
--- a/vendor/github.com/containers/storage/layers.go
+++ b/vendor/github.com/containers/storage/layers.go
@@ -482,7 +482,7 @@ func newLayerStore(rundir string, layerdir string, driver drivers.Driver, uidMap
        if err != nil {
                return nil, err
        }
-       lockfile.Lock()
+       lockfile.RLock()
        defer lockfile.Unlock()
        mountsLockfile, err := GetLockfile(filepath.Join(rundir, "mountpoints.lock"))
        if err != nil {
diff --git a/vendor/github.com/containers/storage/store.go b/vendor/github.com/containers/storage/store.go
index 74275482..66290b53 100644
--- a/vendor/github.com/containers/storage/store.go
+++ b/vendor/github.com/containers/storage/store.go
@@ -3130,7 +3130,7 @@ func (s *store) Shutdown(force bool) ([]string, error) {
        s.graphLock.Lock()
        defer s.graphLock.Unlock()

-       rlstore.Lock()
+       rlstore.RLock()
        defer rlstore.Unlock()
        if modified, err := rlstore.Modified(); modified || err != nil {
                if err = rlstore.Load(); err != nil {

This is certainly not safe, because we're now mixing two use cases. What we could do is provide a completely read-only store via the API, but then the consumer has to specify which type of store (read, readWrite) is needed...

This means that we either extend the API, or we do not write to the store on Load() and Shutdown() in general. 🤷‍♂️

@vrothberg
Copy link
Member

That aligns with my findings in #420 (comment).

This means that we either extend the API, or we do not write to the store on Load() and Shutdown() in general. 🤷‍♂️

I have hopes that we can find a workaround to avoid acquiring the lock(s) as a writer, but we need @nalind's pair of eyes to be sure.

@rhatdan
Member

rhatdan commented Nov 1, 2019

@nalind @vrothberg @saschagrunert Anyone have time to pursue this?

@saschagrunert
Member Author

Yes, but we're still waiting for feedback from @nalind :)

@nalind
Member

nalind commented Nov 5, 2019

I'm fine with that if it does the job without dramatically increasing the complexity of the code.

@saschagrunert
Member Author

Alright, I will put this on my list of TODOs but may only start after KubeCon.

@rhatdan
Member

rhatdan commented Aug 3, 2020

@saschagrunert Any movement on this? Or still waiting for #473 to get merged?

@saschagrunert
Member Author

saschagrunert commented Aug 4, 2020

Yes, I think #473 is a prerequisite for this more in-depth enhancement :) Once we can merge it, I can continue my work here.

@rhatdan
Member

rhatdan commented Jun 14, 2021

@saschagrunert What should we do with this one now?

@saschagrunert
Member Author

Closing this since I don't see a valid path forward.
