Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression in v1.23.1 #697

Closed
vrothberg opened this issue Aug 20, 2020 · 10 comments · Fixed by #698
Closed

regression in v1.23.1 #697

vrothberg opened this issue Aug 20, 2020 · 10 comments · Fixed by #698

Comments

@vrothberg
Copy link
Member

vrothberg commented Aug 20, 2020

v1.23.1 is causing regressions in Podman's apiv2 tests. I do not have the time to track it down but need to get work done that's currently blocked by the regression so I first revert the bump in c/image (see containers/image#1031).

[+0064s] not ok 291 [40-pods] POST libpod/pods/bar/start [] : status
[+0064s] #  expected: 200
[+0064s] #    actual: 500
[+0064s]   expected: 200
@vrothberg
Copy link
Member Author

It looks like commit 7005d07 causes it.

@vrothberg
Copy link
Member Author

$ POST libpod/pods/bar/start []                                          
HTTP/1.1 500 Internal Server Error                                       
Content-Type: application/json                                           
Date: Thu, 20 Aug 2020 12:11:03 GMT                                      
Content-Length: 117                                                      
                                                                         
{                                                                        
  "cause": "some containers failed",                                     
  "message": "error starting some containers: some containers failed",   
  "response": 500                                                        
}                                                                        
not ok 277 [40-pods] POST libpod/pods/bar/start [] : status              

@vrothberg
Copy link
Member Author

What I am seeing on the server side:

time="2020-08-20T14:27:33+02:00" level=debug msg="Received: -1"
time="2020-08-20T14:27:33+02:00" level=debug msg="Cleaning up container c6c60358e96e7ebf8ce7953644fef5a2bbb10ddeac9fd9c1b019162f59b51bc5"
``

@vrothberg
Copy link
Member Author

@giuseppe @nalind @rhatdan ... no idea what's going on here. The only thing I noticed is that the error only occurs when we're locking the graphlock before the layer store. If we lock after, CI passes again.

@rhatdan
Copy link
Member

rhatdan commented Aug 20, 2020

@zvier PTAL

@vrothberg
Copy link
Member Author

Note that it might very well be a bug in Podman. At least we cannot rule that out.

@nalind
Copy link
Member

nalind commented Aug 20, 2020

@giuseppe @nalind @rhatdan ... no idea what's going on here. The only thing I noticed is that the error only occurs when we're locking the graphlock before the layer store. If we lock after, CI passes again.

That differs from the order in which they're acquired in Shutdown(), which would be a potential lock ordering problem.

@giuseppe
Copy link
Member

I don't think there is a regression in v1.23.1, IMO it only showed a race condition that already existed.

I think the issue is that podman performs the unmount from the cleanup process, and that confuses the mounts RefCounter. So next time the podman daemon tries to do a mount, nothing happens because the RefCounter believes it is already mounted, while in reality it was unmounted by the cleanup process.

@giuseppe
Copy link
Member

I've tried with:

diff --git a/vendor/github.com/containers/storage/drivers/counter.go b/vendor/github.com/containers/storage/drivers/counter.go
index 72551a38d..ec7a6090a 100644
--- a/vendor/github.com/containers/storage/drivers/counter.go
+++ b/vendor/github.com/containers/storage/drivers/counter.go
@@ -51,6 +51,8 @@ func (c *RefCounter) incdec(path string, infoOp func(minfo *minfo)) int {
                if c.checker.IsMounted(path) {
                        m.count++
                }
+       } else if !c.checker.IsMounted(path) {
+               m.count = 0
        }
        infoOp(m)
        count := m.count

Not sure if it is the best fix, but at least the tests pass now.

giuseppe added a commit to giuseppe/storage that referenced this issue Aug 21, 2020
if a previously mounted container was unmounted
externally (e.g. through conmon cleanup for Podman containers), the
ref counter will lose track of it and report it as still mounted.

Closes: containers#697

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Copy link
Member

PR here: #698

mheon pushed a commit to mheon/storage that referenced this issue Aug 24, 2020
if a previously mounted container was unmounted
externally (e.g. through conmon cleanup for Podman containers), the
ref counter will lose track of it and report it as still mounted.

Closes: containers#697

Signed-off-by: Giuseppe Scrivano <[email protected]>
rhatdan pushed a commit to rhatdan/storage that referenced this issue Aug 25, 2020
if a previously mounted container was unmounted
externally (e.g. through conmon cleanup for Podman containers), the
ref counter will lose track of it and report it as still mounted.

Closes: containers#697

Signed-off-by: Giuseppe Scrivano <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants