Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libct/cg: openat2 optimization not working across container boundaries #3026

Open
MikeDombo opened this issue Jun 13, 2021 · 9 comments
Open

Comments

@MikeDombo
Copy link

Hi,
I'm using libcontainer to create my container and I found a regression when Ubuntu updated from kernel 5.4 to kernel 5.8 on Thursday/Friday. I'm currently seeing process_linux.go:385: applying cgroup configuration for process caused: openat2 /sys/fs/cgroup/cpuset/<container id>/cpuset.cpus: no such file or directory. On kernel 5.4 using the exact same code it work fine. I edited libcontainer to do a directory listing and I can verify that the cpuset.cpus "file" does exist as it should, however it is unable to be opened by libcontainer. I am able to run cat on the path without error from outside the container.

Any pointers would be incredibly helpful, thank you very much.

@cyphar
Copy link
Member

cyphar commented Jun 13, 2021

It seems that the issue is that openat2 is available after the update (openat2 was added in Linux 5.6). However this code definitely works on both old and new kernels, so it's a bit puzzling why it's failing for you. Does your code mess around with open files at all? With openat2, we cache a file descriptor to /sys/fs/cgroup -- so if some of your code is dup-ing over that file descriptor with another file descriptor, you're going to end up with the cgroup2 code trying to open /some/random/file/cpuset/container-id/cpuset.cpus which will either fail or cause security issues.

(As an aside @kolyshkin I think that the error message in the openat2 code should be changed to better describe the path being opened -- readlink of the fd would be a good idea. I'll write a patch for that.)

@cyphar
Copy link
Member

cyphar commented Jun 13, 2021

Hi, it seems that you've opened an issue that is in relation to using github.com/opencontainers/runc/libcontainer directly from your own Go code. While we may do our best to help you solve your issue, we do not support this. In addition, our Semantic Versioning version does not include the Go API in its scope -- meaning we are free to break the Go APIs in point releases.

The reason for this policy is that runc has fairly complicated behaviour that is shared between the runc command-line code and the internal libcontainer libraries, as well as a fairly complicated lifecycle that is very easy to not configure correctly. In addition, we have several security hardenings within runc that could be broken inadvertently by code using libcontainer and not being aware of this.

libcontainer being a public-accessible library is a historical accident and in the future we may put it entirely inside an internal package to stop it from being imported entirely. If you can use runc, or the go-runc library (which wraps the runc binary) please do so. If you cannot, please open an issue describing why you cannot use runc and we will see if it's possible to expand runc so that you can use runc directly.

NOTE: This is a saved reply. Sorry if it reads as a cookie-cutter response, it was written to explain the situation around libcontainer. If you are already aware of this, and are fine accepting the risks of using libcontainer directly, you can ignore this message.

@MikeDombo
Copy link
Author

Hi @cyphar thank you very much for both of the replies.

So to test the hypothesis I changed prepareOpenat2 to always return an error so that we'd just use good ol' open and that actually solved the problem!

I don't believe we're messing around with anything, duping or otherwise, aside from running the container root in an overlayfs which is maybe related.

@cyphar
Copy link
Member

cyphar commented Jun 13, 2021

What happens if you apply this patch?

diff --git a/libcontainer/cgroups/fscommon/open.go b/libcontainer/cgroups/fscommon/open.go
index e95876a21769..2ef1e15cfae8 100644
--- a/libcontainer/cgroups/fscommon/open.go
+++ b/libcontainer/cgroups/fscommon/open.go
@@ -1,6 +1,7 @@
 package fscommon

 import (
+       "fmt"
        "os"
        "strings"
        "sync"
@@ -86,7 +87,13 @@ func OpenFile(dir, file string, flags int) (*os.File, error) {
                        Mode:    uint64(mode),
                })
        if err != nil {
-               return nil, &os.PathError{Op: "openat2", Path: dir + "/" + file, Err: err}
+               procpath := fmt.Sprintf("/proc/self/fd/%d", cgroupFd)
+               realdir, _ := os.Readlink(procpath)
+               path := dir + "/" + file
+               if realdir != dir {
+                       path = fmt.Sprintf("[%s=%d!=%s]/%s", procpath, realdir, dir, file)
+               }
+               return nil, &os.PathError{Op: "openat2", Path: path, Err: err}
        }

        return os.NewFile(uintptr(fd), cgroupfsPrefix+relname), nil

@MikeDombo
Copy link
Author

[/proc/self/fd/30=/!=/sys/fs/cgroup/cpuset/9ac5cec4-16ac-4fdf-4f7f-51aa982a9b7f]/cpuset.cpus: no such file or directory

So apparently what should be the cgroup path is actually /

@cyphar
Copy link
Member

cyphar commented Jun 13, 2021

Hmm, so it seems that fd 30 is being swapped with a handle to / or something like that? You can try to add some similar readlink-based debugging code to prepareOpenat2 to see whether the original handle is correct.

@MikeDombo
Copy link
Author

Thank you so much Aleksa, I don't want to take up more of your valuable time with this. Your pointer to look at the openat2 call certainly saved hours of work, so thank you for that! I'll keep looking into it, but worst case we just use the fallback path without openat2 which seems like it should be fine.

I'd suspect the FD is getting messed up either in entering the mount namespace or mounting the overlayfs.

@kolyshkin kolyshkin reopened this Jun 14, 2021
@kolyshkin
Copy link
Contributor

I'd suspect the FD is getting messed up either in entering the mount namespace or mounting the overlayfs.

Apparently we did not have such a use case in mind. runc works with cgroup files before entering any namespaces.

I guess we can check on error if cgroupFd still resolves to /sys/fs/cgroup, and reopen it if not, but that seems overly complicated. I actually wrote this but it looks borderline ugly.

@MikeDombo If the source code is available, I will be happy to help resolve this.

@MikeDombo
Copy link
Author

Thank you for looking Kir. Sadly our usage isn't open source at this time. Hopefully when we do open source it we'll have moved to using runc properly instead of going directly to libcontainer.

@kolyshkin kolyshkin changed the title v1.0.0rc95 ENOENT when launching container looking for cpuset.cpus with kernel 5.8 libct/cg: openat2 optimization not working across container boundaries Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants