Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can't start nginx image with runc and user namespaces (works with crun) #4475

Open
rata opened this issue Oct 25, 2024 · 16 comments · May be fixed by #4478
Open

can't start nginx image with runc and user namespaces (works with crun) #4475

rata opened this issue Oct 25, 2024 · 16 comments · May be fixed by #4478

Comments

@rata
Copy link
Member

rata commented Oct 25, 2024

Description

Trying to start a Kubernetes container with userns using the nginx official image, fails. This was reported here: containerd/containerd#10598 by @ctrox.

@ctrox also found a workaround: add "tty: true" to the kubernetes pod makes it work.

And a simpler repro: just a container with userns that runs "cat /dev/stderr" also fails with permission denied.

I guess you need to run detached (as containerd does) to hit this, otherwise it uses your shell and that probably works.

@ctrox thanks for the great bug report!

Sorry the brevity, I'm sick ATM. I'll add more info when I recover

Steps to reproduce the issue

No response

Describe the results you received and expected

Works, as without user namespaces.

What version of runc are you using?

runc 1.2.0

Host OS information

No response

Host kernel information

No response

@dims
Copy link
Contributor

dims commented Oct 25, 2024

xref: kubernetes/kubernetes#128276

@kolyshkin
Copy link
Contributor

@ctrox could you please try v1.2.0 + PR #4477? If you need a static binary, you can go to that PR, click on Checks tab, click on 'validate' in the left column, scroll all the way down and you'll see a tarball with binaries under "Artifacts".

@lifubang
Copy link
Member

Could you confirm that it can works with crun?
I have a test case to reproduce it, but it also can't works with crun.

root@iZj6cgggwb62cxurec74geZ:/opt/bb# crun/crun run -d test
cat: can't open '/dev/stderr': Permission denied
root@iZj6cgggwb62cxurec74geZ:/opt/bb# crun/crun delete test
root@iZj6cgggwb62cxurec74geZ:/opt/bb# /root/go/src/github.com/opencontainers/runc/runc run -d test
cat: can't open '/dev/stderr': Permission denied
root@iZj6cgggwb62cxurec74geZ:/opt/bb# /root/go/src/github.com/opencontainers/runc/runc delete test
root@iZj6cgggwb62cxurec74geZ:/opt/bb# 

Maybe the core reason is that the permission of the /proc/self/fd/2 is 300? It should be 700 in the host. But I can't confirm.
The content of ls /proc/self/fd in the container looks like:

   dr-x------    2 root     root           4 Oct 26 01:10 .
   dr-xr-xr-x    9 root     root           0 Oct 26 01:10 ..
   lr-x------    1 root     root          64 Oct 26 01:10 0 -> pipe:[2264321]
   l-wx------    1 root     root          64 Oct 26 01:10 1 -> pipe:[2264322]
   l-wx------    1 root     root          64 Oct 26 01:10 2 -> pipe:[2264323]
   lr-x------    1 root     root          64 Oct 26 01:10 3
   ls: /proc/self/fd/3: cannot read link: No such file or directory

But in the host:

dr-x------ 2 root root  4 Oct 26 09:28 .
dr-xr-xr-x 9 root root  0 Oct 26 09:28 ..
lrwx------ 1 root root 64 Oct 26 09:28 0 -> /dev/pts/0
lrwx------ 1 root root 64 Oct 26 09:28 1 -> /dev/pts/0
lrwx------ 1 root root 64 Oct 26 09:28 2 -> /dev/pts/0
lr-x------ 1 root root 64 Oct 26 09:28 3 -> /proc/629444/fd

The content of config.json:

{ "ociVersion": "1.0.2-dev", "process": { "terminal": false, "user": { "uid": 0, "gid": 0 }, "args": [ "cat", "/dev/stderr" ], "env": [ "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "TERM=xterm" ], "cwd": "/", "capabilities": { "bounding": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE" ], "effective": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE" ], "permitted": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE" ], "ambient": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE" ] }, "rlimits": [ { "type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024 } ], "noNewPrivileges": true }, "root": { "path": "rootfs", "readonly": true }, "hostname": "runc", "mounts": [ { "destination": "/proc", "type": "proc", "source": "proc" }, { "destination": "/dev", "type": "tmpfs", "source": "tmpfs", "options": [ "nosuid", "strictatime", "mode=755", "size=65536k" ] }, { "destination": "/dev/pts", "type": "devpts", "source": "devpts", "options": [ "nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5" ] }, { "destination": "/dev/shm", "type": "tmpfs", "source": "shm", "options": [ "nosuid", "noexec", "nodev", "mode=1777", "size=65536k" ] }, { "destination": "/dev/mqueue", "type": "mqueue", "source": "mqueue", "options": [ "nosuid", "noexec", "nodev" ] }, { "destination": "/sys", "type": "sysfs", "source": "sysfs", "options": [ "nosuid", "noexec", "nodev", "ro" ] }, { "destination": "/sys/fs/cgroup", "type": "cgroup", "source": "cgroup", "options": [ "nosuid", "noexec", "nodev", "relatime", "ro" ] } ], "linux": { "resources": { "devices": [ { "allow": false, "access": "rwm" } ] }, "namespaces": [ { "type": "pid" }, { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" }, { "type": "cgroup" }, { "type": "user" } ], "uidMappings": [{"hostID": 100000, "containerID": 0, "size": 65534}], "gidMappings": [{"hostID": 100000, "containerID": 0, "size": 65534}], "maskedPaths": [ "/proc/acpi", "/proc/asound", "/proc/kcore", "/proc/keys", "/proc/latency_stats", "/proc/timer_list", "/proc/timer_stats", "/proc/sched_debug", "/sys/firmware", "/proc/scsi" ], "readonlyPaths": [ "/proc/bus", "/proc/fs", "/proc/irq", "/proc/sys", "/proc/sysrq-trigger" ] } }

lifubang added a commit to lifubang/runc that referenced this issue Oct 26, 2024
We should let stdio could be accessed in user ns container.
Please see opencontainers#4475
Because the default permission of stdio is 0o700, other user can't access
them. If we don't change the permission to 0o666, We'll get an error msg if
we access stdio in a userns contaienr: ***: /dev/std***: Permission denied.

Signed-off-by: lifubang <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 26, 2024
We should let stdio could be accessed in userns container.
Please see opencontainers#4475
Because the default permission of stdio is 0o700, other user can't access
them. If we don't change the permission to 0o666, We'll get an error msg if
we access stdio in a userns contaienr: ***: /dev/std***: Permission denied.

Signed-off-by: lifubang <[email protected]>
@lifubang lifubang linked a pull request Oct 26, 2024 that will close this issue
lifubang added a commit to lifubang/runc that referenced this issue Oct 26, 2024
We should let stdio could be accessed in userns container.
Please see opencontainers#4475
Because the default permission of stdio is 0o700, other user can't access
them. If we don't change the permission to 0o666, We'll get an error msg if
we access stdio in a userns contaienr: ***: /dev/std***: Permission denied.

Signed-off-by: lifubang <[email protected]>
@lifubang
Copy link
Member

lifubang commented Oct 26, 2024

Maybe the core reason is that the permission of the /proc/self/fd/2 is 300? It should be 700 in the host. But I can't confirm.

This is not the real reason, the core reason is that the permission of /proc/self/fd/0-2 in the host is 0o700, they can't be accessed by other user, we should change it to 0o666.

@lifubang
Copy link
Member

@ctrox could you please try v1.2.0 + PR #4477?

It still have the issue.
@ctrox @rata Could you help to see if #4478 has fixed your issue or not?

lifubang added a commit to lifubang/runc that referenced this issue Oct 26, 2024
We should let stdio could be accessed in userns container.
Please see opencontainers#4475
Because the default permission of stdio is 0o700, other user can't access
them. If we don't change the permission to 0o666, We'll get an error msg if
we access stdio in a userns contaienr: ***: /dev/std***: Permission denied.

Signed-off-by: lifubang <[email protected]>
@kolyshkin
Copy link
Contributor

I think we need to go back to square one with this.

What we know now is

  • there's a scenario where kubernetes v1.30 and containerd are used, and a process inside a container can't write to a file which is a symlink to /dev/stderr when runc-1.2.0-rc.3 (or rc.2 -- it's not entirely clear) is used;
  • this happens if terminal is disabled, userns is enabled, and a non-root user inside a container;
  • the same setup apparently works with crun;
  • the reporter (@ctrox) was unable to reproduce the issue without kubernetes (only using containerd and crictl);
  • neither containerd nor kubernetes have a test case for this issue.

What we don't know:

  • were anybody else able to reproduce this;
  • how to reproduce this without kubernetes (and, ideally, without containerd);
  • whether some older version of runc works.

I think we need to have some sort of a reproducer first.

@cyphar
Copy link
Member

cyphar commented Oct 28, 2024

It should also be noted that file passthrough for stdio is something folks should avoid using if possible. It's only really useful if you are using runc directly from shell scripts and don't want to use runc as a stdio forwarder (maybe you want to give a container a proper terminal or a network socket as stdio).

Within the context of containerd and Kubernetes it makes very little sense to use this mode. Using consoles improves container isolation and all of these higher-level tools have built-in support for handling consoles.

@fuweid
Copy link
Member

fuweid commented Oct 28, 2024

It seems that if TTY is disable, containerd-shim will create pipe and pass through writable fd to runc. And then init process inherits the writable fd as stderr or stdout. For the init process, it can write data to it. However, the inode is belong to the root user. When init process is trying to open it, the init process doesn't have permission about the pipe.

 → do_open
            → inode_permission
                → generic_permission
                    ↔ make_vfsuid      [0]                     0.500us
                    ↔ make_vfsuid      [0]                     6.501us
                    ↔ from_kuid        [0xffffffff]            0.700us
                ← generic_permission   [-EACCES]              13.501us

However, the tty fd is generated by runc init within that user namespace. So nginx container is able to open the file.
Maybe containerd-shim should create pipe in that user namespace.

cc @rata

@lifubang
Copy link
Member

Maybe containerd-shim should create pipe in that user namespace.

Yes, I think so. If the target link of stderr/stdout in runc is owned by the user namespace, it will be no this issue.

@kolyshkin
Copy link
Contributor

One other question (I forgot) is why this works with crun. Afaik it uses the same shim as for runc.

@lifubang
Copy link
Member

One other question (I forgot) is why this works with crun. Afaik it uses the same shim as for runc.

I see the crun implementation, it uses Fchown to change the owner.

@lifubang
Copy link
Member

Do you think we should change chmod to chown in #4478? Because if this will be fixed in containerd, runc also should fix this issue for runc direct user?

@kolyshkin
Copy link
Contributor

One other question (I forgot) is why this works with crun. Afaik it uses the same shim as for runc.

I see the crun implementation, it uses Fchown to change the owner.

This was done in containers/crun#755 to fix containers/crun#1019.

I tried the repro from there with current runc head, it works:

podman run --rm --userns=auto:size=65534 --runtime=`pwd`/runc alpine sh -c 'echo hello >> /dev/stdout'
hello
[kir@kir-tp1 runc]$ ./runc --version
runc version 1.2.0+dev
commit: v1.2.0-22-g4ad9f7fd
spec: 1.2.0
go: go1.23.2
libseccomp: 2.5.5

So it's not it I guess.

@lifubang
Copy link
Member

I tried the repro from there with current runc head, it works:

I guess you run the crun first, and then test runc in the same terminal. Because the stdio's owner has been changed by crun in this terminal.

lifubang added a commit to lifubang/runc that referenced this issue Oct 28, 2024
If the root in the container is different from current root user, we
need to change the owner of stdio before we enter the user namespace,
or else we may can't access stdio in the container.
Please see opencontainers#4475

Signed-off-by: lifubang <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 28, 2024
If the root in the container is different from current root user, we
need to change the owner of stdio before we enter the user namespace,
or else we may can't access stdio in the container.
Please see opencontainers#4475

Signed-off-by: lifubang <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 28, 2024
If the root in the container is different from current root user, we
need to change the owner of stdio before we enter the user namespace,
or else we may can't access stdio in the container.
Please see opencontainers#4475

Signed-off-by: lifubang <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 28, 2024
If the root in the container is different from current root user, we
need to change the owner of stdio before we enter the user namespace,
or else we may can't access stdio in the container.
Please see opencontainers#4475

Signed-off-by: lifubang <[email protected]>
@lifubang lifubang removed this from the 1.2.1 milestone Oct 28, 2024
lifubang added a commit to lifubang/runc that referenced this issue Oct 29, 2024
If the user in the container is different from current user, we need
to change the owner of stdio before we enter the user namespace, or
else we may can't access stdio in the container.
Please see opencontainers#4475

Signed-off-by: lifubang <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 29, 2024
If the user in the container is different from current user, we need
to change the owner of stdio before we enter the user namespace, or
else we may can't access stdio in the container.
Please see opencontainers#4475

Signed-off-by: lifubang <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 29, 2024
If the user in the container is different from current user, we need
to change the owner of stdio before we enter the user namespace, or
else we may can't access stdio in the container.
Please see opencontainers#4475

Signed-off-by: lifubang <[email protected]>
@rata
Copy link
Member Author

rata commented Nov 1, 2024

@fuweid @kolyshkin @lifubang Thank you very much for investigating this! Sorry, I was sick and got hit harder than I expected.

I want to use this message to give you the debugging ninja #fixallthethings award! :-D

@fuweid is that output you pasted also retsnoop?

@fuweid
Copy link
Member

fuweid commented Nov 2, 2024

@fuweid is that output you pasted also retsnoop?

Yes. Based on the result, I realized that the stdio pipe is generated by shim and its owner is root.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants
@dims @rata @lifubang @cyphar @kolyshkin @fuweid and others