add close_range syscall to default seccomp profile #1262

Closed
thelamer opened this issue Jun 24, 2021 · 9 comments

@thelamer

I ran into this while running the latest vte3, which is used by a range of graphical terminals other than xterm and the ultralight ones.
Some references:
containers/podman#10337
mviereck/x11docker#346 (comment)
I don't know if this has been considered in the past or if there are any security implications of allowing this by default.
Let me know if you need any more info.

@AkihiroSuda

Already added in Docker 20.10.4 moby/moby#41971

@thelamer
Author

> Already added in Docker 20.10.4 moby/moby#41971

It does not seem to be whitelisted, though. The syscall is blocked unless the --security-opt seccomp=unconfined option is passed to the container.

@thelamer
Author

@AkihiroSuda do you need a way to test this locally?
I pushed an image with the latest vte3 that makes the syscall (just connect to port 3000 and try to open a terminal):
Working:
docker run --rm -it -p 3000:3000 --security-opt seccomp=unconfined taisun/randomimages:close_range_test bash
Not working:
docker run --rm -it -p 3000:3000 taisun/randomimages:close_range_test bash

@thaJeztah
Member

@thelamer what distro and kernel are you running on? (can you provide output of docker info and docker version?)

@thelamer
Author

> @thelamer what distro and kernel are you running on? (can you provide output of docker info and docker version?)

Focal amd64

docker-ce/focal,now 5:20.10.7~3-0~ubuntu-focal
libseccomp2/focal-updates,now 2.5.1-1ubuntu1~20.04.1

For Docker Info:
https://pastebin.com/zn0iZpV1

I guess the real question is whether you can replicate this using that test image. I have tested on multiple other setups, including some of our CI infrastructure, all with the same result. Also, please see the information in mviereck/x11docker#346; it has more than I have posted here, and the problem seems to affect all new distros with a current vte3 version.
I have not done any intense debugging; I have only taken the information in that thread and the resulting podman response at face value.
I could be wrong about this being the close_range syscall, as I do not have the knowledge or setup to run syscall tests.

@thelamer
Author

Looks like this has been fixed upstream by vte3, closing.

@thelamer
Author

False alarm: I re-tested improperly, with --privileged. I am kind of at a loss as to how to proceed.
Does anyone on the team have a standard way of testing syscalls? For example, are you compiling custom binaries to ensure functionality?
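
One way to test a single syscall without compiling a custom binary is to invoke it through libc's syscall() wrapper from a script inside the container. The following is a minimal sketch, not an established test procedure. It assumes Python 3 is available in the image and that close_range has syscall number 436 (true on x86_64 and aarch64 as far as I know; verify for other ABIs). The helper names classify and probe_close_range are made up for this illustration.

```python
import ctypes
import errno
import os

# Assumption: close_range is syscall 436 on x86_64 and aarch64.
SYS_CLOSE_RANGE = 436


def classify(ret, err):
    """Turn a raw syscall return value and errno into a short verdict."""
    if ret == 0:
        return "allowed"
    if err == errno.EPERM:
        return "denied (EPERM)"
    if err == errno.ENOSYS:
        return "unavailable (ENOSYS)"
    return "failed (" + errno.errorcode.get(err, str(err)) + ")"


def probe_close_range():
    """Call close_range(fd, fd, 0) on a throwaway descriptor and classify the result."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    fd = os.open(os.devnull, os.O_RDONLY)
    ret = libc.syscall(
        ctypes.c_long(SYS_CLOSE_RANGE),
        ctypes.c_uint(fd),  # first descriptor in the range
        ctypes.c_uint(fd),  # last descriptor in the range (same one)
        ctypes.c_uint(0),   # no flags
    )
    err = ctypes.get_errno()
    if ret != 0:
        os.close(fd)  # the syscall did not close it, so clean up here
    return classify(ret, err)
```

Running print(probe_close_range()) once with the default profile and once with --security-opt seccomp=unconfined should show whether the filter is what changes the outcome. A "denied" or "unavailable" verdict only says the call failed that way; it does not by itself prove that seccomp was the cause.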

thelamer reopened this Sep 18, 2021
@thelamer
Author

thelamer commented Sep 20, 2021

@thaJeztah
These are all the syscalls I got from strace for xfce4-terminal: https://pastebin.com/hfq8c2Ti
There is no close_range as far as I can see. Comparing against your default list (https://raw.githubusercontent.com/moby/moby/b05d0604ea89b31fde4ef23111b453a376aa5279/profiles/seccomp/default.json), the only one with the potential to be blocked is clone, but my understanding seems to be off there, as adding the SYS_ADMIN capability does not alleviate the issue; only unconfined seccomp does.
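
The comparison described here, a list of syscalls observed with strace checked against the profile, can be scripted. This is a rough sketch that assumes the profile JSON has a top-level "syscalls" list whose entries carry "names", "action", and optionally "args", "includes", and "excludes", which is how the linked default.json appears to be laid out. The function names are invented for this example.

```python
def unconditionally_allowed(profile):
    """Collect syscall names allowed with no argument filter and no cap/arch condition."""
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        # Rules with args, includes, or excludes only apply conditionally.
        if rule.get("args") or rule.get("includes") or rule.get("excludes"):
            continue
        allowed.update(rule.get("names", []))
    return allowed


def needs_review(profile, observed):
    """Return observed syscalls that the profile does not allow unconditionally."""
    return sorted(set(observed) - unconditionally_allowed(profile))
```

With the profile loaded via json.load and the names taken from the strace output, needs_review returns both syscalls missing from the profile and those allowed only under a condition (such as clone, which depends on a capability), so each result still has to be checked by hand.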

I also started to delve into alternative container tech and found that, with podman's default seccomp config, this just launches and works on the identical system and software setup, i.e.:

podman run --rm -it -p 3000:3000 linuxserver/webtop:fedora-xfce bash

I am sure there are many underlying differences here, but what I can't figure out is which syscall made by vte3 works fine in podman with a stock config and not in Docker unless --security-opt seccomp=unconfined is passed.

So I dug deeper and started to pull everything from their default seccomp profile (the format looks identical):
https://github.com/containers/podman/blob/main/vendor/github.com/containers/common/pkg/seccomp/seccomp.json

The problem I ran into is that many of the syscalls defined in their config produce errors like:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall bdflush: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall io_pgetevents: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall open_by_handle_at: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall open_by_handle_at: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall bpf: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall bpf: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall delete_module: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall delete_module: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall acct: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall kcmp: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall iopl: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall iopl: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall settimeofday: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall settimeofday: permission denied: unknown.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall vhangup: permission denied: unknown.

After pulling out all the syscalls unknown to Docker, I spun up the container and hit the same roadblock.

So I am beginning to think there is a series of system calls in bleeding-edge software that Docker might not be aware of or able to whitelist, leaving no way to gain functionality short of turning seccomp off completely.

In summary, I suspect no version of a seccomp profile will ever work until the new syscalls are supported by Docker, but I could be, and often am, wrong.
I would really appreciate your feedback here, even if it is just pointing me in the right direction.

@ottobolyos

ottobolyos commented Nov 6, 2024

I have a similar issue with realm/adcli (used for joining Active Directory realms) on ubuntu:24.04. Seemingly, I cannot get close_range and setsockopt enabled, as for these syscalls I get EPERM (Operation not permitted). However, as I understand it, both of these syscalls are enabled by default using the SCMP_ACT_ALLOW action, therefore, I don’t understand why it would not work.

These commands also need to call the socket syscall with a value other than 40 as the first argument, so I downloaded the default seccomp profile and removed the args array for socket, effectively allowing socket without any restrictions. I made no other changes to the default seccomp profile.
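
The profile edit described above can be sketched as a small transformation on the loaded JSON. This is an illustration only, assuming the profile keeps its rules in a "syscalls" list with "names", "action", and "args" fields. The function name allow_unfiltered is hypothetical.

```python
import copy


def allow_unfiltered(profile, name):
    """Return a copy of `profile` where `name` is allowed without any argument filter."""
    result = copy.deepcopy(profile)
    kept = []
    changed = False
    for rule in result.get("syscalls", []):
        names = rule.get("names", [])
        if name in names and rule.get("args"):
            # Detach `name` from the filtered rule, leaving other names untouched.
            names.remove(name)
            changed = True
            if not names:
                continue  # the rule covered only `name`, so drop it
        kept.append(rule)
    if changed:
        kept.append({"names": [name], "action": "SCMP_ACT_ALLOW"})
    result["syscalls"] = kept
    return result
```

The result can be written back with json.dump and passed to the container with --security-opt seccomp=<path to the file>. Splitting the name out of a shared rule, rather than clearing that rule's args, avoids loosening the filter for other syscalls listed alongside it.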

Besides using the modified seccomp profile, I also added the following capabilities:

  • SYS_ADMIN, which is required for the clone and clone3 syscalls;
  • SYS_CHROOT, which is required for the chroot syscall;
  • DAC_READ_SEARCH, which is required for the open_by_handle_at syscall.

However, I could not get it working unless I used --security-opt seccomp=unconfined or --privileged which I don’t want to use at any cost.

Specifically, I cannot allow the following syscalls:

1017  setsockopt(5, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
1029  setsockopt(5, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
1039  close_range(3, 4294967295, CLOSE_RANGE_CLOEXEC) = -1 EPERM (Operation not permitted)
1044  setsockopt(5, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
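
For longer traces, the failing calls can be tallied automatically. The sketch below assumes strace lines in the shape shown above (an optional PID column, the call, then "= -1 ERRNAME"). The regular expression and the function name failed_calls are choices made for this example, not part of strace.

```python
import re
from collections import Counter

# Optional PID column, syscall name, arguments, then "= -1 ERRNAME".
LINE = re.compile(r"^\s*(?:\d+\s+)?(\w+)\(.*\)\s*=\s*-1\s+(E[A-Z0-9]+)")


def failed_calls(lines, err="EPERM"):
    """Count syscalls per name that returned -1 with the given errno name."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group(2) == err:
            counts[match.group(1)] += 1
    return dict(counts)
```

Applied to the four lines above, this tallies setsockopt three times and close_range once. An EPERM in the trace does not by itself identify seccomp as the source: socket(7), for instance, documents SO_SNDBUFFORCE as requiring CAP_NET_ADMIN, so the setsockopt failures may have a different cause than the close_range one.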

In the end, the issue turned out to be the version of libseccomp installed on the host where Docker runs. In my case, that was Ubuntu 20.04.6 LTS with Docker 27.3.1 (build ce12230).

On Ubuntu 20.04.6 LTS (focal), the latest version of the libseccomp2 package is v2.5.1, which does not support close_range; libseccomp did not support it before v2.5.2 (see this issue, this commit, and the comparison between v2.5.2 and v2.5.1). The latest libseccomp version is v2.5.5.
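
A quick host-side check can compare the installed package version against that threshold. The sketch below is an illustration under assumptions: it takes a Debian-style version string (such as the libseccomp2 one quoted earlier in this thread) and uses v2.5.2 as the minimum, as stated in this comment. The names MIN_CLOSE_RANGE, upstream_version, and knows_close_range are invented for the example.

```python
import re

# Threshold taken from the comment above, not independently verified here.
MIN_CLOSE_RANGE = (2, 5, 2)


def upstream_version(pkg_version):
    """Extract (major, minor, micro) from a Debian-style version string."""
    match = re.match(r"(?:\d+:)?(\d+)\.(\d+)\.(\d+)", pkg_version)
    if match is None:
        raise ValueError("unrecognised version: " + pkg_version)
    return tuple(int(part) for part in match.groups())


def knows_close_range(pkg_version):
    """True if the upstream libseccomp version is at least MIN_CLOSE_RANGE."""
    return upstream_version(pkg_version) >= MIN_CLOSE_RANGE
```

For "2.5.1-1ubuntu1~20.04.1" this returns False. Distribution packages sometimes backport syscall tables, so a False result is a hint to investigate rather than proof.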

This issue seems to be somewhat known, as @yosifkit wrote a nice write-up in docker-library/official-images#16829.

In my case, I decided to upgrade to Ubuntu 24.04.1 LTS; however, IMHO it would be better for Docker to provide the latest (tested/supported) libseccomp version in their APT repository (https://download.docker.com/linux/ubuntu), as well as in other package repositories (like DNF/YUM).


Related (this is not an exhaustive list):
