Permission denied when container process executes close_range syscall
#10337
Comments
Doesn't look like Seccomp. Our default profile lives at https://github.com/containers/common/blob/master/pkg/seccomp/seccomp.json#L80 and you can see that |
@mheon Do you have any other explanation for this behavior? The reason I brought up I've used strace in the real container: once with the flag and once without. With the flag, the strace log shows that the
Without the flag, we get the following:
(The numbers beside each syscall are the process IDs) |
Can you verify what profile is in use in the container you're running in? The default Podman profile does allow the syscall, so I have to assume your system may not be using the default |
The default profile should live at |
How do I do this, please? I did:
podman create <image_hash>
podman inspect <container_name>
The output:
|
I've also checked the installed profile (both seccomp.json
|
Your Seccomp profile does include |
You should see the denied seccomp call in /var/log/audit/audit.log:
ausearch -m seccomp -i |
I do:
Like I said, this only happens inside the container. On my host machine, the problem never occurs |
Something is going wrong then, some kind of mismatch between what the OCI Runtime understands is close_range and what the kernel does. I just wrote a quick patch to podman info to show what seccomp.json file the tool is using. |
Indeed I do:
➜ grep -C4 'close_range' /usr/share/containers/seccomp.json
"clock_nanosleep",
"clock_nanosleep_time64",
"clone",
"close",
"close_range",
"connect",
"copy_file_range",
"creat",
"dup", |
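A grep hit like the one above only shows that the name appears somewhere in the file. A minimal Python sketch for checking which action a profile assigns to a syscall follows. It assumes the usual layout of these profiles (a top-level `defaultAction` plus a `syscalls` list whose entries carry `names` and `action`), and the helper name `syscall_action` is hypothetical. It ignores per-rule conditions such as capability or argument filters, so treat the result as a first check rather than a definitive answer.

```python
import json


def load_profile(path):
    """Read a seccomp profile (JSON) from disk."""
    with open(path) as fh:
        return json.load(fh)


def syscall_action(profile, name):
    """Return the action of the first rule that names `name`.

    Falls back to the profile's defaultAction when no rule mentions it.
    Simplification: conditional rules (caps, args, arches) are not evaluated.
    """
    for rule in profile.get("syscalls", []):
        if name in rule.get("names", []):
            return rule.get("action", "")
    return profile.get("defaultAction", "")
```

For example, `syscall_action(load_profile("/usr/share/containers/seccomp.json"), "close_range")` would report the action for the profile path used in the grep command above.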
Are you using runc or crun? @giuseppe ideas? |
I am using The same issue with |
Also when I switch to
|
close_range is used by crun. This is again the same issue with I think it is time we switch to use CC @kolyshkin |
Seeing same issue on F33, starting container with
|
@smac89 love your bug report, so easy to reproduce! |
we also need an updated libseccomp that knows about |
@giuseppe Did you open a PR with libseccomp to add this? |
I think at this point it is easier to fix it for good in our default seccomp profile now that runc rc95 is out and with the feature we need. Also libseccomp uses some scripts to read all the syscalls from the kernel sources, so it is not necessary to update it manually |
OK, what are our next steps then? Do we need a new PR to Podman? Containers-common? |
PR opened here: containers/common#573 |
A friendly reminder that this issue had no activity for 30 days. |
this is fixed in c/common |
I'm still facing this same issue with containers/common-0.40.1:
$ pacman -Ss containers-common
community/containers-common 0.40.1-2 [installed]
Configuration files and manpages for containers
$ podman run --rm -it a83749b0c3fdecb23737bcbc591262cbd8fc91f517b5d61106273d1965658320 /app/walk.c
/app/walk.c opened as FD 3
/proc/self/fd/0 ==> /dev/pts/0
/proc/self/fd/1 ==> /dev/pts/0
/proc/self/fd/2 ==> /dev/pts/0
/proc/self/fd/3 ==> /app/walk.c
/proc/self/fd/4 ==> /proc/1/fd
========= About to call close_range() =======
close_range: Operation not permitted |
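To tell a missing syscall apart from a filtered one, a small probe can call close_range directly and report the errno. The following is a rough Python sketch, not the walk.c program from this issue. It assumes syscall number 436 for close_range (the value on x86_64 and on architectures that share the unified syscall table), and the function name `probe_close_range` is hypothetical. Broadly, ENOSYS suggests the kernel does not implement the call (it was added in Linux 5.9), while EPERM suggests something such as a seccomp filter refused it.

```python
import ctypes
import errno
import os
import sys

# Assumption: close_range is syscall 436 on this architecture.
SYS_CLOSE_RANGE = 436


def probe_close_range():
    """Try close_range() on one scratch fd and return 'ok' or the errno name."""
    if not sys.platform.startswith("linux"):
        return "unsupported-platform"

    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long

    # Open a throwaway descriptor so the probe never closes anything important.
    fd = os.open(os.devnull, os.O_RDONLY)
    ctypes.set_errno(0)
    ret = libc.syscall(
        ctypes.c_long(SYS_CLOSE_RANGE),
        ctypes.c_uint(fd),  # first fd in the range
        ctypes.c_uint(fd),  # last fd in the range
        ctypes.c_uint(0),   # flags
    )
    if ret == 0:
        return "ok"  # the kernel closed fd for us

    err = ctypes.get_errno()
    os.close(fd)  # the call failed, so clean up manually
    return errno.errorcode.get(err, str(err))


if __name__ == "__main__":
    print(probe_close_range())
```

Running it once on the host and once inside the container, with and without `--security-opt seccomp=unconfined`, should make the difference visible.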
EDIT: the problem is actually still here. |
I just triple checked, and I'm now in a very weird situation:
I'm now hitting another error ( |
Support for |
|
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I have an application which uses the close_range syscall running inside a container. When I run the container and the application makes that syscall, I get an error saying "Permission denied". At first I thought this was a problem with the application, but after some investigation I am starting to think this may be a podman issue related to how it handles seccomp profiles.
Steps to reproduce the issue:
walk.c
Copy the above script to /tmp on your host machine
Using buildah:
7bd46f9814bb with the id of the built image)
Describe the results you received:
The result will look something like:
Describe the results you expected:
Now repeat this same process on your host Linux machine (assuming you are running at least kernel version 5.9)
The program should run successfully with an output similar to:
This is what I expected inside the container
Additional information you deem important (e.g. issue happens only occasionally):
If you run the image with the option --security-opt seccomp=unconfined, everything works fine.
Does that mean podman is simply blocking the close_range syscall? Where does podman's default seccomp.json file live? I was under the impression that they use the default one from docker, which whitelists the close_range syscall.
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):