seccomp filter should return ENOSYS for unknown syscalls #2151

Closed
jethrogb opened this issue Oct 21, 2019 · 66 comments · Fixed by #2750

@jethrogb

jethrogb commented Oct 21, 2019

Currently, the seccomp filter installed on Linux returns EPERM even for unknown system calls. This is problematic when Linux adds new system calls. A program wishing to use a new system call will try to call it, and will fall back to another mechanism when ENOSYS is returned (indicating the kernel doesn't support the call). Inside a container, however, it will likely receive EPERM instead, and fail rather than take the fallback path.
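
The fallback pattern described above can be sketched roughly as follows. This is a simulation in Python, not how glibc or musl implement it: `new_call`, `old_call`, `blocked_with` and `outcome` are hypothetical stand-ins for a new syscall wrapper, its older emulation, and a filter that rejects the new call with a given errno.

```python
import errno


def call_with_fallback(new_call, old_call):
    """Try the newer call; use the older one only if ENOSYS is reported."""
    try:
        return new_call()
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # The kernel (or a filter emulating an old kernel) lacks the call.
            return old_call()
        # EPERM and other errors are meaningful results, so they propagate.
        raise


def blocked_with(code):
    """Hypothetical stand-in for a syscall that a filter rejects with `code`."""
    def call():
        raise OSError(code, "simulated seccomp rejection")
    return call


def outcome(code):
    """Report what the caller sees when the new call is rejected with `code`."""
    try:
        return call_with_fallback(blocked_with(code), lambda: "fallback result")
    except OSError as exc:
        return errno.errorcode[exc.errno]
```

With this sketch, a rejection with ENOSYS reaches the fallback, while a rejection with EPERM surfaces to the caller as a failure, which is the behaviour the issue describes.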

In addition to the list of acceptable syscalls, the container definition should include a maximum known syscall number. The seccomp filter should be configured such that calls above the maximum return ENOSYS. When new syscalls are added, the maximum can be increased after the seccomp policy is updated.

@eero-t

eero-t commented Mar 24, 2020

The inherent fragility of seccomp(): https://lwn.net/Articles/738694/

The discussion on the article is very instructive. Basically whatever you do with seccomp, there are potential future landmines. And those article comments didn't even go into kernel syscall & libc differences between HW architectures.

@richfelker

EPERM just should not be used here. For a very large number of syscalls, EPERM has a well-defined meaning the application/library code making the call knows how to react to, and "blocked by container policy" is usually not in the scope of that meaning. For specific syscalls where the container policy does match the specified EPERM failure case, EPERM could reasonably be kept, but the default for all blocked syscalls should be ENOSYS.

@richfelker

Note that this is breaking running of musl 1.2.0+ binaries - see docker/for-win#8326 for another example - due to inability to perform the correct meaningful fallback. We can't just treat EPERM as cause for fallback too, since EPERM is also a meaningful error (and fallback to an old time32 syscall when seeing it would have TOCTOU bugs if the cause of EPERM went away between the two calls, making a race window where the Y2038-unsafe syscall gets used).

@cyphar
Member

cyphar commented Sep 9, 2020

On the other hand, if we return ENOSYS for something like clone(CLONE_NEWUSER) then glibc may get very confused ("clone isn't supported? oh i guess we're on Linux 1.2..."). IMHO we should split the current policy into "pretend the syscall doesn't exist" and "tell the program it's not allowed to do dodgy things". I agree that EPERM for unknown syscalls isn't ideal.

@richfelker

I don't think glibc actually does that, but if you're blocking specific functionality of a syscall rather than the whole syscall, the choice of error code should be made on a per-syscall basis to match existing error semantics. For example CLONE_NEWUSER should do whatever the kernel does when the kernel is configured without user namespace support or with it restricted to use by root. But as-yet-unknown syscalls should all error with ENOSYS.

@fweimer

fweimer commented Nov 16, 2020

@cyphar For unrecognized flag arguments, most system calls use an EINVAL error. EPERM is not correct there, either. (I think clone is an outlier and used to ignore invalid flags in the kernel, but that's been fixed on the kernel side, unlike open, unless you count openat2.)

@fweimer

fweimer commented Nov 16, 2020

Note that this issue is now somewhat urgent because Linux has added an faccessat2 system call, and many applications use it today, by calling glibc's faccessat function. glibc 2.32 and later implement faccessat using faccessat2, and perform fallback to the old emulation code only on ENOSYS errors.

@richfelker

richfelker commented Nov 16, 2020

@fweimer Thanks for adding that. I really don't want this to remain a game of whack-a-mole where exceptions get added for each syscall once breakage is found. Upstream should do the right thing and stop producing EPERM entirely so fallbacks can work as intended. Having support from the glibc side in getting this handled right is much appreciated!

@cyphar
Member

cyphar commented Nov 16, 2020

I want to point out that if you want to change the seccomp filter in Docker, you'll need to make an issue in the Docker repo. We don't control the seccomp filter that Docker uses. Until recently the runtime-spec didn't support custom return values, but that has changed so Docker will need to update their default seccomp profile.

As I said, returning ENOSYS for deliberately blocked syscalls is less than ideal, so I'd suggest instead adding a new seccomp rule which encodes the highest-known syscall number and returns ENOSYS if the requested syscall has a higher number. This isn't fool-proof (several architectures have gaps in their syscall tables for historical reasons), but with the new unified syscall number work it's incredibly unlikely to cause significant issues in the future.

@jethrogb
Author

jethrogb commented Nov 16, 2020

@cyphar First, runc needs to add the ability to specify two different "defaults": one for known but not specifically specified syscalls, and one for unknown syscalls. Currently, non-specified calls all return the same defaultAction.

@richfelker

OK, can someone familiar with Docker and how that works open an issue on their side and link to this one?

@cyphar
Member

cyphar commented Nov 16, 2020

@jethrogb I don't think runc is in a position to do that -- what set of syscalls are "known" is a property of the profile being written, not the container runtime. If you write a profile today and 50 syscalls get added next week, runc (or rather libseccomp) will know about those syscalls but the old profile will not.

Since Docker is the thing generating these profiles (and accepting user-specified profiles too), you would want the user-specified profile to say "this is the latest syscall at the time when I wrote this profile". The Docker-generated profile will then have a default action of SECCOMP_RET_ERRNO with errno EPERM (or whatever it is now), plus a rule which checks whether the syscall number is larger than $NEWEST_SYSCALL and returns ENOSYS in that case.

@jethrogb
Author

> I don't think runc is in a position to do that

Not today no, as you explain the profile description needs to be expanded to convey that information.

@thaJeztah
Member

Currently, runc also needs to know about the syscalls if they're included in the profile (opencontainers/runtime-spec#1071), so if a profile specifies syscalls that runc doesn't know about, the container fails to start (see moby/moby#41562).

codonell added a commit to codonell/runtime-spec that referenced this issue Nov 17, 2020
On Linux the major C libraries expect that syscalls that are
blocked from running in the container runtime return ENOSYS
to allow fallbacks to be used. Returning EPERM by default is
not useful particularly for syscalls that would return EPERM
for actual access restrictions e.g. the new faccessat2.

The runtime-spec should set the standard and recommend ENOSYS
be returned just like a kernel would that doesn't support that
syscall. This allows C runtimes to fall back on other possible
implementations given the userspace policies.

Please see the upstream discussions:
https://lwn.net/Articles/738694/
- Discusses fragility of syscall filtering.
opencontainers/runc#2151
- glibc and musl request ENOSYS return for unknown syscalls.
systemd/systemd#16739
- Discusses systemd-nspawn breakage with faccessat2.
systemd/systemd#16819
- General policy for systemd-nspawn to return ENOSYS.
seccomp/libseccomp#286
- Block unknown syscalls and return ENOSYS.
@cyphar
Member

cyphar commented Nov 24, 2020

As a first-pass solution we can implement this in Docker et al by just assuming the largest syscall number specified in the profile at all is the last syscall before we give ENOSYS. This should work for most cases and we could extend the seccomp format in a separate PR.

@thaJeztah That is an issue but not really one that is super relevant here IMHO -- we just use libseccomp's syscall lookup features so updating libseccomp will update the supported syscall numbers. 🤷

@cyphar
Member

cyphar commented Nov 25, 2020

Ah, I was wrong above -- I forgot that runtime-spec is entirely based around the syscall names so if we want to do this nicely (with a rule which just does a SCMP_CMP_GT on the syscall number) we can't do it without changing the spec. That kinda sucks.

Docker could still work around it (by making the default ENOSYS and then having an explicit rule for all syscalls not accepted by other filters to return EPERM) but that's quite ugly...
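
A rough sketch of what that workaround might look like as a profile, using the `defaultErrnoRet` and `errnoRet` fields as I understand them from the OCI runtime-spec seccomp schema. The helper `build_profile` is hypothetical, the syscall lists are illustrative only (they are not Docker's actual defaults), and the errno values assume Linux numbering.

```python
import json

# Linux errno values, hard-coded so the profile does not depend on the build host.
EPERM = 1
ENOSYS = 38


def build_profile(allowed, denied):
    """Build a profile dict: ENOSYS by default, EPERM for explicit denials."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "defaultErrnoRet": ENOSYS,
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"},
            {
                "names": sorted(denied),
                "action": "SCMP_ACT_ERRNO",
                "errnoRet": EPERM,
            },
        ],
    }


# Illustrative lists only; a real profile names hundreds of syscalls.
profile = build_profile(
    allowed={"read", "write", "openat", "statx"},
    denied={"open_by_handle_at", "kexec_load"},
)
rendered = json.dumps(profile, indent=2)
```

The cost is visible in the `denied` list: every syscall that is known but deliberately blocked has to be enumerated by name, which is presumably what makes this approach "quite ugly".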

@thaJeztah
Member

> is entirely based around the syscall names

Yes. That's what I tried to refer to in #2151 (comment), but later saw my complete brain-fart that the issue I linked to was about capabilities, not syscalls 😂

@justincormack
Contributor

I talked about some of these issues in my Kubecon talk last week. In particular just listing calls to block, not an allowlist makes more sense, even if it results in failing open for new syscalls. We could tweak the error codes more easily in this case.

@thaJeztah
Member

> In particular just listing calls to block, not an allowlist makes more sense, even if it results in failing open for new syscalls

Would there be a risk if new syscalls are added that were not known at the time that the seccomp profile was generated? I think this thread (or the one on the mailing list) mentions the option to have the profile include information about the highest syscall number that was known at the time the profile was generated (potentially allowing new syscalls to be treated with some default (ENOSYS ?)); of course assuming that "new syscalls" would have higher numbers.

@richfelker

@justincormack I think failing-closed is very reasonable here; it's just that the error code is wrong. There's nothing wrong with a fail-closed mechanism that effectively just emulates an old kernel, which is what you get if you fail with ENOSYS.

@cyphar
Member

cyphar commented Nov 26, 2020

It should remain an allow-list (fail-closed), the issue is that there are two kinds of failures that the current allow-list is handling with the same error code:

  1. Syscalls which we do not permit the container to run on purpose (such as open_by_handle_at). These must return -EPERM because -ENOSYS is simply the wrong error in that case. We explicitly are blocking these syscalls.
  2. Syscalls which are not included in the allow-list due to them not existing at the time the profile was created. -ENOSYS is the correct return value in this case.

The issue is that currently we pretend that all syscalls not included in the allow-list are in category (1) when in reality we should be defaulting to (2) for syscalls that were not known about at profile-creation time. In other words, the issue is not simply that "the error code is wrong" -- it's that there are two errors being handled with one error code. Changing the default action to return -ENOSYS would be just as incorrect as the current behaviour IMHO.

We could loosely infer which syscalls are in category (2) by assuming any syscall with a larger syscall number than the largest one in the profile is in category (2). However it would be nicer to have this behaviour be something that profile writers control (either by explicitly specifying the "largest known" syscall, or even better by allowing profiles to do SCMP_CMP_ operations on raw syscall numbers -- which would also solve the issue @thaJeztah mentioned).
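
The two categories, and the loose inference of the boundary, might be sketched like this. It is a simulation of the decision only, not a real BPF filter: `classify` and `newest_known` are hypothetical names, and the table holds a handful of x86_64 syscall numbers for illustration where a real tool would take them from libseccomp.

```python
import errno

# Illustrative x86_64 syscall numbers.
SYSCALL_NUMBERS = {
    "read": 0,
    "write": 1,
    "open_by_handle_at": 304,
    "statx": 332,
    "faccessat2": 439,
}


def classify(nr, allowed, newest_known=None):
    """Return 0 if syscall `nr` is allowed, else the errno the filter gives."""
    if newest_known is None:
        # Loose inference: the largest allowed number marks the profile's age.
        newest_known = max(allowed)
    if nr in allowed:
        return 0
    if nr > newest_known:
        # Category 2: the syscall did not exist when the profile was written.
        return errno.ENOSYS
    # Category 1: known at profile-creation time and deliberately left out.
    return errno.EPERM


allowed = {SYSCALL_NUMBERS[name] for name in ("read", "write", "statx")}
```

Here `open_by_handle_at` (304) falls below the inferred boundary of 332 and gets EPERM, while `faccessat2` (439) lies above it and gets ENOSYS. Passing `newest_known` explicitly corresponds to the profile author stating the "largest known" syscall. The inference can misfire on architectures with gaps in their syscall tables.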

@richfelker

I wouldn't characterize it as "just as incorrect". Semantically ENOSYS makes no sense for a syscall you're blocking for access control reasons, but at worst the cause gets misreported and the application possibly tries klunky fallbacks that will also fail due to being blocked. It's far less incorrect than the current situation.

Of course I'd like to see this solved in a way that distinguishes the two cases, if this can be done in a way that works right by default and doesn't depend on the profile author understanding why EPERM is wrong for unknown syscalls. If the behavior remains broken the way it is now unless everyone fixes their profiles, this is a major problem far worse than just changing the default could create.

@cyphar
Member

cyphar commented Nov 26, 2020

I guess that's a fair point. We could switch Docker to use ENOSYS as the default return value (to stop the bleeding) while we work on being able to nicely differentiate the two?

@fweimer

fweimer commented Nov 26, 2020

> 2. Syscalls which are not included in the allow-list due to them not existing at the time the profile was created.

@cyphar This rule worries me because it could mean that as soon as the profile is re-created with knowledge of the faccessat2 system call (for example), the error code would turn from ENOSYS to EPERM because the system call transitions from unknown to implicitly filtered. I think we need a mechanism where simple rebuilds result in reproducible filters.
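
To make the concern concrete, here is a small simulation under one assumed design, which is hypothetical and not necessarily what runc does: the ENOSYS boundary is taken from the highest syscall number in the tool's own syscall table. The tables and `errno_for` are illustrative, using x86_64 numbers.

```python
import errno

# OLD_TABLE predates faccessat2; NEW_TABLE is the same tool after a rebuild.
OLD_TABLE = {"read": 0, "write": 1, "faccessat": 269, "statx": 332}
NEW_TABLE = {**OLD_TABLE, "faccessat2": 439}

# The profile itself is identical in both cases.
ALLOWED_NAMES = ("read", "write", "faccessat", "statx")


def errno_for(nr, table):
    """Errno for syscall `nr` when the boundary comes from the tool's table."""
    allowed = {table[name] for name in ALLOWED_NAMES}
    if nr in allowed:
        return 0
    boundary = max(table.values())
    return errno.ENOSYS if nr > boundary else errno.EPERM


FACCESSAT2 = 439
before = errno_for(FACCESSAT2, OLD_TABLE)  # tool does not know faccessat2
after = errno_for(FACCESSAT2, NEW_TABLE)   # tool rebuilt with newer headers
```

In this model the unchanged profile yields ENOSYS for `faccessat2` before the rebuild and EPERM after it, so the filter is not reproducible across rebuilds. Deriving the boundary from the profile rather than from the tool's table avoids that particular flip.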

@jethrogb
Author

I'm not entirely sure, but I don't believe the seccomp profile is included in the Docker container image? This is just a runtime thing?

@fweimer

fweimer commented Nov 26, 2020

I actually meant rebuilding the runtime against newer kernel headers/libseccomp. It should not have this effect, either.

@kallisti5

What is the actual fix for this one? I think the massive number of references above show this is breaking a LOT of stuff in a lot of places. A lot of the bugs above mention using older os images as the "workaround", but nobody has found a solution.

In the concourse bug I opened above, 32-bit binaries are failing to run in runc containers. statx is allowed via the seccomp config, but is failing with EPERM for an unknown reason.

I opened opencontainers/runtime-spec#1122 to "change the default errnoRet", but even that seems like an incorrect solution.

Does anyone have an idea on what is going on?

@cyphar
Member

cyphar commented Sep 1, 2021

> In the concourse bug I opened above, 32-bit binaries are failing to run in runc containers. statx is allowed via the seccomp config, but is failing with EPERM for an unknown reason.

That sounds like a bug in our BPF patching code (that really shouldn't be happening -- we do have handling for different architectures including the 32-on-64-bit "architecture" so it's a bit puzzling that it's not working as expected). I will take a look at this this week.

@kallisti5

@cyphar Thanks :-) Would moving away from Fedora 34 to something like Alpine be a temporary workaround? We really didn't run into this until we upgraded the container host from Fedora 33 to 34.

@cyphar
Member

cyphar commented Sep 1, 2021

I'm not sure to be honest -- I'm also confused how the container host upgrade could've caused this as well. If there's an issue with our 32-bit compat handling it should happen on every system. The obvious contender (kernel version change) doesn't really explain it either.

@kallisti5

@cyphar @fweimer-rh keeps mentioning something changed in Fedora 34, but really hasn't identified what beyond this:

> As I said on the Fedora bug, it's a bug in the container environment. Fedora 34 simply uses slightly different parts of the Linux system call interface than previous versions.

@fweimer , @fweimer-rh , can you provide clarification on what changed in Fedora 34?

@fweimer-rh

On the host? libseccomp might have learned about additional system calls, therefore changing the point of the ENOSYS boundary.

@cyphar
Member

cyphar commented Sep 1, 2021

The ENOSYS boundary is primarily set by the seccomp profile not libseccomp (if libseccomp doesn't know about a syscall it's ignored, but it's still primarily set by the profile -- which wouldn't have changed between versions since it's a Docker-version thing not related to the rest of the host and statx in particular has been allowed for quite a while). But yeah, as I said I'll take a look at this this week.

stanislavlevin added a commit to stanislavlevin/freeipa that referenced this issue Oct 15, 2021
This allows applications to detect whether the kernel supports a
syscall or not. Previously, the error was unconditionally EPERM.
There are many reports of glibc failing with new syscalls in containerized
environments whose hosts run an old kernel.

More about motivation for ENOSYS over EPERM:
opencontainers/runc#2151
opencontainers/runc#2750

See about defaultErrnoRet introduction:
opencontainers/runtime-spec#1087

Previously, FreeIPA profile was vendored from
https://github.com/containers/podman/blob/main/vendor/github.com/containers/common/pkg/seccomp/seccomp.json

Now it is merged directly from
https://github.com/containers/common/blob/main/pkg/seccomp/seccomp.json

Fixes: https://pagure.io/freeipa/issue/9008
Signed-off-by: Stanislav Levin
jlebon added a commit to cgwalters/rpm-ostree that referenced this issue Nov 18, 2022
The glib2 shipping in Fedora 37 is hitting the classic seccomp EPERM vs
ENOSYS issue for `close_range` when used via `createrepo_c`.
Interestingly, Fedora 36 carried a patch for this:

https://src.fedoraproject.org/rpms/glib2/c/a2259ad90593383c5ce982fbb233fd3658c0a7a1?branch=f36

But this patch is not carried in Fedora 37, presumably on the basis that
by then hosts should be running a new enough runc to fix

opencontainers/runc#2151

But clearly, that hasn't happened yet for whatever version runc that
moby-engine uses in `ubuntu-latest`.

Hack around this by running the container in privileged mode.