
[remote] error creating cgroup [...]: dial unix /run/systemd/private: connect: EPERM #12362

Closed
edsantiago opened this issue Nov 19, 2021 · 18 comments · Fixed by #12388
Labels
flakes (Flakes from Continuous Integration), locked - please file new issue/PR, remote (Problem is in podman-remote)

Comments

@edsantiago
Member

New podman-remote flake. Two instances so far, both in the command completion system test:

[+0573s] not ok 278 podman shell completion test
...
# # podman-remote --url unix:/tmp/podman_tmp_ykpP pod create --name created-5suGEc5kKDV9H0xfZLyMj77He0nHz9
# Error: failed to make pod: unable to create pod cgroup for pod 45b229948ea8dfc8b9fb6aa4c4e16e8eb3e03b6b7d0726054138d701b2d7a759: error creating cgroup machine.slice/machine-libpod_pod_45b229948ea8dfc8b9fb6aa4c4e16e8eb3e03b6b7d0726054138d701b2d7a759.slice: dial unix /run/systemd/private: connect: permission denied
# [ rc=125 (** EXPECTED 0 **) ]

Seen November 17 and 18 on different PRs, both times sys remote f34 root.
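
For reference, the failing step boils down to asking systemd, over its private socket, to start a transient slice for the pod cgroup. A minimal sketch of that mechanism (not Podman's actual code; the unit name is copied from the log above and the Description property is just illustrative):

// Sketch of the call path that fails above: creating the pod cgroup means
// asking systemd, over /run/systemd/private, to start a transient .slice unit.
package main

import (
	"context"
	"fmt"

	systemdDbus "github.com/coreos/go-systemd/v22/dbus"
	godbus "github.com/godbus/dbus/v5"
)

func createPodSlice(ctx context.Context, slice string) error {
	// NewSystemdConnectionContext dials the private socket /run/systemd/private,
	// which is exactly where the EPERM in this flake shows up.
	conn, err := systemdDbus.NewSystemdConnectionContext(ctx)
	if err != nil {
		return fmt.Errorf("dial systemd private socket: %w", err)
	}
	defer conn.Close()

	props := []systemdDbus.Property{
		{Name: "Description", Value: godbus.MakeVariant("libpod pod cgroup (sketch)")},
	}
	// "fail" makes the call error out if a unit with this name already exists.
	if _, err := conn.StartTransientUnitContext(ctx, slice, "fail", props, nil); err != nil {
		return fmt.Errorf("start transient slice %s: %w", slice, err)
	}
	return nil
}

func main() {
	err := createPodSlice(context.Background(),
		"machine-libpod_pod_45b229948ea8dfc8b9fb6aa4c4e16e8eb3e03b6b7d0726054138d701b2d7a759.slice")
	fmt.Println(err)
}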

@edsantiago added the flakes and remote labels Nov 19, 2021
@edsantiago
Member Author

This one is getting nasty. It took me three or four re-runs to get CI passing in my PR.

[sys] 278 podman shell completion test

@edsantiago
Member Author

And I think this is the same symptom, in unit tests. The EPERM is on /proc/self/fd/X instead of /run/systemd/private, but it still looks pretty similar to me:

Running: podman-remote [options] exec 3effc3275cba6b2a0c4415a0f389bb54f69bec9cda358d52cfbc26f4e3778297 cat /volume0/test.output
time="2021-11-20T08:35:38-06:00" level=error msg="error attaching to container 3effc3275cba6b2a0c4415a0f389bb54f69bec9cda358d52cfbc26f4e3778297 exec session f38e439288313359a83ab30e543e24a37ce6a0193e136b4aae4c82c1da43f027: dial unixpacket /proc/self/fd/17: connect: permission denied"
Error: dial unixpacket /proc/self/fd/17: connect: permission denied
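
For context on the odd-looking address: the attach socket path can exceed the unix-socket path-length limit, so the client opens the socket with O_PATH and connects through /proc/self/fd instead. A rough sketch of that trick (placeholder socket path, not a verbatim copy of the libpod code):

package main

import (
	"fmt"
	"net"

	"golang.org/x/sys/unix"
)

// dialAttachSocket opens the socket path with O_PATH and connects through
// /proc/self/fd, sidestepping the 108-byte sun_path limit on socket paths.
func dialAttachSocket(sockPath string) (*net.UnixConn, error) {
	fd, err := unix.Open(sockPath, unix.O_PATH, 0)
	if err != nil {
		return nil, err
	}
	defer unix.Close(fd)

	// The dialed address is the /proc/self/fd link itself, which is why the
	// error above names /proc/self/fd/17 instead of a real filesystem path.
	return net.DialUnix("unixpacket", nil,
		&net.UnixAddr{Name: fmt.Sprintf("/proc/self/fd/%d", fd), Net: "unixpacket"})
}

func main() {
	// Placeholder path; in the real failure this is conmon's exec attach socket.
	conn, err := dialAttachSocket("/tmp/example-attach-sock")
	fmt.Println(conn, err)
}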

@rhatdan
Member

rhatdan commented Nov 22, 2021

@Luap99 PTAL

@Luap99
Member

Luap99 commented Nov 22, 2021

The failure has nothing to do with the shell completion test, IMO; it fails at a simple pod create. I cannot help debug this.

@edsantiago
Member Author

I cannot reproduce this on a 1minutetip VM. I tried this and let it run for 4 hours, no failures:

[build podman from sources @ main]
# bin/podman system service -t 0 &
[...]
# while :;do env PODMAN=$(pwd)/bin/podman-remote hack/bats completion || break;done

I'm seeing a weird EPERM in a completely unrelated test (Static Build):

...
copying path '/nix/store/3lqc09mn31yd05mcy6gcd9nsbcwwphic-cachix-0.6.1' from 'https://cache.nixos.org'...
building '/nix/store/q67vi5mmd5ir8k5xmnsbbqssc4prnmrb-user-environment.drv'...
created 25 symlinks in user environment
Configured https://podman.cachix.org binary cache in /etc/nix/nix.conf
unpacking 'https://github.com/nixos/nixpkgs/archive/2a96414d7e350160a33ed0978449c9ff5b5a6eb3.tar.gz'...
error: getting status of '/var/tmp/go/src/github.com/containers/podman/contrib/systemd/user': Permission denied

I've restarted that test, operating on the assumption that it's a flake. I don't know if it's related, and if it is, I can't imagine what it would mean.

@edsantiago
Member Author

More failures in the last two days (as in: Saturday, Sunday, Monday up to now). This is a nasty one.

[sys] 278 podman shell completion test

@Luap99
Member

Luap99 commented Nov 23, 2021

@giuseppe Can you look at this? It looks like podman cannot create the pod cgroup via systemd.

@rhatdan
Member

rhatdan commented Nov 23, 2021

Something is returning permission denied. The Podman process is running as root and unconfined_t, and the socket is mode 777.
To me this indicates that the problem is not DAC (ownership/permissions) or SELinux, but perhaps systemd itself returning permission denied when Podman tries to talk to the socket.
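
A quick sketch of how one could double-check the DAC mode and the SELinux file label on the socket (using the go-selinux package; note this only checks the file, not the label of the connecting process):

// Sketch: print the mode and SELinux file label of the systemd private socket.
// Neither explains the EPERM if the denial is on the connecting process's label.
package main

import (
	"fmt"
	"os"

	"github.com/opencontainers/selinux/go-selinux"
)

func main() {
	const sock = "/run/systemd/private"

	info, err := os.Stat(sock)
	if err != nil {
		panic(err)
	}
	fileLabel, _ := selinux.FileLabel(sock)

	fmt.Printf("mode=%v label=%s\n", info.Mode().Perm(), fileLabel)
}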

@giuseppe
Member

I see the following AVC in the Cirrus logs:

Nov 22 10:53:11 cirrus-task-5335687059734528 audit[4369]: AVC avc:  denied  { connectto } for  pid=4369 comm="podman" path="/run/systemd/private" scontext=system_u:system_r:container_t:s0:c331,c658 tcontext=system_u:system_r:init_t:s0 tclass=unix_stream_socket permissive=0

@edsantiago
Member Author

Oh, this is interesting. This time the failure happens near the end of the test, not (as usual) at the beginning:

# # podman-remote --url unix:/tmp/podman_tmp_5Gy1 image untag quay.io/libpod/testimage:20210610 ruxi3jhsy9llrmm4tgjgp3xnbnv9vb:3uBMd
# # podman-remote --url unix:/tmp/podman_tmp_5Gy1 pod rm -t 0 --force created-OH66XoYcPXs8h81WbhYzwyIl4Tm7zP
# 62e2fa62631c532b8472336c4e735dc81a711f10b0bfe6302ee7def215080952
# # podman-remote --url unix:/tmp/podman_tmp_5Gy1 pod rm -t 0 --force running-OH66XoYcPXs8h81WbhYzwyIl4Tm7zP
# Error: error removing pod a68d7a7bb5c7d09dff295a71d7ee98339db586c27d19b8bce0804fc0eb7aa205 cgroup: dial unix /run/systemd/private: connect: permission denied
# [ rc=125 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 125; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# # podman-remote --url unix:/tmp/podman_tmp_5Gy1 pod rm -t 0 --all --force
# Error: error removing pod 45a182b1c1ec5f1b4443c27387a375d506b133b6b940fa6484a45a0b0108ae9e cgroup: dial unix /run/systemd/private: connect: permission denied
# Error: error removing pod f657fb3dfe58a9a1db278893230903b18c299440d637886a2c9e2dd18763351b cgroup: dial unix /run/systemd/private: connect: permission denied
# [ rc=125 ]

What I find interesting:

  • One pod rm succeeds.
  • Immediately afterward, another pod rm fails.
  • teardown is invoked, and pod rm fails therein, which indicates that the EPERM is not transient

I know I shouldn't bash systemd, but I'm wondering, is it completely impossible that this is a systemd bug?

@rhatdan
Member

rhatdan commented Nov 23, 2021

The AVC indicates that podman inside of a container is attempting to connect to systemd.

Nov 22 10:53:11 cirrus-task-5335687059734528 audit[4369]: AVC avc:  denied  { connectto } for  pid=4369 comm="podman" 
path="/run/systemd/private" scontext=system_u:system_r:container_t:s0:c331,c658 tcontext=system_u:system_r:init_t:s0 tclass=unix_stream_socket permissive=0

This is being blocked by SELinux and never reaches systemd. The question is why podman is running within a locked-down container context when this happens.

@Luap99
Member

Luap99 commented Nov 23, 2021

The common factor is the remote tests. Could the server accidentally change its own label instead of the container's?

@rhatdan
Member

rhatdan commented Nov 23, 2021

Theoretically yes, but I do not see how. The likely candidate was right here:
https://github.com/containers/podman/blob/main/libpod/oci_conmon_linux.go#L1436-L1450
But that code should have changed the label to "s0", whereas the AVC Giuseppe shows above has s0:c331,c658.

@rhatdan
Member

rhatdan commented Nov 23, 2021

selinux.SetTaskLabel would change the label of the service process, but no one calls that.
SetProcessLabel() sets the label for executables exec'd by podman.
@edsantiago Any chance we know the PID of the service? If we knew the server was running as pid=4369, then we would know that it is the process causing the issue (which is most likely the case).
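
A quick sketch of how we could confirm that, using go-selinux to read the label of a given PID (4369 is just the number quoted from the AVC above):

// Sketch: print the SELinux label a given process is actually running with,
// e.g. the podman service PID versus the PID from the AVC.
package main

import (
	"fmt"

	"github.com/opencontainers/selinux/go-selinux"
)

func main() {
	for _, pid := range []int{4369} {
		label, err := selinux.PidLabel(pid)
		if err != nil {
			fmt.Printf("pid %d: %v\n", pid, err)
			continue
		}
		// If this prints container_t with MLS categories for the service PID,
		// the server process itself is the one carrying the container label.
		fmt.Printf("pid %d: %s\n", pid, label)
	}
}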

@rhatdan
Member

rhatdan commented Nov 23, 2021

There are actually more AVCs.

type=AVC msg=audit(1637692993.270:5093): avc:  denied  { connectto } for  pid=4376 comm="podman" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c305,c326 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0
type=AVC msg=audit(1637692993.270:5094): avc:  denied  { connectto } for  pid=4376 comm="podman" path="/run/systemd/private" scontext=system_u:system_r:container_t:s0:c305,c326 tcontext=system_u:system_r:init_t:s0 tclass=unix_stream_socket permissive=0
type=AVC msg=audit(1637692993.332:5095): avc:  denied  { connectto } for  pid=4376 comm="podman" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c305,c326 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0
type=AVC msg=audit(1637692993.332:5096): avc:  denied  { connectto } for  pid=4376 comm="podman" path="/run/systemd/private" scontext=system_u:system_r:container_t:s0:c305,c326 tcontext=system_u:system_r:init_t:s0 tclass=unix_stream_socket permissive=0

@edsantiago
Member Author

There are provisions for cat'ing the server log, but I don't see them triggered anywhere. If you're still playing with your reproducer PR, you could try:

diff --git a/Makefile b/Makefile
index ceecda274..8cef891a4 100644
--- a/Makefile
+++ b/Makefile
@@ -586,6 +586,7 @@ remotesystem:
                rc=$$?;\
                kill %1;\
                rm -f $$SOCK_FILE;\
+               echo "------------";echo "server log";cat $(PODMAN_SERVER_LOG);\
        else \
                echo "Skipping $@: 'timeout -v' unavailable'";\
        fi;\

@rhatdan
Member

rhatdan commented Nov 23, 2021

We just turned on remote checkpoint testing, and the test immediately before the failing one is a checkpoint test.
I am no Hercule Poirot, but I think this is the issue.

rhatdan added a commit to rhatdan/podman that referenced this issue Nov 23, 2021
This should fix the SELinux issue we are seeing when talking to
/run/systemd/private.

Fixes: containers#12362

Also unset XDG_RUNTIME_DIR if set, since we don't know whether it will
cause issues when running as a service.

Signed-off-by: Daniel J Walsh <[email protected]>
@github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023