
Systemd in container hits critical errors when using private cgroup namespace on CentOS 8 #17727

Closed
yashagacisco opened this issue Mar 9, 2023 · 12 comments · Fixed by #17736
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@yashagacisco

yashagacisco commented Mar 9, 2023

Issue Description

When you run a systemd container on a CentOS 8 host and pass --cgroupns private, systemd fails to start in non-privileged mode. In privileged mode it emits a series of warnings.
This is probably because /sys/fs/cgroup/systemd does not get mounted properly: from inside the container, it appears empty.

> ls /sys/fs/cgroup/systemd/
[Empty folder]

The issue was reproduced on CentOS 8.2 and CentOS 8.4.

It was also tested on Ubuntu 18.04 and Ubuntu 20.04 hosts, where the issue was NOT present, so it appears to be CentOS-specific.

Workaround

If you run systemd in legacy cgroup mode and turn off Podman's systemd mode, there is no error; i.e. passing -e SYSTEMD_PROC_CMDLINE=systemd.legacy_systemd_cgroup_controller=1 --systemd false along with --cgroupns private makes it run as expected.

Note: in non-privileged mode, systemd automatically runs in legacy cgroup mode, so that option isn't required.
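
For example, a full workaround invocation might look like this (the container name here is arbitrary, and the image is the systemd image built in the reproduction steps below):

podman run --rm -it --name sys_ctr_legacy --cgroupns private --systemd false -e SYSTEMD_PROC_CMDLINE=systemd.legacy_systemd_cgroup_controller=1 systemd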

Additional info which might help

On running the image from the reproduction steps below in non-privileged mode with bash as the entrypoint:
podman run --rm -it --name sys_ctr_non_priv_bash_entry --cgroupns private --systemd=always --entrypoint bash systemd
If you check the mounts at this point, they look as expected:

root@c26d6e772331:/# findmnt -R /sys/fs/cgroup/
TARGET                                     SOURCE         FSTYPE OPTIONS
/sys/fs/cgroup                             tmpfs          tmpfs  rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:container_file_t:s0:c359,c708",mode=755
|-/sys/fs/cgroup/systemd                   systemd        cgroup ro,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
| `-/sys/fs/cgroup/systemd                 cgroup[/../..] cgroup rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
|   `-/sys/fs/cgroup/systemd/release_agent tmpfs[/null]   tmpfs  rw,nosuid,noexec,context="system_u:object_r:container_file_t:s0:c359,c708",size=65536k,mode=755
[...]

But after you run systemd and check the mounts:

root@c26d6e772331:/# exec /sbin/init
[Errors]
> podman exec c26d6e772331 findmnt -R /sys/fs/cgroup/
TARGET                            SOURCE  FSTYPE OPTIONS
/sys/fs/cgroup                    tmpfs   tmpfs  rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:container_file_t:s0:c359,c708",mode=755
|-/sys/fs/cgroup/systemd          systemd cgroup ro,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
|-/sys/fs/cgroup/cpu,cpuacct      cgroup  cgroup ro,nosuid,nodev,noexec,relatime,seclabel,cpu,cpuacct

It looks like systemd somehow unmounts the setup created by Podman.

Steps to reproduce the issue


  1. Create a simple Dockerfile to run a systemd container:
FROM ubuntu:20.04
RUN apt-get update -y && apt-get install systemd -y && ln -s /lib/systemd/systemd /sbin/init
ENTRYPOINT ["/sbin/init"]
  2. Build the image: podman build . -t systemd
  3. Run the container in a private cgroup namespace: podman run --rm -it --name sys_ctr --cgroupns private systemd
  4. podman exec -it sys_ctr bash
  5. ls /sys/fs/cgroup/systemd/ (the full sequence is also shown as a single shell session below)
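
The same steps as a single shell session (the exec is run from a second terminal while the container from the previous step is still running):

podman build . -t systemd
podman run --rm -it --name sys_ctr --cgroupns private systemd
# from a second terminal:
podman exec -it sys_ctr bash
ls /sys/fs/cgroup/systemd/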

Describe the results you received

There are unexpected errors:

systemd 245.4-4ubuntu3.20 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <b65a7b9da11a>.
system-getty.slice: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
(This warning is only shown for the first unit using IP firewalling.)
system.slice: Failed to create cgroup /system.slice: No such file or directory
[  OK  ] Created slice system-getty.slice.
system.slice: Failed to create cgroup /system.slice: No such file or directory
[  OK  ] Created slice system-modprobe.slice.
user.slice: Failed to create cgroup /user.slice: No such file or directory
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Journal Service...
system.slice: Failed to create cgroup /system.slice: No such file or directory
systemd-journald.service: Failed to attach to cgroup /system.slice/systemd-journald.service: No such file or directory
         Starting Remount Root and Kernel File Systems...
systemd-journald.service: Main process exited, code=exited, status=219/CGROUP
systemd-journald.service: Failed with result 'exit-code'.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
[DEPEND] Dependency failed for Flush Journal to Persistent Storage.
systemd-journal-flush.service: Job systemd-journal-flush.service/start failed with result 'dependency'.
systemd-journald.service: Scheduled restart job, restart counter is at 1.
[  OK  ] Stopped Journal Service.
system.slice: Failed to create cgroup /system.slice: No such file or directory
systemd-remount-fs.service: Failed to attach to cgroup /system.slice/systemd-remount-fs.service: No such file or directory
         Starting Journal Service...
systemd-remount-fs.service: Main process exited, code=exited, status=219/CGROUP
systemd-remount-fs.service: Failed with result 'exit-code'.
[FAILED] Failed to start Remount Root and Kernel File Systems.
See 'systemctl status systemd-remount-fs.service' for details.
systemd-journald.service: Failed to attach to cgroup /system.slice/systemd-journald.service: No such file or directory
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Create System Users...
systemd-journald.service: Main process exited, code=exited, status=219/CGROUP
systemd-journald.service: Failed with result 'exit-code'.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
systemd-journald.service: Scheduled restart job, restart counter is at 2.
[  OK  ] Stopped Journal Service.
system.slice: Failed to create cgroup /system.slice: No such file or directory
systemd-sysusers.service: Failed to attach to cgroup /system.slice/systemd-sysusers.service: No such file or directory
         Starting Journal Service...
systemd-sysusers.service: Main process exited, code=exited, status=219/CGROUP
systemd-sysusers.service: Failed with result 'exit-code'.
[FAILED] Failed to start Create System Users.
See 'systemctl status systemd-sysusers.service' for details.
system.slice: Failed to create cgroup /system.slice: No such file or directory
systemd-journald.service: Failed to attach to cgroup /system.slice/systemd-journald.service: No such file or directory
         Starting Create Static Device Nodes in /dev...
systemd-journald.service: Main process exited, code=exited, status=219/CGROUP
systemd-journald.service: Failed with result 'exit-code'.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
systemd-tmpfiles-setup-dev.service: Failed to attach to cgroup /system.slice/systemd-tmpfiles-setup-dev.service: No such file or directory
systemd-journald.service: Scheduled restart job, restart counter is at 3.
[  OK  ] Stopped Journal Service.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Journal Service...
systemd-tmpfiles-setup-dev.service: Main process exited, code=exited, status=219/CGROUP
systemd-tmpfiles-setup-dev.service: Failed with result 'exit-code'.
[FAILED] Failed to start Create Static Device Nodes in /dev.
See 'systemctl status systemd-tmpfiles-setup-dev.service' for details.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
systemd-journald.service: Failed to attach to cgroup /system.slice/systemd-journald.service: No such file or directory
systemd-journald.service: Main process exited, code=exited, status=219/CGROUP
systemd-journald.service: Failed with result 'exit-code'.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
systemd-journald.service: Scheduled restart job, restart counter is at 4.
[  OK  ] Stopped Journal Service.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Journal Service...
systemd-journald.service: Failed to attach to cgroup /system.slice/systemd-journald.service: No such file or directory
systemd-journald.service: Main process exited, code=exited, status=219/CGROUP
systemd-journald.service: Failed with result 'exit-code'.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
systemd-journald.service: Scheduled restart job, restart counter is at 5.
[  OK  ] Stopped Journal Service.
systemd-journald.service: Start request repeated too quickly.
systemd-journald.service: Failed with result 'exit-code'.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
systemd-journald.socket: Failed with result 'service-start-limit-hit'.
systemd-journald-dev-log.socket: Failed with result 'service-start-limit-hit'.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Create Volatile Files and Directories...
systemd-tmpfiles-setup.service: Failed to attach to cgroup /system.slice/systemd-tmpfiles-setup.service: No such file or directory
systemd-tmpfiles-setup.service: Failed at step CGROUP spawning /usr/bin/systemd-tmpfiles: No such file or directory
systemd-tmpfiles-setup.service: Main process exited, code=exited, status=219/CGROUP
systemd-tmpfiles-setup.service: Failed with result 'exit-code'.
[FAILED] Failed to start Create Volatile Files and Directories.
See 'systemctl status systemd-tmpfiles-setup.service' for details.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Network Name Resolution...
[  OK  ] Reached target System Time Set.
[  OK  ] Reached target System Time Synchronized.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Update UTMP about System Boot/Shutdown...
systemd-resolved.service: Failed to attach to cgroup /system.slice/systemd-resolved.service: No such file or directory
systemd-resolved.service: Failed at step CGROUP spawning /lib/systemd/systemd-resolved: No such file or directory
systemd-update-utmp.service: Failed to attach to cgroup /system.slice/systemd-update-utmp.service: No such file or directory
systemd-update-utmp.service: Failed at step CGROUP spawning /lib/systemd/systemd-update-utmp: No such file or directory
systemd-resolved.service: Main process exited, code=exited, status=219/CGROUP
systemd-resolved.service: Failed with result 'exit-code'.
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
systemd-update-utmp.service: Main process exited, code=exited, status=219/CGROUP
systemd-update-utmp.service: Failed with result 'exit-code'.
[FAILED] Failed to start Update UTMP about System Boot/Shutdown.
See 'systemctl status systemd-update-utmp.service' for details.
[DEPEND] Dependency failed for Update UTMP about System Runlevel Changes.
systemd-update-utmp-runlevel.service: Job systemd-update-utmp-runlevel.service/start failed with result 'dependency'.
systemd-resolved.service: Scheduled restart job, restart counter is at 1.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily apt download activities.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Started Periodic ext4 Online Metadata Check for All Filesystems.
[  OK  ] Started Message of the Day.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
system.slice: Failed to create cgroup /system.slice: No such file or directory
[  OK  dbus.service: Failed to attach to cgroup /system.slice/dbus.service: No such file or directory] Started D-Bus System Message Bus.

dbus.service: Failed at step CGROUP spawning /usr/bin/dbus-daemon: No such file or directory
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Dispatcher daemon for systemd-networkd...
system.slice: Failed to create cgroup /system.slice: No such file or directory
networkd-dispatcher.service: Failed to attach to cgroup /system.slice/networkd-dispatcher.service: No such file or directory
networkd-dispatcher.service: Failed at step CGROUP spawning /usr/bin/networkd-dispatcher: No such file or directory
         Starting Login Service...
[  OK  ] Stopped Network Name Resolution.
system.slice: Failed to create cgroup /system.slice: No such file or directory
systemd-logind.service: Failed to attach to cgroup /system.slice/systemd-logind.service: No such file or directory
systemd-logind.service: Failed at step CGROUP spawning /lib/systemd/systemd-logind: No such file or directory
         Starting Network Name Resolution...
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Permit User Sessions...
systemd-resolved.service: Failed to attach to cgroup /system.slice/systemd-resolved.service: No such file or directorydbus.service: Main process exited, code=exited, status=219/CGROUP

systemd-resolved.service: Failed at step CGROUP spawning /lib/systemd/systemd-resolved: No such file or directory
dbus.service: Failed with result 'exit-code'.
systemd-user-sessions.service: Failed to attach to cgroup /system.slice/systemd-user-sessions.service: No such file or directory
systemd-user-sessions.service: Failed at step CGROUP spawning /lib/systemd/systemd-user-sessions: No such file or directory
networkd-dispatcher.service: Main process exited, code=exited, status=219/CGROUP
networkd-dispatcher.service: Failed with result 'exit-code'.
[FAILED] Failed to start Dispatcher daemon for systemd-networkd.
See 'systemctl status networkd-dispatcher.service' for details.
systemd-logind.service: Main process exited, code=exited, status=219/CGROUP
systemd-logind.service: Failed with result 'exit-code'.
[FAILED] Failed to start Login Service.
See 'systemctl status systemd-logind.service' for details.
systemd-resolved.service: Main process exited, code=exited, status=219/CGROUP
systemd-resolved.service: Failed with result 'exit-code'.
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
systemd-user-sessions.service: Main process exited, code=exited, status=219/CGROUP
systemd-user-sessions.service: Failed with result 'exit-code'.
[FAILED] Failed to start Permit User Sessions.
See 'systemctl status systemd-user-sessions.service' for details.
systemd-logind.service: Scheduled restart job, restart counter is at 1.
systemd-resolved.service: Scheduled restart job, restart counter is at 2.
system.slice: Failed to create cgroup /system.slice: No such file or directory
[  OK  ] Started Console Getty.
system.slice: Failed to create cgroup /system.slice: No such file or directory
[  OK  ] Started D-Bus System Message Bus.
dbus.service: Failed to attach to cgroup /system.slice/dbus.service: No such file or directory
dbus.service: Failed at step CGROUP spawning /usr/bin/dbus-daemon: No such file or directory
[  OK  ] Reached target Login Prompts.
[  OK  ] Stopped Login Service.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Login Service...
[  OK  ] Stopped Network Name Resolution.
system.slice: Failed to create cgroup /system.slice: No such file or directory
systemd-logind.service: Failed to attach to cgroup /system.slice/systemd-logind.service: No such file or directory
systemd-logind.service: Failed at step CGROUP spawning /lib/systemd/systemd-logind: No such file or directory
         Starting Network Name Resolution...
dbus.service: Main process exited, code=exited, status=219/CGROUP
dbus.service: Failed with result 'exit-code'.
systemd-resolved.service: Failed to attach to cgroup /system.slice/systemd-resolved.service: No such file or directory
systemd-resolved.service: Failed at step CGROUP spawning /lib/systemd/systemd-resolved: No such file or directory
systemd-logind.service: Main process exited, code=exited, status=219/CGROUP
systemd-logind.service: Failed with result 'exit-code'.
[FAILED] Failed to start Login Service.
See 'systemctl status systemd-logind.service' for details.
systemd-resolved.service: Main process exited, code=exited, status=219/CGROUP
systemd-resolved.service: Failed with result 'exit-code'.
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
systemd-logind.service: Scheduled restart job, restart counter is at 2.
systemd-resolved.service: Scheduled restart job, restart counter is at 3.
system.slice: Failed to create cgroup /system.slice: No such file or directory
[  OK  ] Started D-Bus System Message Bus.
dbus.service: Failed to attach to cgroup /system.slice/dbus.service: No such file or directory
dbus.service: Failed at step CGROUP spawning /usr/bin/dbus-daemon: No such file or directory
[  OK  ] Stopped Login Service.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Login Service...
[  OK  ] Stopped Network Name Resolution.
system.slice: Failed to create cgroup /system.slice: No such file or directory
systemd-logind.service: Failed to attach to cgroup /system.slice/systemd-logind.service: No such file or directory
systemd-logind.service: Failed at step CGROUP spawning /lib/systemd/systemd-logind: No such file or directory
         Starting Network Name Resolution...
dbus.service: Main process exited, code=exited, status=219/CGROUP
dbus.service: Failed with result 'exit-code'.
systemd-resolved.service: Failed to attach to cgroup /system.slice/systemd-resolved.service: No such file or directory
systemd-resolved.service: Failed at step CGROUP spawning /lib/systemd/systemd-resolved: No such file or directory
systemd-logind.service: Main process exited, code=exited, status=219/CGROUP
systemd-logind.service: Failed with result 'exit-code'.
[FAILED] Failed to start Login Service.
See 'systemctl status systemd-logind.service' for details.
systemd-resolved.service: Main process exited, code=exited, status=219/CGROUP
systemd-resolved.service: Failed with result 'exit-code'.
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
systemd-logind.service: Scheduled restart job, restart counter is at 3.
systemd-resolved.service: Scheduled restart job, restart counter is at 4.
system.slice: Failed to create cgroup /system.slice: No such file or directory
[  OK  ] Started D-Bus System Message Bus.
dbus.service: Failed to attach to cgroup /system.slice/dbus.service: No such file or directory
dbus.service: Failed at step CGROUP spawning /usr/bin/dbus-daemon: No such file or directory
[  OK  ] Stopped Login Service.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Login Service...
[  OK  ] Stopped Network Name Resolution.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         systemd-logind.service: Failed to attach to cgroup /system.slice/systemd-logind.service: No such file or directoryStarting Network Name Resolution...

systemd-logind.service: Failed at step CGROUP spawning /lib/systemd/systemd-logind: No such file or directory
dbus.service: Main process exited, code=exited, status=219/CGROUP
dbus.service: Failed with result 'exit-code'.
systemd-logind.service: Main process exited, code=exited, status=219/CGROUP
systemd-logind.service: Failed with result 'exit-code'.
[FAILED] Failed to start Login Service.
See 'systemctl status systemd-logind.service' for details.
systemd-resolved.service: Failed to attach to cgroup /system.slice/systemd-resolved.service: No such file or directorysystemd-logind.service: Scheduled restart job, restart counter is at 4.

system.slice: Failed to create cgroup /system.slice: No such file or directory
systemd-resolved.service: Failed at step CGROUP spawning /lib/systemd/systemd-resolved: No such file or directory
[  OK  ] Started D-Bus System Message Bus.
dbus.service: Failed to attach to cgroup /system.slice/dbus.service: No such file or directory
dbus.service: Failed at step CGROUP spawning /usr/bin/dbus-daemon: No such file or directory
[  OK  ] Stopped Login Service.
system.slice: Failed to create cgroup /system.slice: No such file or directory
         Starting Login Service...
systemd-resolved.service: Main process exited, code=exited, status=219/CGROUP
systemd-resolved.service: Failed with result 'exit-code'.
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
dbus.service: Main process exited, code=exited, status=219/CGROUP
dbus.service: Failed with result 'exit-code'.
systemd-logind.service: Failed to attach to cgroup /system.slice/systemd-logind.service: No such file or directory
systemd-logind.service: Failed at step CGROUP spawning /lib/systemd/systemd-logind: No such file or directory
systemd-resolved.service: Scheduled restart job, restart counter is at 5.
dbus.service: Start request repeated too quickly.
dbus.service: Failed with result 'exit-code'.
[FAILED] Failed to start D-Bus System Message Bus.
See 'systemctl status dbus.service' for details.
dbus.socket: Failed with result 'service-start-limit-hit'.
[  OK  ] Stopped Network Name Resolution.
systemd-resolved.service: Start request repeated too quickly.
systemd-resolved.service: Failed with result 'exit-code'.
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[  OK  ] Reached target Host and Network Name Lookups.
systemd-logind.service: Main process exited, code=exited, status=219/CGROUP
systemd-logind.service: Failed with result 'exit-code'.
[FAILED] Failed to start Login Service.
See 'systemctl status systemd-logind.service' for details.
systemd-logind.service: Scheduled restart job, restart counter is at 5.
[  OK  ] Stopped Login Service.
[  OK  ] Listening on D-Bus System Message Bus Socket.
systemd-logind.service: Start request repeated too quickly.
systemd-logind.service: Failed with result 'exit-code'.
[FAILED] Failed to start Login Service.
See 'systemctl status systemd-logind.service' for details.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
Startup finished in 106ms.
console-getty.service: Failed to attach to cgroup /system.slice/console-getty.service: No such file or directory
console-getty.service: Failed at step CGROUP spawning /sbin/agetty: No such file or directory
console-getty.service: Succeeded.
console-getty.service: Scheduled restart job, restart counter is at 1.
system.slice: Failed to create cgroup /system.slice: No such file or directory
console-getty.service: Failed to attach to cgroup /system.slice/console-getty.service: No such file or directory
console-getty.service: Failed at step CGROUP spawning /sbin/agetty: No such file or directory
console-getty.service: Succeeded.
console-getty.service: Scheduled restart job, restart counter is at 2.
system.slice: Failed to create cgroup /system.slice: No such file or directory
console-getty.service: Failed to attach to cgroup /system.slice/console-getty.service: No such file or directory
console-getty.service: Failed at step CGROUP spawning /sbin/agetty: No such file or directory
console-getty.service: Succeeded.
console-getty.service: Scheduled restart job, restart counter is at 3.
system.slice: Failed to create cgroup /system.slice: No such file or directory
console-getty.service: Failed to attach to cgroup /system.slice/console-getty.service: No such file or directory
console-getty.service: Failed at step CGROUP spawning /sbin/agetty: No such file or directory
console-getty.service: Succeeded.
console-getty.service: Scheduled restart job, restart counter is at 4.
system.slice: Failed to create cgroup /system.slice: No such file or directory
console-getty.service: Failed to attach to cgroup /system.slice/console-getty.service: No such file or directory
console-getty.service: Failed at step CGROUP spawning /sbin/agetty: No such file or directory
console-getty.service: Succeeded.
console-getty.service: Scheduled restart job, restart counter is at 5.
console-getty.service: Start request repeated too quickly.
console-getty.service: Failed with result 'start-limit-hit'.

From inside the container, /sys/fs/cgroup/systemd/ appears empty.

> ls /sys/fs/cgroup/systemd/
[Empty folder]

Describe the results you expected

Systemd should run with no warnings, and the systemd cgroup should not be empty. I expected:

> ls /sys/fs/cgroup/systemd/
cgroup.clone_children  cgroup.procs  init.scope  notify_on_release  system.slice  tasks  user.slice

podman info output

host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.29-1.module_el8.5.0+890+6b136101.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: 84384406047fae626269133e1951c4b92eed7603'
  cpuUtilization:
    idlePercent: 94.7
    systemPercent: 1.6
    userPercent: 3.7
  cpus: 8
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: file
  hostname: localhost
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-305.3.1.el8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 13589155840
  memTotal: 18714304512
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.0.2-1.module_el8.5.0+911+f19012f9.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.2
      spec: 1.0.2-dev
      go: go1.16.7
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-1.module_el8.5.0+890+6b136101.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 0
  swapTotal: 0
  uptime: 0h 4m 59.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 53675536384
  graphRootUsed: 3219099648
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1675202775
  BuiltTime: Tue Jan 31 22:06:15 2023
  GitCommit: ""
  GoVersion: go1.19.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No


@yashagacisco yashagacisco added the kind/bug Categorizes issue or PR as related to a bug. label Mar 9, 2023
@yashagacisco yashagacisco changed the title Running a systemd container using private cgroup namespace on a Centos 8 gives warnings Systemd container exits when using private cgroup namespace on CentOS 8 Mar 10, 2023
@yashagacisco yashagacisco changed the title Systemd container exits when using private cgroup namespace on CentOS 8 Systemd fails to start when using private cgroup namespace on CentOS 8 Mar 10, 2023
@yashagacisco yashagacisco changed the title Systemd fails to start when using private cgroup namespace on CentOS 8 Systemd in container hits critical errors when using private cgroup namespace on CentOS 8 Mar 10, 2023
@Luap99
Member

Luap99 commented Mar 10, 2023

@giuseppe PTAL

@giuseppe
Member

The issue probably comes from bind mounting /sys/fs/cgroup/systemd from the host (without taking the cgroupns mode into account), so that inside the container it always points to the root of the hierarchy.

Unfortunately, I don't see any way to fix it: Podman doesn't know in advance what the destination cgroup is, so it cannot point to the final cgroup path, and there is no way to express this in the OCI runtime specs, unless we just make the entire cgroup directory rw (which is a security issue).

I think all we can do is add an error when this happens.

@giuseppe
Member

opened a PR: #17736

@LewisGaul

> Unfortunately, I don't see any way to fix it: Podman doesn't know in advance what the destination cgroup is, so it cannot point to the final cgroup path, and there is no way to express this in the OCI runtime specs, unless we just make the entire cgroup directory rw (which is a security issue).

Hey @giuseppe, I'm trying to understand this: why can't the systemd cgroup mount be created in the normal way, as is done with --systemd=false? What do you mean by "Podman doesn't know in advance what the destination cgroup is"?

Is there definitely a security issue with making the mount rw when there's a private cgroup namespace involved?

@giuseppe
Member

The other cgroups are created 'ro'; we want only the systemd named cgroup to be mounted 'rw'. In order to do that without a cgroupns, we'd need to know in advance what the target cgroup is so we can bind mount only that; but Podman doesn't know it, because the cgroup is created by the OCI runtime.
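
(To make the constraint concrete: a hypothetical sketch, not Podman's actual code, of the kind of mount entry that would be needed to expose only the container's own systemd cgroup read-write. targetCgroupPath stands for the container's final cgroup path, which is exactly what Podman does not know when it generates the spec, since the cgroup is created later by the OCI runtime. The spec.Mount / generate.Generator types are the same ones used in the diff further down.)

// Hypothetical sketch only, not Podman's actual code.
package sketch

import (
	spec "github.com/opencontainers/runtime-spec/specs-go"
	"github.com/opencontainers/runtime-tools/generate"
)

// addSystemdCgroupBindMount would bind mount only the container's own
// systemd named cgroup read-write on top of the read-only hierarchy.
// targetCgroupPath (e.g. "/sys/fs/cgroup/systemd/<final cgroup>") is a
// placeholder: Podman cannot fill it in, because the cgroup only exists
// once the OCI runtime has created it.
func addSystemdCgroupBindMount(g *generate.Generator, targetCgroupPath string) {
	g.AddMount(spec.Mount{
		Destination: "/sys/fs/cgroup/systemd",
		Type:        "bind",
		Source:      targetCgroupPath,
		Options:     []string{"bind", "nodev", "noexec", "nosuid", "rw", "rprivate"},
	})
}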

@LewisGaul

LewisGaul commented Mar 11, 2023

Ok, I follow, but why can't the systemd cgroup mount be namespaced in the container and just mounted read-write? Would that require OCI changes too? Is the job of making the systemd mount rw falling on Podman rather than on the runtime that normally creates the mounts?

@giuseppe
Member

Yes, to do that we would need some way to specify it in the OCI specs, which is not currently possible.

@LewisGaul

LewisGaul commented Mar 12, 2023

But if podman is currently modifying mounts that have been created by the runtime (e.g. creating a rw bind mount), why can't podman just create the private cgroupns systemd mount as a 'cgroup' mount rather than a bind mount?

Something like this (untested)?

--- a/libpod/container_internal_linux.go
+++ b/libpod/container_internal_linux.go
@@ -270,6 +271,18 @@ func (c *Container) setupSystemd(mounts []spec.Mount, g generate.Generator) erro
                        }
                }
                g.AddMount(systemdMnt)
+       } else if hasCgroupNs {  // cgroups v1 with cgroupns=private
+               if MountExists(mounts, "/sys/fs/cgroup/systemd") {
+                       g.RemoveMount("/sys/fs/cgroup/systemd")
+               }
+               systemdMnt = spec.Mount{
+                       Destination: "/sys/fs/cgroup/systemd",
+                       Type:        "cgroup",
+                       Source:      "cgroup",
+                       Options:     []string{"private", "rw", "none", "name=systemd"},
+               }
+               g.AddMount(systemdMnt)
+               g.AddLinuxMaskedPaths("/sys/fs/cgroup/systemd/release_agent")
        } else {
                mountOptions := []string{"bind", "rprivate"}
                skipMount := false

@giuseppe
Member

The OCI runtime will mount the entire cgroup hierarchy, not just the systemd controller; the cgroup file system is treated differently.

@giuseppe
Member

> The OCI runtime will mount the entire cgroup hierarchy, not just the systemd controller; the cgroup file system is treated differently.

We can argue it is an issue in the OCI runtimes, but this is what currently happens, so all that Podman can do is use a bind mount.

@LewisGaul

Ohh, I thought the Podman code was directly creating the individual cgroup/bind mounts, but actually it's just specifying what the runtime should create at a higher level of abstraction, which doesn't allow requesting an individual cgroup v1 mount... I see the problem; this makes robustly supporting systemd quite difficult.

@rhatdan
Member

rhatdan commented Mar 13, 2023

At this point this seems more like a discussion than an issue.

@containers containers locked and limited conversation to collaborators Mar 13, 2023
@rhatdan rhatdan converted this issue into discussion #17760 Mar 13, 2023
giuseppe added a commit to giuseppe/libpod that referenced this issue Mar 14, 2023
On cgroup v1 we need to mount only the systemd named hierarchy as
writeable, so we configure the OCI runtime to mount /sys/fs/cgroup as
read-only and on top of that bind mount /sys/fs/cgroup/systemd.

But when we use a private cgroupns, we cannot do that since we don't
know the final cgroup path.

Also, do not override the mount if there is already one for
/sys/fs/cgroup/systemd.

Closes: containers#17727

Signed-off-by: Giuseppe Scrivano <[email protected]>
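
(In rough terms, the behaviour described in the commit message amounts to something like the following simplified sketch; this is not the actual patch from #17736, and the helper name and error text are placeholders.)

// Simplified sketch of the behaviour described in the commit message above;
// not the actual patch from containers#17736.
package sketch

import (
	"errors"

	spec "github.com/opencontainers/runtime-spec/specs-go"
	"github.com/opencontainers/runtime-tools/generate"
)

func setupSystemdCgroupV1Mount(g *generate.Generator, mounts []spec.Mount, privateCgroupNS bool) error {
	const dest = "/sys/fs/cgroup/systemd"

	// Do not override a mount that is already configured for this path.
	for _, m := range mounts {
		if m.Destination == dest {
			return nil
		}
	}

	// With a private cgroupns the final cgroup path is not known in advance,
	// so the writable systemd bind mount cannot be set up; report an error
	// instead of silently producing a broken systemd environment.
	if privateCgroupNS {
		return errors.New("cgroup v1 systemd mode requires a shared cgroup namespace")
	}

	// Otherwise: /sys/fs/cgroup is mounted read-only by the runtime and the
	// systemd named hierarchy is bind mounted writable on top of it.
	g.AddMount(spec.Mount{
		Destination: dest,
		Type:        "bind",
		Source:      dest,
		Options:     []string{"bind", "nodev", "noexec", "nosuid", "rprivate"},
	})
	return nil
}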
