podman does not detect systemd entrypoints when prefixed with /bin/sh #13324

Closed
dcermak opened this issue Feb 23, 2022 · 15 comments · Fixed by #13619
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@dcermak
Contributor

dcermak commented Feb 23, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

/kind feature

Description

Podman automatically detects whether a container launches systemd as its CMD or ENTRYPOINT. Unfortunately, this logic only works when systemd is given in exec form, CMD ["/usr/lib/systemd/systemd"]. If you add it in shell form, CMD /usr/lib/systemd/systemd, podman fails to launch it without the --systemd=always flag.

Steps to reproduce the issue:

  1. Create the following Dockerfile:
FROM registry.fedoraproject.org/fedora:latest
RUN dnf -y install systemd
CMD ["/usr/lib/systemd/systemd"]
  2. Build it via buildah bud --layers . and launch the container via podman run --rm -it $HASH, which should work.

  3. Change the Dockerfile to:

FROM registry.fedoraproject.org/fedora:latest
RUN dnf -y install systemd
CMD /usr/lib/systemd/systemd
  4. Rebuild via buildah bud --layers . and launch the container via podman run --rm -it $HASH, which will fail with:
🕙[ 16:43:33 ] ❯ podman run --rm -it $HASH
Failed to mount tmpfs at /run: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...

but if you add the --systemd=always flag, then the container works.

The issue here is that the second Dockerfile results in the following command, which can be found via podman inspect:

        "Config": {
            "Cmd": [
                "/bin/sh",
                "-c",
                "/usr/lib/systemd/systemd"
            ]
        }

whereas the first one results in:

        "Config": {
            "Cmd": [
                "/usr/lib/systemd/systemd"
            ]
        }

But podman only recognizes the latter as a systemd container.
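The difference between the two inspect outputs comes from the two Dockerfile forms: an exec-form CMD (a JSON array) is stored as-is, while a shell-form CMD is wrapped in /bin/sh -c. A minimal Python sketch of that mapping follows, assuming the default Linux SHELL of ["/bin/sh", "-c"] (a SHELL instruction would change it). The helper name cmd_to_argv is hypothetical and not taken from buildah or podman.

```python
from __future__ import annotations

import json


def cmd_to_argv(cmd_value: str) -> list[str]:
    """Hypothetical helper: map the text after CMD to the argv stored in Config.Cmd.

    Exec form (a valid JSON array of strings) is used verbatim; anything else,
    including malformed JSON, is treated as shell form and wrapped in /bin/sh -c.
    """
    text = cmd_value.strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list) and all(isinstance(x, str) for x in parsed):
            return parsed
    # Shell form: the whole string becomes the single argument to sh -c.
    return ["/bin/sh", "-c", text]


if __name__ == "__main__":
    print(cmd_to_argv('["/usr/lib/systemd/systemd"]'))
    print(cmd_to_argv("/usr/lib/systemd/systemd"))
```

With the two Dockerfiles above, the first call yields the one-element argv and the second yields the three-element /bin/sh -c argv, matching the two inspect outputs.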

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

podman version 4.0.0-dev

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.24.2
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 12
  distribution:
    distribution: fedora
    version: "35"
  eventLogger: journald
  hostname: Boreas
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    - container_id: 65537
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    - container_id: 65537
      host_id: 100000
      size: 65536
  kernel: 5.16.9-200.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 2967023616
  memTotal: 33319424000
  networkBackend: cni
  ociRuntime:
    name: crun
    package: crun-1.4.2-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.2
      commit: f6fbc8f840df1a414f31a60953ae514fa497c748
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8566222848
  swapTotal: 8589930496
  uptime: 65h 9m 48.74s (Approximately 2.71 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/dan/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.7.1-2.fc35.x86_64
      Version: |-
        fusermount3 version: 3.10.5
        fuse-overlayfs: version 1.7.1
        FUSE library version 3.10.5
        using FUSE kernel interface version 7.31
  graphRoot: /home/dan/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1283
  runRoot: /run/user/1000/containers
  volumePath: /home/dan/.local/share/containers/storage/volumes
version:
  APIVersion: 4.0.0-dev
  Built: 1645632321
  BuiltTime: Wed Feb 23 17:05:21 2022
  GitCommit: d3699bbce63f283a609053d4aca23e4abe7dae4d
  GoVersion: go1.18beta1
  OsArch: linux/amd64
  Version: 4.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

Built from main, and podman-3.4.4-1.fc35.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

@openshift-ci openshift-ci bot added kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. labels Feb 23, 2022
@mheon mheon removed the kind/feature Categorizes issue or PR as related to a new feature. label Feb 23, 2022
@vrothberg
Member

vrothberg commented Feb 24, 2022

        "Config": {
            "Cmd": [
                "/bin/sh",
                "-c",
                "/usr/lib/systemd/systemd"
            ]
        }

That (i.e., sh -c 'systemd') implies that systemd is not PID 1 inside the container (since sh is PID 1), so systemd would not run correctly. I think Podman behaves correctly, although the Dockerfile syntax is quite confusing.

@dcermak
Contributor Author

dcermak commented Feb 24, 2022

@vrothberg the container works, though, and I haven't observed systemd actually complaining.

@vrothberg
Member

Indeed systemd is PID 1 in this case. There must be some knowledge gap on my end. I'll investigate a bit further.

@vrothberg
Member

Indeed systemd is PID 1 in this case. There must be some knowledge gap on my end. I'll investigate a bit further.

It seems this is an optimization in some shells: the last command is not fork'd but exec'd. I do not know whether sh behaves like that consistently.

@giuseppe
Member

you could override the --entrypoint or just force --systemd=always. I don't think we should try to detect the sh -c /usr/lib/systemd/systemd case

@vrothberg
Member

I concur. We cannot/should not rely on shells to behave like that consistently.

@dcermak
Contributor Author

dcermak commented Feb 25, 2022

you could override the --entrypoint or just force --systemd=always. I don't think we should try to detect the sh -c /usr/lib/systemd/systemd case

Certainly, I stated that in the bug report. But I would still consider it a great quality-of-life improvement if podman did the right thing in this case, as I expect most shells behave in a way that lets systemd still work.

@rhatdan
Member

rhatdan commented Feb 25, 2022

I can go back to the original version of this patch which looked specifically for ["/bin/sh", "-c", "PATHTO/systemd"] since this is almost certainly the case where podman added the /bin/sh -c.
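The narrower rule proposed here (accept systemd as argv[0], or exactly ["/bin/sh", "-c", "PATHTO/systemd"]) can be sketched as follows. This is a hypothetical Python illustration, not podman's Go code; the function names are made up, and the set of init paths is an assumption meant to mirror the commands the podman-run man page lists for --systemd=true.

```python
from __future__ import annotations

import posixpath

# Assumption: init paths that podman also treats as "systemd mode" commands.
INIT_PATHS = {"/sbin/init", "/usr/sbin/init", "/usr/local/sbin/init"}


def is_systemd_binary(path: str) -> bool:
    # Match a known init path, or any path whose last component is "systemd".
    return path in INIT_PATHS or posixpath.basename(path) == "systemd"


def wants_systemd_mode(argv: list[str]) -> bool:
    """Hypothetical detection modeled on the proposal in this thread."""
    if not argv:
        return False
    if is_systemd_binary(argv[0]):
        return True
    # Narrow special case: exactly the argv a shell-form CMD produces.
    return (
        len(argv) == 3
        and argv[0] == "/bin/sh"
        and argv[1] == "-c"
        and is_systemd_binary(argv[2])
    )
```

Because the special case compares the whole sh -c string, a shell-form CMD that passes arguments to systemd (e.g. "/usr/lib/systemd/systemd --log-level=debug") is deliberately not matched; it also leaves wrapper entrypoints such as a custom entrypoint.sh undetected.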

sh -c "command" I would think always execs the command within the same PID as the shell.

@dcermak
Contributor Author

dcermak commented Mar 7, 2022

I can go back to the original version of this patch which looked specifically for ["/bin/sh", "-c", "PATHTO/systemd"] since this is almost certainly the case where podman added the /bin/sh -c.

I would definitely be in favor of that.

sh -c "command" I would think always execs the command within the same PID as the shell.

Unfortunately, this is not guaranteed by the POSIX standard. Most shells do it, but we can't really rely on it.
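By contrast, an explicit exec is specified by POSIX to replace the shell with the command without creating a new process, so the PID is kept. A small Python sketch (assuming a POSIX sh on PATH) makes this observable: the outer shell prints $$, then execs a second shell that prints its own $$. The names SCRIPT and pids_across_exec are illustrative only.

```python
from __future__ import annotations

import subprocess

# Outer sh prints its PID, then explicitly execs an inner sh that prints its PID.
# Single quotes stop the outer shell from expanding the inner $$.
SCRIPT = "echo $$; exec sh -c 'echo $$'"


def pids_across_exec() -> list[int]:
    out = subprocess.run(
        ["sh", "-c", SCRIPT], capture_output=True, text=True, check=True
    ).stdout
    return [int(line) for line in out.split()]


if __name__ == "__main__":
    print(pids_across_exec())
```

The two printed PIDs are identical. Dropping the exec keyword would make the result depend on whether the shell applies the last-command optimization, which is exactly the behaviour that cannot be relied on.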

@dcermak
Contributor Author

dcermak commented Mar 29, 2022

Thanks for fixing this @rhatdan!

mheon pushed a commit to mheon/libpod that referenced this issue Mar 30, 2022
dcermak added a commit to SUSE/BCI-dockerfile-generator that referenced this issue Jun 10, 2022
@tacerus

tacerus commented Nov 21, 2022

Hello,

thank you all for the issue report and the fix.
If I understand the patch correctly, the code now checks whether systemd is executed directly or via sh -c. However, it does not seem to cover cases where the entrypoint is different, for example a direct call to an executable shell script.

If I run a container with an entrypoint such as the following

ENTRYPOINT ["/usr/local/sbin/entrypoint.sh"]
CMD ["/usr/lib/systemd/systemd"]

the detection does not work, and --systemd=always needs to be manually passed.
I tried to "hack" around it using

ENTRYPOINT ["sh", "-c", "/usr/local/sbin/entrypoint.sh"]

However, that naturally fails to pass the CMD as $1 to the script, and the same happens with the shell form.

Is there possibly an elegant solution to my use case with the current "detection" code which does not involve manually passing --systemd?

The reason I am not executing systemd directly is that I want a shell script as the entrypoint, which can make use of environment variables and do some basic system configuration before exec'ing into systemd, passed as $1 via CMD.

@vrothberg
Member

@tacerus, I don't see a way. How could podman know what entrypoint.sh does?

Note that systemd really needs to be the PID 1 inside the container.

@tacerus

tacerus commented Nov 22, 2022

Thanks for your input. Since it's a shell script that runs exec at the end, PID 1 becomes systemd: the shell script only does some temporary configuration and then disappears. It works fine with

ENTRYPOINT ["/usr/local/sbin/entrypoint.sh"]
CMD ["/usr/lib/systemd/systemd"]

just that --systemd=always needs to be passed by the user; otherwise, the "ugly" error shown in the original post appears and the container fails to start.
I figured Podman could "just" check whether systemd appears in either ENTRYPOINT or CMD, without caring which interpreter handles the execution of systemd, but I am uncertain whether there are more considerations to be made.

@rhatdan
Member

rhatdan commented Nov 22, 2022

I think at one time we were doing something like that, and people complained, basically saying that they did not want to run in systemd mode.

I tend to agree that systemd in entrypoint or command should put us into systemd mode.

You could also call your entrypoint script systemd, which would trigger the behaviour you want, I believe.

ENTRYPOINT ["/usr/local/sbin/systemd"]
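The broader heuristic suggested earlier (systemd anywhere in ENTRYPOINT or CMD) and the rename workaround can be compared in a small, hypothetical Python sketch. It assumes an exec-form ENTRYPOINT, where CMD entries are appended as arguments, and ignores a command given on the podman run command line (which would replace CMD). None of these helper names come from podman.

```python
from __future__ import annotations

import posixpath


def effective_argv(entrypoint: list[str], cmd: list[str]) -> list[str]:
    # Exec-form ENTRYPOINT: CMD entries become its trailing arguments.
    return entrypoint + cmd


def first_is_systemd(argv: list[str]) -> bool:
    # Only the executable actually started (argv[0]) is inspected.
    return bool(argv) and posixpath.basename(argv[0]) == "systemd"


def any_is_systemd(argv: list[str]) -> bool:
    # Broader heuristic: any argv element whose basename is "systemd".
    return any(posixpath.basename(arg) == "systemd" for arg in argv)


if __name__ == "__main__":
    script = effective_argv(["/usr/local/sbin/entrypoint.sh"], ["/usr/lib/systemd/systemd"])
    renamed = effective_argv(["/usr/local/sbin/systemd"], ["/usr/lib/systemd/systemd"])
    print(first_is_systemd(script), any_is_systemd(script), first_is_systemd(renamed))
```

Renaming the script makes argv[0] end in systemd, so even the argv[0]-only check fires. The broader check, on the other hand, also fires for something like ["/usr/bin/echo", "systemd"], which illustrates how a loose match could put containers into systemd mode unexpectedly.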

@tacerus

tacerus commented Nov 22, 2022

I see. Well, maybe it could be considered again in the future, but I get that there might be other use cases.

Thank you so much! That "hack" with calling my entrypoint script systemd works great!

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 9, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2023
6 participants