
Run container in a system without syslog plus debug log level, the exit will time out #19075

Closed
fangpenlin opened this issue Jun 30, 2023 · 3 comments · Fixed by #19179
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@fangpenlin
Contributor

Issue Description

I ran into a problem: when you run podman at debug log level in an environment where syslog is not available, waiting for the container exit times out. For example, in an Alpine docker container with the podman package installed, run this:

podman --log-level=debug run -it alpine echo hi

After hi is printed, it takes 20 seconds until the timeout-waiting-for-container error appears:

DEBU[0000] Starting container 995e71cd3e7a94b39457b209bb4abba49f3903656ef48d3b895a4e814ce24cdf with command [echo hi]
DEBU[0000] Received a resize event: {Width:198 Height:50}
DEBU[0000] Started container 995e71cd3e7a94b39457b209bb4abba49f3903656ef48d3b895a4e814ce24cdf
DEBU[0000] Notify sent successfully
hi
DEBU[0020] Exceeded conmon timeout waiting for container 995e71cd3e7a94b39457b209bb4abba49f3903656ef48d3b895a4e814ce24cdf to exit
DEBU[0020] Called run.PersistentPostRunE(podman --log-level=debug run -it alpine echo hi)
DEBU[0020] Shutting down engines

After tracing through the code I now understand why. First, when the debug log level is enabled, --syslog is added to the conmon exit command arguments here:

if syslog {
	command = append(command, "--syslog")
}

Whether syslog is enabled is determined here:

exitCommand, err := specgenutil.CreateExitCommandArgs(ctr.runtime.storageConfig, ctr.runtime.config, logrus.IsLevelEnabled(logrus.DebugLevel), ctr.AutoRemove(), false)

Specifically, by this argument:

logrus.IsLevelEnabled(logrus.DebugLevel)

When the cleanup command runs and sees that the syslog flag is set, it tries to use it:

hook, err := logrusSyslog.NewSyslogHook("", "", syslog.LOG_INFO, "")
if err != nil {
	fmt.Fprint(os.Stderr, "Failed to initialize syslog hook: "+err.Error())
	os.Exit(1)
}

This fails because syslog is not available in my Alpine environment, so the process exits with code 1 and the cleanup command terminates without doing anything at all. As a result, the parent podman process waits for the status updates the cleanup command would have written, but they never arrive.

Steps to reproduce the issue


docker run --privileged --entrypoint=/bin/sh -i alpine:3.18.2 \
   -c "apk add podman fuse fuse-overlayfs && \
    sed -e 's|^#mount_program|mount_program|g' /usr/share/containers/storage.conf > /etc/containers/storage.conf && \
    podman --log-level=debug run -i alpine:3.18.2 echo hi"

Describe the results you received

time="2023-06-30T18:53:23Z" level=info msg="Got Conmon PID as 88"
time="2023-06-30T18:53:23Z" level=debug msg="Created container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2 in OCI runtime"
time="2023-06-30T18:53:23Z" level=debug msg="Attaching to container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2"
time="2023-06-30T18:53:23Z" level=debug msg="Starting container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2 with command [echo hi]"
time="2023-06-30T18:53:23Z" level=debug msg="Started container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2"
time="2023-06-30T18:53:23Z" level=debug msg="Notify sent successfully"
hi
time="2023-06-30T18:53:43Z" level=debug msg="Exceeded conmon timeout waiting for container 1286f6fb875b050a73ad7470dc9a8e05bb32f7c51508d1402ce59868fb849ae2 to exit"
time="2023-06-30T18:53:43Z" level=debug msg="Called run.PersistentPostRunE(podman --log-level=debug run -i alpine:3.18.2 echo hi)"
time="2023-06-30T18:53:43Z" level=debug msg="Shutting down engines"

Describe the results you expected

The cleanup should finish without a timeout error, i.e., the cleanup command should not fail when syslog is unavailable. If syslog is required for running at debug log level (I saw a mention somewhere that some communication happens over syslog?), podman should display an error instead of running the command and waiting until the timeout.

podman info output

host:
  arch: amd64
  buildahVersion: 1.31.0-dev
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.7-r1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: unknown'
  cpuUtilization:
    idlePercent: 99.29
    systemPercent: 0.29
    userPercent: 0.42
  cpus: 32
  databaseBackend: boltdb
  distribution:
    distribution: alpine
    version: 3.18.2
  eventLogger: file
  freeLocks: 2048
  hostname: d80c08b11442
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.90.1-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 910123008
  memTotal: 16720412672
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.6.0-r0
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.6.0
    package: netavark-1.6.0-r0
    path: /usr/libexec/podman/netavark
    version: netavark 1.6.0
  ociRuntime:
    name: crun
    package: crun-1.8.4-r0
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-r0
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 3524632576
  swapTotal: 4294967296
  uptime: 48h 40m 56.00s (Approximately 2.00 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.11-r1
      Version: |-
        fuse-overlayfs: version 1.11
        fusermount3 version: 3.14.1
        FUSE library version 3.14.1
        using FUSE kernel interface version 7.31
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 269427478528
  graphRootUsed: 69159690240
  graphStatus:
    Backing Filesystem: overlayfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.7.0-dev
  Built: 1688112376
  BuiltTime: Fri Jun 30 08:06:16 2023
  GitCommit: fca3c2ef8476669825564af89b008946bdf87599-dirty
  GoVersion: go1.20.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.0-dev


### Podman in a container

Yes

### Privileged Or Rootless

None

### Upstream Latest Release

Yes

### Additional environment details


### Additional information

@fangpenlin fangpenlin added the kind/bug Categorizes issue or PR as related to a bug. label Jun 30, 2023
@fangpenlin
Contributor Author

For people looking for a workaround when running podman in Alpine, an easy option is to install openrc or s6-overlay so that a syslog daemon runs when the container starts, like this:

 FROM alpine
 RUN apk add s6-overlay s6-overlay-syslogd podman fuse fuse-overlayfs
 RUN sed -e 's|^#mount_program|mount_program|g' /usr/share/containers/storage.conf > /etc/containers/storage.conf
 ENTRYPOINT ["/init"]

This way, when you run at debug log level, syslog is available and the exit won't time out:

docker run --privileged -i my-image podman --log-level=debug run -i alpine echo hi

@Luap99
Member

Luap99 commented Jul 5, 2023

What is the error message when you run something like podman --syslog info?
We should likely just ignore the error for the cleanup command.

@fangpenlin
Contributor Author

docker run --privileged --entrypoint=/bin/sh -i alpine:3.18.2 \
   -c "apk add podman fuse fuse-overlayfs && \
    sed -e 's|^#mount_program|mount_program|g' /usr/share/containers/storage.conf > /etc/containers/storage.conf && \
    podman --syslog info"

and the result:

Failed to initialize syslog hook: Unix syslog delivery error

@Luap99 Luap99 self-assigned this Jul 10, 2023
Luap99 added a commit to Luap99/libpod that referenced this issue Jul 10, 2023
Podman has always passed --syslog down to conmon since 13c2aca. However, there are systems without syslog running, likely in container setups. As reported, this was already a problem before when debug level is used: conmon passes --syslog back to the podman container cleanup command, causing it to fail without doing anything. Given that, I think it is better to just ignore the error and log it at debug level; we need to make sure cleanup works consistently.

[NO NEW TESTS NEEDED]

Fixes containers#19075

Signed-off-by: Paul Holzinger <[email protected]>
ashley-cui pushed a commit to ashley-cui/podman that referenced this issue Jul 13, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 9, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2023