
podman start does not start conmon #16634

Closed
popescore opened this issue Nov 26, 2022 · 18 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
kind/feature: Categorizes issue or PR as related to a new feature.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments


popescore commented Nov 26, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

A Podman container is created and started from inside an OCI container, using the `--root` option.
The OCI container is then exited and restarted.
After the restart, 'podman ps' and 'podman inspect' still show the Podman container as running, but conmon is not started even after 'podman start'. The Podman container PID is stale.

Steps to reproduce the issue:

All podman commands are issued from an OCI container.

  1. Create the Podman container
    Expected.

podman create:

podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage \
  --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman \
  --log-level=debug create --privileged --tty \
  --name b8ae25a8-22b8-456b-a874-f4a29039fbcb \
  --pull never --restart no --hostname pod_node1 \
  --net ns:/var/run/netns/pod_b8aenet \
  --mount type=tmpfs,destination=/tmp \
  --volume /static:/.xyz/static:bind,ro \
  --volume /xyz/bin:/.xyz/bin:bind,ro node:1.5
  2. Start the Podman container
    Expected.

podman start, using --log-level=debug option:

podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --log-level=debug start b8ae25a8-22b8-456b-a874-f4a29039fbcb

time="2022-11-26T18:07:18Z" level=info msg="podman filtering at log level debug"
time="2022-11-26T18:07:18Z" level=debug msg="Called start.PersistentPreRunE(podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --log-level=debug start b8ae25a8-22b8-456b-a874-f4a29039fbcb)"
time="2022-11-26T18:07:18Z" level=debug msg="Merged system config \"/usr/share/containers/containers.conf\""
time="2022-11-26T18:07:18Z" level=debug msg="Using conmon: \"/usr/bin/conmon\""
time="2022-11-26T18:07:18Z" level=debug msg="Initializing boltdb state at /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/libpod/bolt_state.db"
time="2022-11-26T18:07:18Z" level=debug msg="Using graph driver overlay"
time="2022-11-26T18:07:18Z" level=debug msg="Using graph root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage"
time="2022-11-26T18:07:18Z" level=debug msg="Using run root /persistent/staging/podman"
time="2022-11-26T18:07:18Z" level=debug msg="Using static dir /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/libpod"
time="2022-11-26T18:07:18Z" level=debug msg="Using tmp dir /run/libpod"
time="2022-11-26T18:07:18Z" level=debug msg="Using volume path /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/volumes"
time="2022-11-26T18:07:18Z" level=debug msg="Set libpod namespace to \"\""
time="2022-11-26T18:07:18Z" level=debug msg="[graphdriver] trying provided driver \"overlay\""
time="2022-11-26T18:07:18Z" level=debug msg="overlay: imagestore=/host/var/lib/containers/storage,/persistent/podman"
time="2022-11-26T18:07:18Z" level=debug msg="cached value indicated that overlay is supported"
time="2022-11-26T18:07:18Z" level=debug msg="cached value indicated that metacopy is not being used"
time="2022-11-26T18:07:18Z" level=debug msg="cached value indicated that native-diff is usable"
time="2022-11-26T18:07:18Z" level=debug msg="backingFs=xfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false"
time="2022-11-26T18:07:18Z" level=debug msg="Initializing event backend file"
time="2022-11-26T18:07:18Z" level=debug msg="configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument"
time="2022-11-26T18:07:18Z" level=debug msg="configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument"
time="2022-11-26T18:07:18Z" level=debug msg="Using OCI runtime \"/usr/bin/runc\""
time="2022-11-26T18:07:18Z" level=info msg="Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman.conflist"
time="2022-11-26T18:07:18Z" level=debug msg="Default CNI network name podman is unchangeable"
time="2022-11-26T18:07:18Z" level=info msg="Setting parallel job count to 25"
time="2022-11-26T18:07:18Z" level=debug msg="[graphdriver] trying provided driver \"overlay\""
time="2022-11-26T18:07:18Z" level=debug msg="overlay: imagestore=/host/var/lib/containers/storage,/persistent/podman"
time="2022-11-26T18:07:18Z" level=debug msg="cached value indicated that overlay is supported"
time="2022-11-26T18:07:18Z" level=debug msg="cached value indicated that metacopy is not being used"
time="2022-11-26T18:07:18Z" level=debug msg="backingFs=xfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false"
time="2022-11-26T18:07:18Z" level=debug msg="overlay: mount_data=,lowerdir=/host/var/lib/containers/storage/overlay/l/56BGEVS66BRE43E2LIJ3LKE7TX:/host/var/lib/containers/storage/overlay/l/DUL4SMKVQVL6FXEEMQFZSN5WUC,upperdir=/persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay/1dfde88b85d31be711d8864d8bd27c76a0b9d941d8280a4ca37c66caac142ac1/diff,workdir=/persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay/1dfde88b85d31be711d8864d8bd27c76a0b9d941d8280a4ca37c66caac142ac1/work"
time="2022-11-26T18:07:18Z" level=debug msg="mounted container \"2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4\" at \"/persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay/1dfde88b85d31be711d8864d8bd27c76a0b9d941d8280a4ca37c66caac142ac1/merged\""
time="2022-11-26T18:07:18Z" level=debug msg="Created root filesystem for container 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 at /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay/1dfde88b85d31be711d8864d8bd27c76a0b9d941d8280a4ca37c66caac142ac1/merged"
time="2022-11-26T18:07:18Z" level=debug msg="Workdir \"/\" resolved to host path \"/persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay/1dfde88b85d31be711d8864d8bd27c76a0b9d941d8280a4ca37c66caac142ac1/merged\""
time="2022-11-26T18:07:18Z" level=debug msg="network configuration does not support host.containers.internal address"
time="2022-11-26T18:07:18Z" level=debug msg="/etc/system-fips does not exist on host, not mounting FIPS mode subscription"
time="2022-11-26T18:07:18Z" level=debug msg="Setting CGroup path for container 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 to /libpod_parent/libpod-2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4"
time="2022-11-26T18:07:18Z" level=debug msg="reading hooks from /usr/share/containers/oci/hooks.d"
time="2022-11-26T18:07:18Z" level=debug msg="Created OCI spec for container 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 at /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/config.json"
time="2022-11-26T18:07:18Z" level=debug msg="/usr/bin/conmon messages will be logged to syslog"
time="2022-11-26T18:07:18Z" level=debug msg="running conmon: /usr/bin/conmon" args="[--api-version 1 -c 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 -u 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 -r /usr/bin/runc -b /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata -p /persistent/staging/podman/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/pidfile -n b8ae25a8-22b8-456b-a874-f4a29039fbcb --exit-dir /run/libpod/exits --full-attach -l k8s-file:/persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/ctr.log --log-level debug --syslog -t --conmon-pidfile /persistent/staging/podman/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --exit-command-arg --runroot --exit-command-arg /persistent/staging/podman --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4]"
time="2022-11-26T18:07:18Z" level=debug msg="Received: 243"
time="2022-11-26T18:07:18Z" level=info msg="Got Conmon PID as 231"
time="2022-11-26T18:07:18Z" level=debug msg="Created container 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 in OCI runtime"
time="2022-11-26T18:07:18Z" level=debug msg="Starting container 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 with command [/usr/local/bin/entrypoint /sbin/init]"
time="2022-11-26T18:07:18Z" level=debug msg="Started container 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4"
time="2022-11-26T18:07:18Z" level=debug msg="Called start.PersistentPostRunE(podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --log-level=debug start b8ae25a8-22b8-456b-a874-f4a29039fbcb)"
  3. Podman inspect, get PID
    Expected.
podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman inspect --format {{.State.Pid}} b8ae25a8-22b8-456b-a874-f4a29039fbcb
243
  4. Restart the OCI container (which hosts the Podman container)

The OCI container is restarted.
Note that the /persistent directory (used for the storage.conf graphroot and the --root option) is persistent
across the two OCI container running sessions, as it is bind-mounted from a host path.
The /staging directory is local to the OCI container.
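The split described above can be sketched with plain shell. This is an illustrative sketch only (temp dirs stand in for /persistent and /staging; this is not the real Podman layout): the persistent root keeps the libpod state database that records the container as running, while the local staging area holding the pidfiles is wiped by the restart.

```shell
# Illustrative sketch of the persistent/local split (not the real layout).
persistent=$(mktemp -d)   # bind-mounted from the host: survives a restart
staging=$(mktemp -d)      # local to the OCI container: lost on a restart

touch "$persistent/bolt_state.db"   # libpod state: container marked "running"
echo 243 > "$staging/pidfile"       # runtime state: conmon/container pidfiles

# Simulate the OCI container restart: the local staging area is wiped.
rm -rf "$staging"
mkdir -p "$staging"

[ -f "$persistent/bolt_state.db" ] && echo "state DB survived the restart"
[ -f "$staging/pidfile" ] || echo "pidfile was lost in the restart"
```

After the simulated restart the database still says "running" while the pidfile it points at is gone, which matches the stale PID reported in the following steps.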

  5. Podman ps, get status
    The container being in the running state is not expected.
podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman ps --all --format "{{.Names}} {{.Status}}"

b8ae25a8-22b8-456b-a874-f4a29039fbcb Up 28 minutes ago
  6. Podman inspect, get PID
    Getting the old PID is not expected!
podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman inspect --format {{.State.Pid}} b8ae25a8-22b8-456b-a874-f4a29039fbcb
243
  7. Podman start
    conmon not starting is not expected!

Using the '--log-level=debug' option again:

podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --log-level=debug start b8ae25a8-22b8-456b-a874-f4a29039fbcb

time="2022-11-26T18:25:06Z" level=info msg="podman filtering at log level debug"
time="2022-11-26T18:25:06Z" level=debug msg="Called start.PersistentPreRunE(podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --log-level=debug start b8ae25a8-22b8-456b-a874-f4a29039fbcb)"
time="2022-11-26T18:25:06Z" level=debug msg="Merged system config \"/usr/share/containers/containers.conf\""
time="2022-11-26T18:25:06Z" level=debug msg="Using conmon: \"/usr/bin/conmon\""
time="2022-11-26T18:25:06Z" level=debug msg="Initializing boltdb state at /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/libpod/bolt_state.db"
time="2022-11-26T18:25:06Z" level=debug msg="Using graph driver overlay"
time="2022-11-26T18:25:06Z" level=debug msg="Using graph root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage"
time="2022-11-26T18:25:06Z" level=debug msg="Using run root /persistent/staging/podman"
time="2022-11-26T18:25:06Z" level=debug msg="Using static dir /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/libpod"
time="2022-11-26T18:25:06Z" level=debug msg="Using tmp dir /run/libpod"
time="2022-11-26T18:25:06Z" level=debug msg="Using volume path /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/volumes"
time="2022-11-26T18:25:06Z" level=debug msg="Set libpod namespace to \"\""
time="2022-11-26T18:25:06Z" level=debug msg="[graphdriver] trying provided driver \"overlay\""
time="2022-11-26T18:25:06Z" level=debug msg="overlay: imagestore=/host/var/lib/containers/storage,/persistent/podman"
time="2022-11-26T18:25:06Z" level=debug msg="cached value indicated that overlay is supported"
time="2022-11-26T18:25:06Z" level=debug msg="cached value indicated that metacopy is not being used"
time="2022-11-26T18:25:06Z" level=debug msg="cached value indicated that native-diff is usable"
time="2022-11-26T18:25:06Z" level=debug msg="backingFs=xfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false"
time="2022-11-26T18:25:06Z" level=debug msg="Initializing event backend file"
time="2022-11-26T18:25:06Z" level=debug msg="configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument"
time="2022-11-26T18:25:06Z" level=debug msg="configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument"
time="2022-11-26T18:25:06Z" level=debug msg="Using OCI runtime \"/usr/bin/runc\""
time="2022-11-26T18:25:06Z" level=info msg="Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman.conflist"
time="2022-11-26T18:25:06Z" level=debug msg="Default CNI network name podman is unchangeable"
time="2022-11-26T18:25:06Z" level=info msg="Setting parallel job count to 25"
time="2022-11-26T18:25:06Z" level=debug msg="Called start.PersistentPostRunE(podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --log-level=debug start b8ae25a8-22b8-456b-a874-f4a29039fbcb)"

The 'podman start' command succeeds, but the logs show that conmon is never started.
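One quick way to see the difference between the two runs is to save each debug output and grep for the conmon invocation line (file names below are illustrative, not taken from the report):

```shell
# (illustrative file names) A successful `podman start` emits a
# `running conmon:` debug line; the post-restart run never reaches it,
# so grepping the saved debug output makes the difference obvious.
logdir=$(mktemp -d)
printf 'level=debug msg="running conmon: /usr/bin/conmon"\n' > "$logdir/first-start.log"
: > "$logdir/restart-start.log"   # second run: no conmon invocation logged

grep -q 'running conmon' "$logdir/first-start.log"   && echo "first run launched conmon"
grep -q 'running conmon' "$logdir/restart-start.log" || echo "restart run never launched conmon"
```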

  8. Verify conmon running state
    conmon not running is not expected.
ps aux | grep conmon
root         244  0.0  0.0   9208  1148 pts/1    S+   19:35   0:00 grep conmon

conmon was expected to be started by 'podman start' (as on the initial 'podman start' command) after the OCI container is restarted.
There are no errors in the command's debug output either!
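As an aside, `ps aux | grep conmon` always matches its own grep process, which makes near-empty output like the above harder to read; `pgrep` avoids that. A small sketch, with `sleep` standing in for conmon:

```shell
# pgrep checks for a process by name without the grep-matches-itself noise.
sleep 30 &
demo_pid=$!
state=$(pgrep -x sleep >/dev/null && echo "running" || echo "not running")
echo "demo process is $state"
kill "$demo_pid"
```

Here `pgrep -x conmon` (exact name match) would print conmon's PID only if conmon is actually alive.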

Same results (after the OCI container restart) for the podman ps and inspect commands; inspect returns the wrong container PID:

  9. Podman ps, get container status
    Not expected.
podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman ps --all --format "{{.Names}} {{.Status}}"
b8ae25a8-22b8-456b-a874-f4a29039fbcb Up 29 minutes ago
  10. Podman inspect, get PID
    Retrieving the old container PID is not expected.
podman --root /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman inspect --format {{.State.Pid}} b8ae25a8-22b8-456b-a874-f4a29039fbcb
243

The corresponding /proc entry does not exist:

> ls -l /proc/243
ls: cannot access '/proc/243': No such file or directory

conmon was expected to be running, with a different container PID, after the OCI container restart in step 4.
Or, if the container happened to get the same PID, the /proc entry should at least be valid.
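The validity check described above can be sketched directly: a recorded PID is only meaningful while the process it named is still alive (on Linux, `kill -0 <pid>` is equivalent to testing for `/proc/<pid>`). An illustrative sketch, with `sleep` standing in for the container process:

```shell
# alive() succeeds only if a process with that PID still exists,
# which is the same check as `test -d /proc/<pid>` on Linux.
alive() { kill -0 "$1" 2>/dev/null; }

sleep 30 &
recorded_pid=$!                 # stands in for the PID Podman recorded
alive "$recorded_pid" && echo "recorded PID is live"

kill "$recorded_pid"            # stands in for the OCI container restart
wait "$recorded_pid" 2>/dev/null || true
alive "$recorded_pid" || echo "recorded PID is stale"
```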

podman --root /persistent/d59598c1-ed1f-4198-b8b4-442b836cb995/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman inspect --format {{.PidFile}} d59598c1-ed1f-4198-b8b4-442b836cb995
/staging/podman/overlay-containers/45ce60fd4425305ba2ffaa3c9e8ec7687dc2cd8a4301d891af236cad096ff4ef/userdata/pidfile

Verify that the initial/local file containing the container PID no longer exists:

cat /staging/podman/overlay-containers/45ce60fd4425305ba2ffaa3c9e8ec7687dc2cd8a4301d891af236cad096ff4ef/userdata/pidfile

cat: /staging/podman/overlay-containers/45ce60fd4425305ba2ffaa3c9e8ec7687dc2cd8a4301d891af236cad096ff4ef/userdata/pidfile: No such file or directory

Describe the results you received:

After restarting the OCI container, 'podman ps' shows the Podman container as running, and 'podman inspect --format {{.State.Pid}}' returns its old PID from the initial OCI container session, before it was restarted.

Describe the results you expected:

After restarting the OCI container, 'podman ps' should not show the container in the 'running' state.
It is unclear what PID 'podman inspect --format {{.State.Pid}}' should return in the not-running state, but after a 'podman start' command, the container PID should match a valid /proc/container-PID entry.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

(From the OCI container that is used to start podman container(s))

podman version 3.3.1

Output of conmon version:

conmon --version
conmon version 2.0.29
commit: 84384406047fae626269133e1951c4b92eed7603

Output of podman info:

podman --root /persistent/00050055-6d13-4ee0-933a-014152980281/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman info
host:
  arch: amd64
  buildahVersion: 1.22.3
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: 84384406047fae626269133e1951c4b92eed7603'
  cpus: 8
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: file
  hostname: db209a77-dc5f-4eb6-b6d0-56a173fa7848
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-348.7.1.el8_5.x86_64
  linkmode: dynamic
  memFree: 25942884352
  memTotal: 33511063552
  ociRuntime:
    name: runc
    package: Unknown
    path: /usr/bin/runc
    version: |-
      runc version 1.0.2
      spec: 1.0.2-dev
      go: go1.16.7
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: Unknown
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 1044480
  swapTotal: 1044480
  uptime: 814h 49m 43.54s (Approximately 33.92 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - docker.artifactory.eng.nimblestorage.com
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /host/var/lib/containers/storage,/persistent/podman
  graphRoot: /persistent/00050055-6d13-4ee0-933a-014152980281/root/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 4
  runRoot: /staging/podman
  volumePath: /persistent/00050055-6d13-4ee0-933a-014152980281/root/storage/volumes
version:
  APIVersion: 3.3.1
  Built: 1636493036
  BuiltTime: Tue Nov  9 21:23:56 2021
  GitCommit: ""
  GoVersion: go1.16.7
  OsArch: linux/amd64
  Version: 3.3.1

storage.conf on OCI container:

# Temporary storage location
runroot = "/staging/podman/"

# Primary Read/Write location of container storage
graphroot = "/persistent/podman/"

additionalimagestores = [
  "/host/var/lib/containers/storage"
]
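For reference, in a complete storage.conf these keys sit under TOML section headers, with additionalimagestores under [storage.options]; the fragment above omits the headers. A sketch (the driver line is added here for completeness and is an assumption based on the overlay driver reported by podman info):

```toml
[storage]
driver = "overlay"
# Temporary storage location
runroot = "/staging/podman/"
# Primary Read/Write location of container storage
graphroot = "/persistent/podman/"

[storage.options]
additionalimagestores = [
  "/host/var/lib/containers/storage"
]
```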

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

podman-3.3.1-9.module_el8.5.0+988+b1f0b741.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

No.
I can't upgrade the Podman version due to compatibility issues.
I checked the Troubleshooting Guide.

Additional environment details (AWS, VirtualBox, physical, etc.):

OCI container, which starts the podman container:

cat /etc/os-release
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"

uname -a
Linux a934b2b8-e55d-4b00-91a2-842bea977e8b 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
openshift-ci bot added the kind/bug and kind/feature labels on Nov 26, 2022
@giuseppe (Member) commented:

there is not any information attached to the issue. Please reopen once you've filled all the requested info and provide a reproducer.

giuseppe closed this as not planned (won't fix / can't repro / duplicate / stale) on Nov 28, 2022
@popescore (Author) commented:

I don't know how to reopen the issue.


popescore commented Nov 30, 2022

there is not any information attached to the issue. Please reopen once you've filled all the requested info and provide a reproducer.

I couldn't reopen the issue; please reopen it, as I don't seem to have access. Otherwise I'll open a new one.
What do you mean by "provide a reproducer"?
I provided all the podman commands. I didn't provide instructions for the OCI container image, but I mentioned the Linux distro and version in the description.

giuseppe reopened this on Nov 30, 2022
@giuseppe (Member) commented:

it could be caused by a crash in conmon. Would you be able to verify that?

I think something like containers/conmon#361 could be the reason for it


popescore commented Dec 1, 2022

it could be caused by a crash in conmon. Would you be able to verify that?

I think something like containers/conmon#361 could be the reason for it

If 'podman start' had started conmon, regardless of whether conmon crashed after detaching, wouldn't the conmon command with all its options show up in the 'podman --log-level=debug start' logs, just like in the first 'podman start' instance in step 2?
This is the conmon command shown in the 'podman start' logs:

time="2022-11-26T18:07:18Z" level=debug msg="running conmon: /usr/bin/conmon" args="[--api-version 1 -c 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 -u 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 -r /usr/bin/runc -b /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata -p /persistent/staging/podman/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/pidfile -n b8ae25a8-22b8-456b-a874-f4a29039fbcb --exit-dir /run/libpod/exits --full-attach -l k8s-file:/persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/ctr.log --log-level debug --syslog -t --conmon-pidfile /persistent/staging/podman/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --exit-command-arg --runroot --exit-command-arg /persistent/staging/podman --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4]"

One other question, quoting the conmon README ('Writes the contents of the container's streams to a log file'):
how can the conmon log file be explicitly set up from Podman, given that there is no syslog/messages setup in the OCI container?


popescore commented Dec 1, 2022

it could be caused by a crash in conmon. Would you be able to verify that?

I think something like containers/conmon#361 could be the reason for it

I verified: the behavior is the same with the latest version; conmon is not started/running:

conmon --version
conmon version 2.1.5
commit: 6686b9342d73c56a55574700be9814ea46a3ed35


giuseppe commented Dec 1, 2022

could you run /usr/share/bcc/tools/execsnoop --max-args 100 before launching podman start to verify what processes are executed?

What could happen is that conmon crashed but the container still runs so podman start won't restart it again. Does it make any difference if you kill the container process first then try again podman start?

Also, have you created a new container using the freshly installed conmon so that the first run is using a fixed conmon?


popescore commented Dec 2, 2022

could you run /usr/share/bcc/tools/execsnoop --max-args 100 before launching podman start to verify what processes are executed?

--max-args is not a valid option:

> execsnoop --max-args 100
/bin/execsnoop: illegal option -- -

Also, the OCI 'master' container doesn't have access to /sys/kernel/debug/; it is running on a VM:

> execsnoop -rt -d 60
Tracing exec()s for 60 seconds (buffered)...
/bin/execsnoop: line 160: cd: /sys/kernel/debug/tracing: No such file or directory
ERROR: accessing tracing. Root user? Kernel has FTRACE?
    debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)

What could happen is that conmon crashed

Crashed when? When I run 'podman start'?
Because it is definitely not running before 'podman start'.

but the container still runs

I assume you are referring to the Podman container that is still running. Is it possible that it keeps running even after the OCI 'master' container that started it exits?
That would make sense, considering that 'podman ps' and 'podman inspect' still report the Podman container as running. However, the Podman container's PID is no longer valid once the OCI container restarts.

so podman start won't restart it again. Does it make any difference if you kill the container process first then try again podman start?

I killed the podman container after the OCI container restarted but before 'podman start':

podman --root /persistent/be075178-2975-4638-803e-b24b32995164/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman kill be075178-2975-4638-803e-b24b32995164

level=error msg="container \"2a07e707a6a33b21e442dda02a0f379b03cf2772a5cfd40fa1d07d2cca4fb271\" does not exist"
Error: error sending signal to container 2a07e707a6a33b21e442dda02a0f379b03cf2772a5cfd40fa1d07d2cca4fb271: `/usr/bin/runc kill 2a07e707a6a33b21e442dda02a0f379b03cf2772a5cfd40fa1d07d2cca4fb271 9` failed: exit status 1

I also stopped the podman container (on a different test) after the OCI container restarted but before 'podman start':

podman --root /persistent/6b2489a1-76a2-4cf8-bbbe-a85c18ca5922/root/storage --storage-opt overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman stop 6b2489a1-76a2-4cf8-bbbe-a85c18ca5922

Error: container 8a08d064cb298b6f1cf55d4479c98ff189d7c361ab29a17d7942f5130d96349a conmon process missing, cannot retrieve exit code: conmon process killed

Also, have you created a new container using the freshly installed conmon so that the first run is using a fixed conmon?

Yes, the new conmon replaced the old one on the host (a VM); I then ran all the testing steps again.


giuseppe commented Dec 2, 2022

From the logs it seems that conmon is started:

time="2022-11-26T18:07:18Z" level=debug msg="running conmon: /usr/bin/conmon" args="[--api-version 1 -c 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 -u 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4 -r /usr/bin/runc -b /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata -p /persistent/staging/podman/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/pidfile -n b8ae25a8-22b8-456b-a874-f4a29039fbcb --exit-dir /run/libpod/exits --full-attach -l k8s-file:/persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/ctr.log --log-level debug --syslog -t --conmon-pidfile /persistent/staging/podman/overlay-containers/2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /persistent/b8ae25a8-22b8-456b-a874-f4a29039fbcb/root/storage --exit-command-arg --runroot --exit-command-arg /persistent/staging/podman --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4]"
time="2022-11-26T18:07:18Z" level=debug msg="Received: 243"

we need to figure out why it exits.

Is there any useful information in the journal?


popescore commented Dec 2, 2022

we need to figure out why it exits.

Just to clarify: these conmon logs are from step 2, the first/initial 'podman start' instance.

time="2022-11-26T18:07:18Z" level=debug msg="Received: 243"
time="2022-11-26T18:07:18Z" level=info msg="Got Conmon PID as 231"

Based on the debug logs, you assume conmon is exiting after 'podman start'.
However:

  • how do we explain the valid conmon PID below the "Received: 243" log (nsenter works just fine with that PID)?
  • how do we explain that 'ps' shows conmon as running in the output below?
ps aux|grep conmon
root         230  0.0  0.0 356848  2512 ?        Ssl  17:02   0:00 /usr/bin/conmon --api-version 1 -c 39aaf06b80b590134581fdb23972a7fd12e5b71d0bf26ea6d06cbf9dd53969a8 -u 39aaf06b80b590134581fdb23972a7fd12e5b71d0bf26ea6d06cbf9dd53969a8 -r /usr/bin/runc -b /persistent/edbfea33-1b56-4119-b588-91401d24aced/root/storage/overlay-containers/39aaf06b80b590134581fdb23972a7fd12e5b71d0bf26ea6d06cbf9dd53969a8/userdata -p /staging/podman/overlay-containers/39aaf06b80b590134581fdb23972a7fd12e5b71d0bf26ea6d06cbf9dd53969a8/userdata/pidfile -n edbfea33-1b56-4119-b588-91401d24aced --exit-dir /run/libpod/exits --full-attach -l k8s-file:/persistent/edbfea33-1b56-4119-b588-91401d24aced/root/storage/overlay-containers/39aaf06b80b590134581fdb23972a7fd12e5b71d0bf26ea6d06cbf9dd53969a8/userdata/ctr.log --log-level debug --syslog -t --conmon-pidfile /staging/podman/overlay-containers/39aaf06b80b590134581fdb23972a7fd12e5b71d0bf26ea6d06cbf9dd53969a8/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /persistent/edbfea33-1b56-4119-b588-91401d24aced/root/storage --exit-command-arg --runroot --exit-command-arg /staging/podman --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.imagestore=/host/var/lib/containers/storage,/persistent/podman --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 39aaf06b80b590134581fdb23972a7fd12e5b71d0bf26ea6d06cbf9dd53969a8
root         559  0.0  0.0   9208  1048 pts/1    S+   17:02   0:00 grep conmon

or

hfsim/hypervisor> ps -aef --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root         228       1  0 01:26 ?        00:00:00 /usr/bin/conmon --api-version 1 -c f7449f6a9fd891b5a00c98ba9bfda33d7eef
root         240     228  0 01:26 ?        00:00:00  \_ /sbin/init
root         445     240  0 01:26 ?        00:00:00      \_ /lib/systemd/systemd-journald
root         457     240  0 01:26 ?        00:00:00      \_ /lib/systemd/systemd-udevd
root         463     240  0 01:26 ?        00:00:00      \_ /usr/local/bin/containerd
root         470     240  0 01:26 ?        00:00:00      \_ /usr/sbin/syslogd --no-forward
root         471     240  0 01:26 ?        00:00:00      \_ sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
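One way to cross-check the two views above is to test whether the PID recorded in conmon's pidfile is still alive and what command it belongs to; a stale pidfile, or a PID recycled after the OCI container restarts, would explain 'podman inspect' and 'ps' disagreeing. A minimal sketch — the pidfile path is a stand-in (in the setup above it would be the `.../userdata/conmon.pid` file), and the demo records the current shell's PID so the check has something to run against:

```shell
# Demo: check whether the PID recorded in a conmon-style pidfile is alive
# and what its command name is. PIDFILE is a stand-in path; for a
# self-contained demo we record the current shell's PID in it.
PIDFILE=$(mktemp)
echo "$$" > "$PIDFILE"            # stand-in for conmon writing its own PID

pid=$(cat "$PIDFILE")
if [ -d "/proc/$pid" ]; then
    comm=$(cat "/proc/$pid/comm")
    echo "pid $pid is alive (comm=$comm)"
else
    echo "pid $pid is gone: stale pidfile"
fi
```

Against the real conmon.pid file, a live PID whose comm is not `conmon` would indicate PID reuse rather than a surviving conmon.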

Is there any useful information in the journal?

I think I mentioned in one of my questions above that syslog/messages/journal is not configured on the 'master' OCI container, so we need to explicitly set conmon's '--log-path' option when it is started. I couldn't find a way to do that from the 'podman start' command. Is there any other way?

@giuseppe

giuseppe commented Dec 6, 2022

could the privileged container be messing with the host somehow?

@giuseppe

giuseppe commented Dec 6, 2022

otherwise I'm not sure how you could end up in that state

@popescore

Giuseppe, your guesses don't seem to lead to a solution. Do you mind helping debug this?

  1. I asked how to set up logging for conmon (3 times); do you mind helping with that?
  2. Why does 'podman ps' think the podman container is running after the OCI container restarts?
  3. Why does 'podman inspect' report the outdated podman container PID when /staging/podman/overlay-containers//userdata/pidfile does not exist after the OCI container restarts?
  4. Can you provide a Podman patch to trace (add logs enabled by '--log-level=debug') the conditions under which podman does not call conmon, so we can identify the exact cause?

@giuseppe

  • I asked how to set up logging for conmon (3 times); do you mind helping with that?

it is enough to pass --log-level debug to Podman and it will enable logging for conmon as well.

  • Why does 'podman ps' think the podman container is running after the OCI container restarts?

probably because conmon didn't write the exit status file for the container
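Per the conmon invocation in the logs above (`--exit-dir /run/libpod/exits`), conmon records a container's exit status in a file named after the container ID; if that file never appears, podman keeps reporting the container as running. A sketch of that mechanism, simulated with a temp directory and the container ID from the logs (the real directory would be /run/libpod/exits, and the 137 status is fabricated for the demo):

```shell
# Simulated exit-dir check: conmon writes the exit status into
# <exit-dir>/<container-id>. A temp dir and a fabricated status (137)
# stand in for the real /run/libpod/exits here.
EXIT_DIR=$(mktemp -d)
CID=2518cf0428043692ccd8aa89edd8711bac4233b78b58db4c26ff61dd12a9d3a4

echo 137 > "$EXIT_DIR/$CID"       # what conmon would write on container exit

if [ -f "$EXIT_DIR/$CID" ]; then
    echo "container exited with status $(cat "$EXIT_DIR/$CID")"
else
    echo "no exit file: podman still considers the container running"
fi
```

Checking the real exit dir for the container's ID after the OCI container restarts would show whether conmon ever got the chance to record an exit.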

  • Why does 'podman inspect' report the outdated podman container PID when /staging/podman/overlay-containers//userdata/pidfile does not exist after the OCI container restarts?
  • Can you provide a Podman patch to trace (add logs enabled by '--log-level=debug') the conditions under which podman does not call conmon, so we can identify the exact cause?

could you ping me on IRC so we can debug it together? I am giuseppe on libera.chat

@popescore

popescore commented Dec 12, 2022

I asked about how to setup the logs on conmon (3 times), do you mind helping with that?
it is enough to pass --log-level debug to Podman and it will enable logging also for conmon.

Sorry, I wasn't clear. I've noticed that the log level is passed on to conmon; the problem is that conmon has no log file set, as there is no syslog/journald set up on the master/OCI container, and I was looking for a way to set the argument for conmon's '--log-path' option.
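One hypothetical workaround (not a documented podman feature, purely a sketch) when neither syslog nor journald exists in the container: wrap the conmon binary so its stderr, where the --log-level debug output goes, is appended to a file, and point podman at the wrapper. The wrapper path and log file below are assumptions; podman would then need to be told to use the wrapper, e.g. via `conmon_path` under `[engine]` in containers.conf:

```shell
# Hypothetical wrapper: capture conmon's own stderr to a file when no
# syslog/journald is available. Paths are examples, not defaults; a temp
# file stands in for e.g. /usr/local/bin/conmon-logged.
WRAP=$(mktemp)
cat > "$WRAP" <<'EOF'
#!/bin/sh
exec /usr/bin/conmon "$@" 2>>/var/log/conmon-debug.log
EOF
chmod +x "$WRAP"
echo "wrapper installed at $WRAP"
# containers.conf would then need, under [engine]:
#   conmon_path = ["/usr/local/bin/conmon-logged"]
```

This leaves podman's own arguments untouched and only redirects the monitor process's diagnostic stream.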

could you ping me on IRC so we can debug it together? I am giuseppe on libera.chat

It is difficult, as we are in different time zones with little working-hours overlap. I thought adding debug logs to the source files where conmon is supposed to be started ('podman start') would be a good first step.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan

rhatdan commented Jan 18, 2023

@giuseppe Do you have a response?

@giuseppe

I can't tell what is going on from the details above; it is the first time I have heard of such an issue, so it is likely caused by a local configuration.

I've proposed to help on IRC but got no reply to that.

I am closing this due to the lack of feedback; please reopen if you have more data or would like me to help debug it, as I've offered.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 4, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023