podman exec -it failures #12423

Closed
slvr32 opened this issue Nov 26, 2021 · 10 comments
Labels
kind/bug, locked - please file new issue/PR, stale-issue

Comments

@slvr32

slvr32 commented Nov 26, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

podman exec -it commands fail sporadically

Steps to reproduce the issue:

On a host with several containers, loop through them to trigger the issue:

# for i in $(podman ps -a | egrep -vi container | awk '{print $1}'); do podman exec -it $i bash -c "ls"; done

Describe the results you received:

Some of the exec -it commands fail, with the following error...

Error: exec failed: container_linux.go:367: starting container process caused: open /dev/ptmx: operation not permitted: OCI permission denied

but sometimes the command is successful

anaconda-post.log dev etc lib logstash mnt proc run srv tmp var
bin entrypoint.sh home lib64 media opt root sbin sys usr

Describe the results you expected:

podman exec -it should succeed consistently for every container.

Additional information you deem important (e.g. issue happens only occasionally):

On this particular box, with 9 podman containers, 5 of the exec -it commands fail with the error above, and 4 are successful.

Unfortunately, the issue is sporadic, in that some containers of the same type/version will fail, while other containers of the same type/version will succeed.
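
A minimal sketch of how the pass/fail split above could be counted, assuming bash and that `podman ps -q` (IDs only) is an acceptable replacement for the egrep/awk filtering in the loop. `check_exec_all` is a hypothetical helper name, not something that ships with podman.

```shell
#!/usr/bin/env bash
# check_exec_all: hypothetical helper, not part of podman.
# Runs a trivial command with `exec -it` in every running container
# and prints a tally of successes and failures.
check_exec_all() {
    local ok=0 failed=0 id
    # `podman ps -q` prints only container IDs, so no header filtering is needed.
    for id in $(podman ps -q); do
        if podman exec -it "$id" true >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
            echo "exec -it failed: $id"
        fi
    done
    echo "ok=$ok failed=$failed"
}
```

Calling `check_exec_all` as root on the affected host lists each failing container ID, followed by a summary line.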

Output of podman version:

Version:      3.2.3
API Version:  3.2.3
Go Version:   go1.15.13
Built:        Wed Aug 11 18:53:47 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.21.3
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.26-3.module+el8.4.0+20195+0a4a4953.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.26, commit: 9ef46ac10f1c8cd2ebbb917f962a154ba3956e63'
  cpus: 72
  distribution:
    distribution: '"ol"'
    version: "8.4"
  eventLogger: file
  hostname: devpodman
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.17-2011.1.2.el8uek.x86_64
  linkmode: dynamic
  memFree: 29052325888
  memTotal: 134604869632
  ociRuntime:
    name: runc
    package: runc-1.0.0-73.rc93.module+el8.4.0+20195+0a4a4953.x86_64
    path: /usr/bin/runc
    version: |-
      runc version spec: 1.0.2-dev
      go: go1.15.7
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 25478819840
  swapTotal: 25769799680
  uptime: 2566h 13m 44.2s (Approximately 106.92 days)
registries:
  search:
  - container-registry.oracle.com
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.centos.org
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 9
    paused: 0
    running: 9
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 6
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.2.3
  Built: 1628708027
  BuiltTime: Wed Aug 11 18:53:47 2021
  GitCommit: ""
  GoVersion: go1.15.13
  OsArch: linux/amd64
  Version: 3.2.3

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.2.3-0.10.0.1.module+el8.4.0+20289+730b73cc.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

No

If I recall correctly, these exec -it failures seem to be a regression w/podman 3.2.3, but I'm not positive this was working reliably with podman 3.0.1 or 3.1.2.

Additional environment details (AWS, VirtualBox, physical, etc.):

Containers are running in an LXC guest, so this is also a 'container within a container' scenario.

openshift-ci bot added the kind/bug label Nov 26, 2021
@slvr32
Author

slvr32 commented Nov 26, 2021

exec -i is still useful for scripting commands against containers, but exec -it is valuable for typical (human) interactive shell commands

@mheon
Member

mheon commented Nov 29, 2021

@edsantiago Is this an exec flake we've seen before?

@edsantiago
Member

@mheon we have rather a lot of exec flakes (#10927, #10825) but cirrus-flake-grep finds no hits for "exec failed:.*ptmx" nor "exec failed:.*operation not permit". I don't keep logs for RHEL flakes, though.

@slvr32
Author

slvr32 commented Dec 8, 2021

I just tried updating to podman 3.3.1, and seemed to have luck after I stopped/started the containers with the new podman version, e.g.

Before stopping/starting the containers, but after the update to podman 3.3.1

[root@devpodman ~]# for i in $(podman ps -a | egrep -vi container | awk '{print $1}'); do podman exec -it $i bash -c ls; done
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
Error: exec failed: container_linux.go:367: starting container process caused: open /dev/ptmx: operation not permitted: OCI permission denied
Error: exec failed: container_linux.go:367: starting container process caused: open /dev/ptmx: operation not permitted: OCI permission denied
Error: exec failed: container_linux.go:367: starting container process caused: open /dev/ptmx: operation not permitted: OCI permission denied
Error: exec failed: container_linux.go:367: starting container process caused: open /dev/ptmx: operation not permitted: OCI permission denied
Error: exec failed: container_linux.go:367: starting container process caused: open /dev/ptmx: operation not permitted: OCI permission denied
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var

But then, stopping/starting containers...

[root@devpodman ~]# for i in $(podman ps -a | egrep -vi container | awk '{print $1}'); do podman stop $i; done
... containers stopped...

[root@devpodman ~]# for i in $(podman ps -a | egrep -vi container | awk '{print $1}'); do podman start $i; done
... containers started ...

[root@devpodman ~]# for i in $(podman ps -a | egrep -vi container | awk '{print $1}'); do podman exec -it $i bash -c ls; done
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log etc logstash opt sbin usr
bin home logstash-output-scribe-1.1.0.gem proc srv var
dev lib media root sys
entrypoint.sh lib64 mnt run tmp
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var
anaconda-post.log entrypoint.sh lib media proc sbin tmp
bin etc lib64 mnt root srv usr
dev home logstash opt run sys var

(no more failures with exec -it)

@slvr32
Author

slvr32 commented Dec 8, 2021

On second thought, the issue returns later, but this is the first time I've seen (w/podman 3.3.1) a stop/start of a container provide a quick (albeit temporary?) fix.

@giuseppe
Member

Could you give it a try using crun instead of runc?

Is the issue still reproducible?

@slvr32
Author

slvr32 commented Dec 17, 2021

Ok, I switched this LXC host with 9 containers to use crun, restarted podman and all of the containers.

For now, the exec -it command doesn't come back with an error, but I'll have to check periodically to see if that changes, since I noted in my previous comment that restarting the containers would provide a temporary fix w/podman 3.3.1.

I'll update this later to confirm whether things look more stable w/crun as more time passes.
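
Since the runtime is recorded per container when it is created, it may be worth confirming which runtime each container actually uses after a switch like this. A rough sketch, assuming bash and that the installed podman exposes the `OCIRuntime` field in `podman inspect` output; `list_container_runtimes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_container_runtimes: hypothetical helper, not part of podman.
# Prints "<container-id> <runtime>" for every container, running or not.
list_container_runtimes() {
    local id
    for id in $(podman ps -aq); do
        # Assumes `podman inspect` reports the runtime as .OCIRuntime.
        printf '%s %s\n' "$id" "$(podman inspect --format '{{.OCIRuntime}}' "$id")"
    done
}
```

Any container still listed with runc was presumably created before the default changed. The host-wide default should also be visible with `podman info --format '{{.Host.OCIRuntime.Name}}'`.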

@slvr32
Author

slvr32 commented Jan 3, 2022

I've verified that using a crun runtime with podman 3.3.1 seems to resolve the exec -it failures/stability issues.

I took some time to generate some 'exec -it up/down' metrics (every 5 minutes) for a bunch of hosts where podman containers are running, to quantify the difference between runc and crun stability for several days, and almost all of the containers had repeated exec -it failures with runc, whereas the exec -it failures are basically gone with crun.
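
The kind of up/down metric described above could be collected along these lines. This is only a sketch under assumptions: the log format (`<epoch-seconds> <container-id> up|down`) and the names `probe_once` and `summarise_probes` are made up for illustration and are not the tooling actually used here.

```shell
#!/usr/bin/env bash
# probe_once: hypothetical helper. Emits one line per running container:
#   <epoch-seconds> <container-id> up|down
probe_once() {
    local id state
    for id in $(podman ps -q); do
        if podman exec -it "$id" true >/dev/null 2>&1; then
            state=up
        else
            state=down
        fi
        printf '%s %s %s\n' "$(date +%s)" "$id" "$state"
    done
}

# summarise_probes: hypothetical helper. Reads a probe log (path in $1)
# and prints failures out of total probes per container, sorted by ID.
summarise_probes() {
    awk '{ total[$2]++; if ($3 == "down") down[$2]++ }
         END { for (id in total) printf "%s %d/%d failed\n", id, down[id] + 0, total[id] }' "$1" | sort
}
```

Appending the output of `probe_once` to a file every 5 minutes (for example from cron) and running `summarise_probes` on that file after a few days gives a per-container failure count that can be compared between runc and crun hosts.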

@slvr32
Author

slvr32 commented Jan 6, 2022

A bit off-topic, I also tracked down the runc vs crun issue to the containers-common (noarch, just config) package, where I noticed that there was a fix in Dec. 2020 to not explicitly set a runtime...

containers/common@e50a26f#diff-f38944fdd6507cc37badbde5c66785bdebb77bd6fa2ea0674cd8117f89e95676

but I'm not sure how the github containers-common code maps to containers-common releases.

# rpm -qa | grep containers-common
containers-common-1-2.0.2.module+el8.5.0+20424+d687fed7.noarch

and I verified that the installed rpm still has the hardcoded runtime setting.

I added some puppet code to leverage /etc/containers/containers.conf for any config overrides before podman is installed and containers are created, to work around the initial runc runtime hardcoding from the containers-common package.

I happen to use the forge podman (southalc) module for managing podman containers with puppet, so this seemed like the best solution at the moment for not creating/starting podman containers with the unwanted runc runtime when containers-common is installed as a podman dependency, and then having to change the runtime after the fact and restart the containers.
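
For anyone not using puppet, the same override can be sketched in plain shell. This assumes the override is just the `runtime` key in the `[engine]` table of containers.conf; `write_runtime_override` is a hypothetical helper name, and it deliberately refuses to touch an existing file.

```shell
#!/usr/bin/env bash
# write_runtime_override: hypothetical helper, not part of podman.
# Creates a containers.conf that selects crun as the default runtime.
# $1 is the target path (defaults to /etc/containers/containers.conf).
write_runtime_override() {
    local conf="${1:-/etc/containers/containers.conf}"
    if [ -e "$conf" ]; then
        # Do not clobber settings that are already in place.
        echo "refusing to overwrite existing $conf" >&2
        return 1
    fi
    mkdir -p "$(dirname "$conf")"
    cat > "$conf" <<'EOF'
[engine]
runtime = "crun"
EOF
}
```

Running `write_runtime_override` before podman is installed and before any containers are created mirrors the ordering described above, so new containers should pick up crun from the start.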

@github-actions

github-actions bot commented Feb 6, 2022

A friendly reminder that this issue had no activity for 30 days.

rhatdan closed this as completed Feb 7, 2022
github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023