Improper container state, stale exec sessions and PID wrapping #260
Comments
I’m a lot more concerned with how you’re getting these stale sessions in
the first place. Each exec session should be spawning a cleanup process on
exit, which takes care of updating the state and cleaning up residual files
- I don’t see how those could be killed by timeouts since they’re spawning
immediately after exit. Is something possibly killing the exec session’s
Conmon process, preventing the spawn of the cleanup session? That would
also be a serious issue.
Solving this seems to be a lot easier than rewriting session tracking to
allow us to determine if PIDs wrapped.
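One way to see the cleanup wiring described above is to look at the command line a running conmon was started with: Podman can pass it an --exit-command (plus --exit-command-arg ...) that conmon runs when the process it monitors exits. Whether an exit command is present for an exec session's conmon turns out to matter later in this thread. A minimal sketch, assuming at least one conmon process is running on the host:

# Inspect a live conmon's command line and look for --exit-command /
# --exit-command-arg entries (the cleanup hook that should fire on exit).
CONMON_PID=$(pgrep -x conmon | head -n1)
tr '\0' ' ' < "/proc/$CONMON_PID/cmdline"; echo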
…On Wed, May 12, 2021 at 09:17 Martin Polden ***@***.***> wrote:
*Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)*
/kind bug
*Description*
Unable to stop or remove a Podman container because of stale exec sessions. The PID
associated with the exec session is reused due to PID wrapping in the root PID
namespace, causing Podman to believe the exec session is still alive.
*Steps to reproduce the issue:*
1. We have a running Podman container.
2. We execute podman exec periodically in the container, approximately once
every minute. Our use-case for this is collecting metrics.
3. When running podman exec, Podman stores an "exec session" containing
metadata (e.g. PID of the spawned process) in its state database.
4. In some cases we can end up with stale exec sessions, e.g. podman exec
can be killed due to a timeout and is then unable to clean up the exec
session.
5. The container will then have stale exec sessions (the ExecIDs array in
podman inspect <ctr> grows) in its database, referring to processes that
are no longer running.
6. When stopping or removing a container, Podman checks if the PIDs of any
exec session still refer to a running process. It retrieves the PIDs from
its state database in /data/containers/graph/libpod/bolt_state.db.
7. However, the persisted PID may now refer to a different running process.
This is due to the fact that PIDs can wrap around after reaching the limit
specified in /proc/sys/kernel/pid_max.
8. When the PID persisted in the exec session is reused by another process,
Podman believes the exec session is still active and the container can no
longer be stopped or removed.
9. Max current PID and limit on a system where we're triggering this bug:
$ ps aux | awk '{print $2}' | sort -rn | head -1
399578
$ cat /proc/sys/kernel/pid_max
409600
10. But why are we reaching the limit so quickly? (We see the "improper
state" issue every other day.)
My guess is that this is due to PID namespaces. A container has its own
PID namespace which maps into the root PID namespace, and the number of
possible PIDs in this namespace is smaller than the total number of
possible PIDs on the host.
The PID in the exec session is the PID in the root namespace.
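A minimal shell sketch (not Podman's actual code) of why the check described in steps 6-8 breaks once PIDs wrap, and how recording the process start time alongside the PID would disambiguate reuse; the PID value below is hypothetical:

# Print a process's start time (clock ticks since boot), field 22 of
# /proc/<pid>/stat. Strip everything up to the closing ')' of the comm
# field first so the field count is stable.
starttime() {
  sed 's/.*) //' "/proc/$1/stat" 2>/dev/null | awk '{print $20}'
}

PID=12345                            # hypothetical PID stored in the exec session
RECORDED_START=$(starttime "$PID")   # would be recorded when the exec session starts

# Later, e.g. at podman stop/rm time: a bare existence check is fooled by reuse...
[ -d "/proc/$PID" ] && echo "PID exists (but may belong to an unrelated process)"

# ...whereas comparing the start time detects that the PID was recycled.
if [ -d "/proc/$PID" ] && [ "$(starttime "$PID")" = "$RECORDED_START" ]; then
  echo "still the exec session's process"
else
  echo "PID gone or reused after wrapping past $(cat /proc/sys/kernel/pid_max)"
fi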
*Describe the results you received:*
Depending on when the PID reuse happens, either stop or rm fails.
podman stop <ctr> fails with:
Error: container 5c5925673e244190340d1af86cb2bb2d9438691e9a48e883d77fedf09d87222a has active exec sessions, refusing to clean up: container state improper
podman rm <ctr> fails with:
Error: cannot remove container 86795917878f6131ca98b45a5e7a87b32fdb9121a4359547b6b007199d115b99 as it has active exec sessions: container state improper
*Describe the results you expected:*
That podman stop and podman rm succeed.
*Additional information you deem important (e.g. issue happens only
occasionally):*
If the container cannot be removed, the only solution is to restart it
first with podman restart <ctr> && podman stop <ctr> && podman rm <ctr>.
*Output of podman version:*
Version: 2.2.1
API Version: 2
Go Version: go1.14.7
Built: Mon Feb 8 21:19:06 2021
OS/Arch: linux/amd64
*Output of podman info --debug:*
host:
  arch: amd64
  buildahVersion: 1.18.0
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.22-3.module+el8.3.1+9857+68fb1526.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.22, commit: a40e3092dbe499ea1d85ab339caea023b74829b9'
  cpus: 24
  distribution:
    distribution: '"rhel"'
    version: "8.3"
  eventLogger: file
  hostname: <snip>
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-240.15.1.el8_3.x86_64
  linkmode: dynamic
  memFree: 267927552
  memTotal: 25018028032
  ociRuntime:
    name: runc
    package: runc-1.0.0-70.rc92.module+el8.3.1+9857+68fb1526.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.2-dev'
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 366h 4m 16.74s (Approximately 15.25 days)
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /data/containers/graph
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 1
  runRoot: /data/containers/run
  volumePath: /data/containers/graph/volumes
version:
  APIVersion: "2"
  Built: 1612819146
  BuiltTime: Mon Feb 8 21:19:06 2021
  GitCommit: ""
  GoVersion: go1.14.7
  OsArch: linux/amd64
  Version: 2.2.1
*Package info (e.g. output of rpm -q podman or apt list podman):*
podman-2.2.1-7.module+el8.3.1+9857+68fb1526.x86_64
*Have you tested with the latest version of Podman and have you checked
the Podman Troubleshooting Guide?
(https://github.com/containers/podman/blob/master/troubleshooting.md)*
Yes. (We're running the latest version available in RHEL 8.3)
*Additional environment details (AWS, VirtualBox, physical, etc.):*
Physical.
Yes, we'll look into that. Our host management system calls podman exec and terminates it gracefully (TERM) when it times out. I did a quick test to verify this: in a running container I started a sleep via podman exec, then sent TERM to the podman exec process, and the exec session was not cleaned up.
My expectation is that there would be a signal handler that does the necessary cleanup on TERM.
Also, I seem to recall when reading through the code that either stop or rm -f may end up sending either TERM or KILL to "active" exec sessions, which seems dangerous if the PIDs have been reused...
You’re just killing the Podman process there - the actual exec session has
double forked and is running independently. I suspect you’ll see that there
is still a sleep running in the case you describe, inside the container.
Once an exec session is running, the Podman frontend is just performing
attach work, printing to the terminal and reading input. The actual exec
process and its Conmon parent are independent.
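A quick way to check this claim (the container name ctr and the sleep are hypothetical): kill only the podman exec client and confirm the exec'd process keeps running.

# Start an attached exec session in the background, then kill only the Podman
# frontend; the exec'd sleep should survive because it is parented to conmon,
# not to the podman client. (Assumes the container image provides ps.)
podman exec ctr sleep 600 &
CLIENT_PID=$!
kill -TERM "$CLIENT_PID"
podman exec ctr ps          # sleep 600 should still be listed inside the container
ps -ef | grep '[s]leep 600' # ...and still visible on the host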
…On Wed, May 12, 2021 at 10:02 Martin Polden ***@***.***> wrote:
Also, I seem to recall when reading through the code that either stop or rm
-f may end up either sending TERM or KILL to "active" exec sessions which
seems dangerous if the PIDs have been reused...
Right, so the podman exec process does the cleanup? This does not seem to happen if the process receives TERM or INT. I assume this is because the signal is not forwarded to conmon, so any spawned processes may still be running.
Our integration expects that podman exec terminates any spawned processes gracefully on TERM (I think this is what docker exec does), but maybe calling runc exec directly is better for our use-case.
I'm not entirely clear on the purpose of exec sessions. Do they exist just to keep track of the processes that might need to be killed when the container is stopped/removed?
The Podman process does not do the cleanup - Conmon spawns a separate
Podman on completion of the exec session (when the first PID in it exits)
which handles cleanup.
The expectation that `podman exec` cleans up the exec process on
termination is definitely not accurate today. I'll need to investigate this
when I can get access to a proper computer again (tomorrow) to see if
Docker does so - if so, this is a Podman bug we'll have to fix.
The exec sessions are an artifact of how the Docker API, which we now
support, works - each session tracks a single exec'd process, providing a
handle by which it can be managed (stopped, inspected, etc). We don't
strictly need them to kill the container (eliminating the first process in
the PID namespace is enough to let the kernel do the rest for us), though I
believe we go through and stop them all first anyways.
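For reference, these handles are what the reporter was watching grow; with a hypothetical container name ctr, the list of exec session IDs can be read straight out of the inspect output:

# List the exec session IDs Podman is tracking for a container.
podman inspect --format '{{.ExecIDs}}' ctr
# Equivalent with jq, since podman inspect emits a JSON array:
podman inspect ctr | jq '.[0].ExecIDs'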
…On Wed, May 12, 2021 at 11:23 AM Martin Polden ***@***.***> wrote:
Once an exec session is running, the Podman frontend is just performing
attach work, printing to the terminal and reading input. The actual exec
process and its Conmon parent are independent.
Right, so the podman exec process does the cleanup? This does not seem to
happen if the process receives TERM or INT. I assume this is because the
signal is not forwarded to conmon, so any spawned processes may still be
running.
Our integration expects that podman exec terminates any spawned processes
gracefully on TERM (I think this is what docker exec does), but maybe
calling runc exec directly is better for our use-case.
I'm not entirely clear on the purpose of exec sessions. Do they exist just
to keep track of the processes that might need to be killed when the
container is stopped/removed?
I did some more testing: start a container and an exec session, identify and kill the exec session's conmon process, and the exec session remains afterwards (roughly the steps sketched below).
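A possible reconstruction of that test (the original commands and output were not preserved here, so the names are hypothetical):

podman run -d --name ctr alpine sleep 1000000   # a long-running container
podman exec ctr sleep 600 &                     # an attached exec session

# The exec'd process should be parented to its own conmon; find it and send TERM.
EXEC_PID=$(pgrep -fx 'sleep 600' | head -n1)
CONMON_PID=$(ps -o ppid= -p "$EXEC_PID" | tr -d ' ')
kill -TERM "$CONMON_PID"

# The stale session is still recorded in the container's state:
podman inspect --format '{{.ExecIDs}}' ctr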
Please open this as an issue with conmon.
@haircommander @giuseppe @mrunalp WDYT?
I don't think Conmon spawns the cleanup process when it's hit by
SIGTERM. This sounds like something we should fix.
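A shell analogy for the fix being discussed (conmon itself is C, so this is illustrative only): route SIGTERM into the same exit path that already triggers the cleanup command.

# Whatever runs on normal exit (here, a stand-in for spawning the cleanup
# process) should also run when the process is terminated with SIGTERM.
run_exit_command() {
  echo "spawning cleanup process here"
}
trap run_exit_command EXIT    # normal-exit path
trap 'exit 143' TERM          # SIGTERM funnels into the same EXIT trap
sleep 600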
…On Fri, May 14, 2021 at 9:09 AM Daniel J Walsh ***@***.***> wrote:
@haircommander @giuseppe @mrunalp WDYT?
Sure, is there a build I can try?
Thanks! You can download the binary from https://cirrus-ci.com/task/5425328676929536
I still get the same behaviour as in #260 (comment) when using the conmon binary from the linked build.
As far as I can tell there is only one signal handler registered in conmon, and I'm not sure where the Podman cleanup of the exec session is supposed to happen.
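One way to double-check which signals a running conmon actually catches, independent of reading the source (assuming a single conmon on the host): SigCgt in /proc/<pid>/status is a hex bitmask of caught signals, and SIGTERM is signal 15, i.e. bit value 0x4000.

CONMON_PID=$(pgrep -x conmon | head -n1)
grep SigCgt "/proc/$CONMON_PID/status"
# If (mask & 0x4000) is zero, conmon has no SIGTERM handler installed.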
Hm, interesting. @mheon shouldn't there be an exit command to clean up the exec session?
Hm. We may only be doing it for detached exec sessions. Finally have time to dig into this one, I'll go looking.
I was right - we are, in fact, only doing exit commands for detached exec. Fortunately, this should be an easy fix.
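The distinction in question, with a hypothetical container name ctr: a detached exec session versus an attached one; per the comment above, only the detached form was getting an exit command at this point.

podman exec -d ctr sleep 600   # detached: conmon is given an exit command for cleanup
podman exec ctr sleep 600      # attached: no exit command was set before the fix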
containers/podman#10405 should fix that bit.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Unable to stop or remove a Podman container because of stale exec sessions. The PID associated with the exec session is reused due to PID wrapping in the root PID namespace, causing Podman to believe the exec session is still alive.
Steps to reproduce the issue:
We have a running Podman container.
We execute podman exec periodically in the container, approximately once every minute. Our use-case for this is collecting metrics.
When running podman exec, Podman stores an "exec session" containing metadata (e.g. PID of the spawned process) in its state database.
In some cases we can end up with stale exec sessions, e.g. podman exec can be killed due to a timeout and is then unable to clean up the exec session.
The container will then have stale exec sessions (the ExecIDs array in podman inspect <ctr> grows) in its database, referring to processes that are no longer running.
When stopping or removing a container, Podman checks if the PIDs of any exec session still refer to a running process. It retrieves the PIDs from its state database in /data/containers/graph/libpod/bolt_state.db.
However, the persisted PID may now refer to a different running process. This is due to the fact that PIDs can wrap around after reaching the limit specified in /proc/sys/kernel/pid_max.
When the PID persisted in the exec session is reused by another process, Podman believes the exec session is still active and the container can no longer be stopped or removed.
Max current PID and limit on a system where we're triggering this bug:
$ ps aux | awk '{print $2}' | sort -rn | head -1
399578
$ cat /proc/sys/kernel/pid_max
409600
But why are we reaching the limit so quickly? (We see the "improper state" issue every other day.)
My guess is that this is due to PID namespaces. A container has its own PID namespace which maps into the root PID namespace, and the number of possible PIDs in this namespace is smaller than the total number of possible PIDs on the host.
The PID in the exec session is the PID in the root namespace.
Describe the results you received:
Depending on when the PID reuse happens, either stop or rm fails.
podman stop <ctr> fails with:
Error: container 5c5925673e244190340d1af86cb2bb2d9438691e9a48e883d77fedf09d87222a has active exec sessions, refusing to clean up: container state improper
podman rm <ctr> fails with:
Error: cannot remove container 86795917878f6131ca98b45a5e7a87b32fdb9121a4359547b6b007199d115b99 as it has active exec sessions: container state improper
Describe the results you expected:
That podman stop and podman rm succeed.
Additional information you deem important (e.g. issue happens only occasionally):
If the container cannot be removed, the only solution is to restart it first with podman restart <ctr> && podman stop <ctr> && podman rm <ctr>.
Output of podman version: (quoted earlier in this thread)
Output of podman info --debug: (quoted earlier in this thread)
Package info (e.g. output of rpm -q podman or apt list podman):
podman-2.2.1-7.module+el8.3.1+9857+68fb1526.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes. (We're running the latest version available in RHEL 8.3)
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical.