podman stop returns before removing a container started with --rm #11384

Comments
Thanks for reaching out, @vlk-charles! Running Podman inside systemd units is full of traps and hard to get right. Can you try using a unit generated via podman generate systemd?
@vlk-charles use podman generate systemd.
@Luap99 Thanks. Didn't know about that. Will try.
@vlk-charles Note that there is a bug with … Also, …
The thing is, once the container process exits, the conmon process starts the podman cleanup process. If you set --rm, that cleanup process is also the one that removes the container, so podman stop does not wait for it. I think this decision was wrong. We have no guarantee that the other process is successful; in your case systemd is killing it, but it could also fail for other reasons. I will see if I can improve this behaviour to make it more robust. The podman ps error is a separate issue.
Thanks for the insight. So basically I guess the …
I don't think …
When a container is configured for auto-removal, podman stop should still do cleanup; there is no guarantee that the cleanup process spawned by conmon will be successful. Also, a user expects the network/mounts to be cleaned up after podman stop. Therefore podman stop should not return early, and should instead do the cleanup and ignore errors if the container was already removed. [NO TESTS NEEDED] I don't know how to test this. Fixes containers#11384 Signed-off-by: Paul Holzinger <[email protected]>
I didn't realize clean-up and container removal were separate things. I was mixing them up.
Thank you for the fix. That should deal with the main aspect of the issue. But I am still not sure whether not waiting for container removal is the right thing. I always used to think of:

as being functionally equivalent to:

But they are actually not. The first one stops the inner process and will now also ensure clean-up (which does not include container removal), but the container is removed asynchronously, usually later. The latter always blocks until removal. If I may ask for the developers' insight again: is the consensus that this distinction is intended behavior and desirable? If so, I will adjust. Would this be considered equivalent to the latter case?
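The two command sequences compared above were dropped from this copy of the thread. Under the reading suggested by the surrounding comments, the contrast is roughly the following (the container name `web` is a hypothetical placeholder, not from the original report):

```shell
# Container started with: podman run -d --rm --name web <image>

# Sequence 1: stop only. The main process is stopped and, after the fix,
# network/mounts are cleaned up, but with --rm the container record is
# removed asynchronously by conmon's cleanup process, usually later.
podman stop web

# Sequence 2: forced removal. This blocks until the container is fully gone.
podman rm -f web
```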
I am afraid the … @Luap99 Also, should I file a separate issue for the podman ps error?
The behavior you seem to want could be done here: …
Note that the systemd unit files from podman-generate-systemd do: …
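The quoted lines from the generated unit were lost in this copy. From what podman generate systemd --new typically emits (a paraphrased sketch; exact flags vary by Podman version, and <image> is a placeholder), the stop path looks roughly like:

```ini
# Paraphrased excerpt of a unit generated by `podman generate systemd --new`.
[Service]
ExecStart=/usr/bin/podman run --cidfile=%t/%n.ctr-id --rm -d <image>
ExecStop=/usr/bin/podman stop --ignore --cidfile=%t/%n.ctr-id -t 10
ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile=%t/%n.ctr-id
Type=forking
```

The ExecStopPost= line is what makes these units robust against this bug: even if the asynchronous cleanup process is killed, systemd itself runs the removal after ExecStop finishes.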
@rhatdan Thank you for the suggestion. I like the idea of using …

Thank you again to everyone.
Interestingly, after calling …
Why did we decide that …
It's not really that hard either: …
Did we decide this, or was it just not considered?
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
After podman stop returns, there are still processes running in the background responsible for removing the container if it was started with --rm. In combination with systemd, this can cause iptables NAT rules to be left behind. This is especially bad when ports are published/forwarded (not demonstrated below), because DNAT rules are added; when these are left behind, the host port is unusable for future containers because the original rule takes precedence.

Steps to reproduce the issue:
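The original reproduction steps did not survive in this copy. A minimal sketch of the kind of unit that triggers the race (container name, image, and paths are placeholders, not from the original report) might look like:

```ini
# repro.service -- hypothetical sketch, not the reporter's exact unit.
# Running a --rm container directly from a unit means conmon and its
# cleanup process live in the service's cgroup, so systemd kills them
# shortly after ExecStop (podman stop) returns.
[Unit]
Description=Reproducer for podman stop returning before --rm removal

[Service]
ExecStart=/usr/bin/podman run --rm --name repro alpine sleep infinity
ExecStop=/usr/bin/podman stop repro

[Install]
WantedBy=multi-user.target
```

Stopping such a service with systemctl stop and then inspecting the NAT table (for example with iptables -t nat -L) would show whether the rules were cleaned up.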
Describe the results you received:
Podman failed to remove the iptables NAT rules. There is an error message in the log:
The process responsible for the clean-up was presumably killed by systemd after podman stop returned.

Describe the results you expected:
I expected no error message and the NAT rules to be back to the same state as they were before starting the service.
Additional information you deem important (e.g. issue happens only occasionally):
At first I thought this was related to #11324 but after that one was fixed in Podman 3.3, I determined this is a separate issue. The symptoms are the same in my case.
The manifestation of the bug is a little random because it is based on a race condition. It is reproducible most of the time using the steps above.
Two possible work-arounds are to add either another ExecStop=sleep 1 or KillMode=none to the service definition. The first one gives Podman another second before systemd starts killing remaining processes. The latter disables killing altogether but is not a recommended option.

Here systemd itself warns about remaining processes, and the message about container removal appears only after the service has been stopped.
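The two work-arounds described above can be sketched as alternative additions to the unit's [Service] section (the sleep path is an assumption; adjust for your distribution):

```ini
# Work-around 1: give the cleanup process an extra second before
# systemd starts killing remaining processes in the cgroup.
[Service]
ExecStop=/usr/bin/sleep 1

# Work-around 2 (not recommended): never kill leftover processes.
# [Service]
# KillMode=none
```

Only one of the two should be applied at a time; KillMode=none in particular is discouraged by systemd because it can leave stray processes behind indefinitely.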
Output of podman version:

Output of podman info --debug:

Package info (e.g. output of rpm -q podman or apt list podman):

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes. Latest Fedora release with latest package from its repository.
Additional environment details (AWS, VirtualBox, physical, etc.):
Virtual machine on VMware ESXi. OS image from https://download.fedoraproject.org/pub/fedora/linux/releases/34/Cloud/x86_64/images/Fedora-Cloud-Base-34-1.2.x86_64.raw.xz .