[Bug]: Systemd libpod scope fails after container OOM exit #1138
Comments
@vrothberg @giuseppe PTAL
Experiencing this too.
I am not able to reproduce on Fedora 37 (systemd-251.10-588.fc37.x86_64), on CentOS (systemd-252-3.el9.x86_64) and on RHEL (systemd-250-12.el9_1.1.x86_64):
If you are using the cpu controller, it means you needed to tweak the default systemd configuration. Could you please share the changes you've made?
Hey @giuseppe! I appreciate you having a look and providing a minimal example. The issue happens only when the container is supervised by Systemd. So to expand on your example (run on CentOS as an ordinary user with SELinux enforcing):
Now to review the error. First, the service is in a failed state.
Status in podman
Stop service and try to start manually => Get described error.
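The per-step output in this comment was lost in the export; the manual check presumably went roughly like this (unit and container names are illustrative):

```shell
# The transient scope is left in a "failed" state after the OOM kill
systemctl --user status container-myapp.service

# Stop the supervising service, then try to start the container by hand
systemctl --user stop container-myapp.service
podman start myapp    # => fails with the libpod-...scope error
```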
I didn't make any CPU modifications, and the error also happens without a CPU limit set.
I am still not able to reproduce the issue. I don't think the configuration snippet you've provided is going to work.
systemd will monitor the podman command, which exits as soon as the container is started. You need at least to use "start -a" so the podman command stays around while the container runs, and even then it will race with the cleanup process. Even better if you use a generated template.
No, I'm not using oomd and I haven't made any cgroup changes; I don't really know what that is. The Systemd service is generated by podman. The full version is below. So you are saying podman 4.3.1 generates a wrong Systemd file? I'll try to reproduce this on a fresh VM now, one that has no modifications at all. Will report back in roughly 15 minutes.
No, the default service has no issue, because it is using PIDFile and Type=forking.
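For context, the unit that `podman generate systemd` produces looks roughly like this; systemd tracks conmon's PID rather than the short-lived `podman start` command (paths and names are illustrative):

```ini
[Service]
Type=forking
# conmon's PID file; the real path contains the container's full ID
PIDFile=/run/user/1000/containers/overlay-containers/<id>/userdata/conmon.pid
ExecStart=/usr/bin/podman start myapp
ExecStop=/usr/bin/podman stop -t 10 myapp
Restart=on-failure
```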
I was successful at reproducing this on a fresh VM that doesn't have any customizations. (CentOS 9 on Hetzner cloud) Commands run:
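The exact command list did not survive this export; a reconstruction from the issue description could look like this (image, names, and limits are illustrative):

```shell
# Create a rootless container with a tight memory limit
podman create --name oomtest --memory 100m docker.io/library/fedora:latest \
    stress --vm 1 --vm-bytes 200M

# Supervise it with a podman-generated user unit
podman generate systemd --name oomtest > ~/.config/systemd/user/container-oomtest.service
systemctl --user daemon-reload
systemctl --user start container-oomtest.service

# After the kernel OOM-kills the workload, restarting hits the failed scope
systemctl --user start container-oomtest.service
```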
I am still not able to see the issue; I've tried on a fully updated CentOS Stream 9 VM (although not freshly installed).
How have you installed it? I'll try with a fresh VM on Monday to see if it makes any difference.
This is just the image Hetzner provides; they wouldn't do many customizations. I tried twice to make sure it's reproducible. Our own physical machines are installed from the official ISO. I'll try on a second cloud provider later.
Thanks for the update, I am finally able to reproduce it on a freshly installed CentOS 9 Stream VM.
Awesome! Excited to learn what you think and if there is a better workaround or potential fix in Podman. |
if the unit could not be started, attempt to reset it first, and then start it again. Closes: containers#1138 Signed-off-by: Giuseppe Scrivano <[email protected]>
Let's try to work around this problem in crun: #1139
Good stuff! Will try to compile and test this by Monday at the latest. |
The changes in this branch seem to solve the issue. Thanks for the quick fix and taking the time to reproduce @giuseppe! Tested as follows on CentOS 9:
Then I followed the steps from #1138 (comment). As expected, the container can still be started after an OOM kill. 👍
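The build steps were elided from this export; compiling crun from the fix branch typically looks like this (directory and branch names are illustrative):

```shell
git clone https://github.com/containers/crun
cd crun
git fetch origin pull/1139/head:fix-1139
git checkout fix-1139
./autogen.sh && ./configure
make

# Point podman at the freshly built binary
podman --runtime "$PWD/crun" start oomtest
```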
Is it working for you as well, @Ember-ruby?
Issue Description
Since recently, when a rootless container with constrained memory is killed by the kernel due to excess memory usage (OOM), it can't be restarted, due to a failed Systemd `libpod-xxx.scope` unit. The error shown by podman is this:

To get the container going again, one needs to reset the libpod scope:
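The reset command was dropped from this export; for a rootless container it is presumably along these lines (the scope name is illustrative and derives from the container's full ID):

```shell
# List failed transient scopes in the user manager
systemctl --user --failed --type=scope

# Clear the failed state so the container can start again
systemctl --user reset-failed libpod-<container-id>.scope
```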
Or add the same in the container's service file:
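The service-file snippet was also elided; it presumably adds an `ExecStartPre` along these lines (paths and names are illustrative; the leading `-` tells systemd to ignore a failure when the scope is not in a failed state):

```ini
[Service]
ExecStartPre=-/usr/bin/systemctl --user reset-failed libpod-<container-id>.scope
```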
Steps to reproduce the issue
Exhaust the memory limit of a systemd-supervised container using the `stress` tool.

Describe the results you received
A container that was killed due to OOM can't be restarted automatically or manually.
Describe the results you expected
A container should be restarted automatically after an OOM kill, if the restart policy is set to `always`.

podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Container is running as a non-privileged user (rootless) on a physical machine.
Additional information
This worked about 2-3 months ago and could be related to some recent change. Maybe containers/podman#13731.