User Podman Services (podman.service/podman.socket) fail within 24 hrs #10593
Comments
First observation - that Podman release is a little old; there should be something more recent tagged into the Stream 8 repos by now. For debugging, can you do an
@vrothberg There are logs attached to the BZ. Have you ever seen the following from systemd?
which leads to
Also reported in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1968210
@rhatdan The only thing that I suspect might be communicating with the service is cockpit (beyond whatever default services are in a clean install), but I was only logged into the cockpit console briefly when the system rebooted. After that it was sitting idle. @mheon After running If I run
But when I ran that, I noticed there was a lot of text being written to stderr - there were many lines of If I ran
As well as blocks of 'no pwd entry for UID':
There were about 522 entries for Again, not sure if relevant, but seemed odd.
@iUnknwn If you add
@jwhonce Yep - that suppresses the errors - thank you. Not sure if the errors (or the number of files that are open in the container) are relevant for the service failures, but figured it was worth including.
A friendly reminder that this issue had no activity for 30 days.
@iUnknwn is this still an issue?
@rhatdan I assume so? But the latest podman in Centos Stream 8 is still 3.1.0-dev (built Mar 26, 2021). Do you think it will behave differently in a later podman version? I'm happy to see if a new package fixes the problem (provided I can install/revert without too much disruption).
I have no idea if this problem is still an issue.
Yes, I see this issue as well when the Podman socket and services for multiple rootless Podman users are run for a long time (>24 hrs).
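For reference, a minimal sketch of the kind of rootless setup these reports describe, assuming the stock user units shipped with the podman package (unit names and the socket path may differ between distributions):

# Keep the user's systemd instance (and its podman units) alive across logouts
$ loginctl enable-linger $USER
# Enable the per-user Podman API socket
$ systemctl --user enable --now podman.socket
# The rootless socket then lives in the user's runtime directory
$ ls -l $XDG_RUNTIME_DIR/podman/podman.sock

With this in place, podman.service is started on demand via socket activation, which is why its failures show up under systemctl --user list-units --failed.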
@jwhonce PTAL, looks like we have a leak.
@rhatdan still an issue - exact same problem here. Additional notes:
OS: RHEL 8.4, AMD, current
PS - edit - ADD: Just realised I have cockpit up and running all the time. Since it was mentioned, I will see how I fare without cockpit...
As this is the first time that I wrote on a bug, I do not know if I comply with the rules, but it might help. Same problem here (not within 24h).
Users using rootless containers with --userns=keep-id flag and local volume mounted.
OS: Red Hat Enterprise Linux release 8.4 (Ootpa)
$ podman version
$ uptime
$ systemctl list-units --user --failed
Logs:
$ lsof -l | grep podman | wc -l
There are a lot of (also for tasks until they reach 195 entries):
Current findings/impressions:
A fix would be kinda cool. Once the FD limit is reached, the whole server is basically dead - without admin intervention.
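Not a fix for the underlying leak, but as a stopgap the descriptor limit of the rootless API service can be raised with a systemd drop-in; a sketch, assuming the stock podman.service user unit and an illustrative limit of 65535:

# Create a drop-in for the rootless API service (path and limit value are illustrative)
$ mkdir -p ~/.config/systemd/user/podman.service.d
$ cat > ~/.config/systemd/user/podman.service.d/fd-limit.conf <<'EOF'
[Service]
LimitNOFILE=65535
EOF
$ systemctl --user daemon-reload
$ systemctl --user restart podman.service

This only buys time until the raised limit is exhausted as well.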
Ah - I believe @jwhonce is working on FD leaks right now |
@jwhonce and @mheon 8 hours later, no leak... That's what I did:
Results:
@jwhonce and @mheon I took the time and checked closely over the past 22 hours. Final statement: no bug from my side anymore on version 3.0.2-dev (RHEL). Reasons:
That being said: the log error message by podman is really irritating. Also, for rootless containers it becomes more of an issue than for others. A more comprehensive error message would be cool for any OS limits. Even cooler would be some kind of monitoring of OS limits used/max by podman and per container (podman show limits). Additionally, a stdout error message when podman hits OS limits. Example for the last part:
Just my five cents...
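Until something like that exists in podman itself, a rough sketch of watching descriptor usage against the limit for the rootless API service (assumes the service is currently running and that /proc is readable by the same user):

# Main PID of the user's API service
$ PID=$(systemctl --user show -p MainPID --value podman.service)
# Current number of open descriptors for that process
$ ls /proc/$PID/fd | wc -l
# Soft/hard descriptor limits for that process
$ grep 'Max open files' /proc/$PID/limits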
A friendly reminder that this issue had no activity for 30 days.
I'm now on Podman version 3.2.3 — the same issue still exists.
[Socket]
[Install]
After 2 days running:
Sep 11 23:16:44 comm-guac systemd[1212]: podman.service: Found left-over process 12226 (n/a) in control group while starting unit. Ignoring.
$ systemctl --user status podman.socket
Sep 12 07:39:04 comm-guac systemd[1212]: Listening on Podman API Socket.
There was zero activity for any containers from the user side (days off); 3 containers just ran, and only one of them depends on the socket (portainer). Now portainer can't connect to the socket because of the podman.service failure.
Please retry with 3.3 - a number of fixes from @jwhonce landed in that release, which should alleviate the issue. |
I still don't see v3.3 in the RHEL repository, so I must wait till Red Hat puts the version in their repo. And I don't want to compile Podman from sources. FYI, I use crun as the OCI runtime. Disabled PID limiting for the container because of a RHEL 8.4 bug (https://access.redhat.com/solutions/5913671):
After reboot all my containers are up. Is this normal behaviour? I will continue observing the issue and update the case later.
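As an aside on the PID-limit workaround mentioned above: the per-container limit is set with the --pids-limit flag at run time. A sketch with a hypothetical container name and image; note that whether 0 or -1 means unlimited has varied between podman releases, so check podman-run(1) for the installed version:

# Hypothetical example; 'demo' and the alpine image are placeholders
$ podman run -d --name demo --pids-limit=0 docker.io/library/alpine sleep 3600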
Yes, that is normal. The service is shutting down when not in use to lower resource use at idle. |
Thank you for the answer. I want to monitor the Podman REST API from an external server. As I understand it, by default the REST API is available only on localhost. Do you have a ready-to-use configuration that allows publishing the Podman REST API on the host's interface, so that it can be reached from another server? Then I will configure Nginx as a reverse proxy and add authentication. Thanks in advance.
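One way to expose the API over TCP is to run podman system service with a tcp URI instead of the default unix socket. A sketch only: the bind address and port are illustrative, the exact URI form may vary by podman version, and the endpoint has no authentication or TLS of its own, hence the reverse proxy in front:

# Serve the REST API on all interfaces, port 8080, with no idle timeout
$ podman system service --time=0 tcp://0.0.0.0:8080

For a permanent setup this would typically be wrapped in its own user unit rather than run by hand.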
The same behaviour:
podman.service - Podman API Service
Sep 13 04:30:31 comm-guac systemd[1167]: podman.service: Failed to set memory.swap.max: Too many open files
● podman.socket - Podman API Socket
Sep 12 10:45:47 comm-guac systemd[1167]: Listening on Podman API Socket.
$ ulimit -n
It means that it took 14–16 hours to fail.
A friendly reminder that this issue had no activity for 30 days.
Since podman 3.4 is released, we believe this is now fixed. |
RHEL 8 finally received podman v3.3.1. I've been testing the socket for the last 2 days and it seems stable now. Thank you very much.
I still experience the same behavior on RHEL 8.5 with podman 3.3.1. It can be reproduced within hours after server reboot.
Please open a fresh issue (or, even better, a Bugzilla).
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
User podman services (podman.socket and podman.service) fail within 24 hours of a system reboot. While user podman containers continue to run, the systemctl log shows both units as failed.
Output from podman.service journal:
Output from podman.socket journal:
Both these issues look similar to previously closed issues (#6093 and #5150) but (unless I'm reading them wrong) fixes for those issues should have been merged a while ago.
Steps to reproduce the issue:
Generate a rootless container (I started 'docker.io/thelounge/thelounge:latest') and create a corresponding user systemd unit (a sketch of one way to do this follows these steps).
Allow to run for 24 hours.
Run systemctl --user status - the system will show as degraded. If systemctl list-units --failed is run, both podman.socket and podman.service show as failed.
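A sketch of one way to do step 1, assuming podman generate systemd is available in this podman version; the container name is illustrative and the image is the one mentioned above:

# Create the rootless container
$ podman create --name thelounge docker.io/thelounge/thelounge:latest
# Generate a user unit for it and install it into the user unit directory
$ podman generate systemd --name --files thelounge
$ mkdir -p ~/.config/systemd/user
$ mv container-thelounge.service ~/.config/systemd/user/
$ systemctl --user daemon-reload
$ systemctl --user enable --now container-thelounge.service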
Describe the results you received:
Podman systemd units failed.
Describe the results you expected:
Podman services to continue working normally.
Additional information you deem important (e.g. issue happens only occasionally):
Both appear to be online and working at system start.
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Checked the troubleshooting guide. While this is not the latest version, it looks like these issues were fixed in podman 1.9.
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical system running CentOS Stream 8.