HealthCmd interval in quadlet not being followed + transient timers not cleaned up #22884
Comments
@mheon PTAL
To be certain: did this work with Podman 5.0?
I've used a setup similar to this for a while, yes. I'm not 100% sure when it started occurring, but I'm pretty sure it was right after
The only relevant change I remember going into 5.1 is 4fd8419, so it may be worth reverting it and testing again.
@lespea Does this reproduce outside of Quadlet? Something like I'm hopeful it doesn't, because we have CI that should catch such things.
This reproduces Our CI doesn't check for leaked transient units. The healthchecks are running fine; it is just the cleanup that fails to remove the timer.
Reverting your change makes it work again.
The problem is this code: lines 282 to 297 in c510959
Your new code only uses one field for the unit name, and createTimer overwrites the startup hc name with the real hc name, so removeTransientFiles then removes the real hc timer and thus leaks the startup hc timer.
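To make the failure mode described in this comment easier to follow, here is a minimal Python model of it. This is not Podman's Go code: the `Container` class, its fields, the timer names, and `switch_fixed` are all hypothetical, and the "fixed" variant is only one plausible way to avoid the overwrite, not necessarily what the actual patch does.

```python
class Container:
    """Toy stand-in for a container's persisted healthcheck state (hypothetical)."""

    def __init__(self):
        self.hc_unit_name = None    # the single unit-name field described above
        self.active_timers = set()  # stands in for systemd's transient timers


def create_timer(ctr, name):
    """Start a timer and record its name, overwriting any previous name."""
    ctr.active_timers.add(name)
    ctr.hc_unit_name = name


def remove_transient_files(ctr, name=None):
    """Remove the named timer, or the one recorded in the state if no name is given."""
    target = name if name is not None else ctr.hc_unit_name
    ctr.active_timers.discard(target)


def switch_buggy(ctr):
    """Startup hc passed: create the regular timer, then clean up via the state."""
    create_timer(ctr, "regular-hc")  # state now points at the regular timer
    remove_transient_files(ctr)      # so this removes the regular timer instead


def switch_fixed(ctr):
    """Assumed fix: remember the startup timer name before it is overwritten."""
    old = ctr.hc_unit_name
    create_timer(ctr, "regular-hc")
    remove_transient_files(ctr, old)
```

In the buggy path only the startup timer keeps firing, which matches both symptoms in the report: the startup interval is the one observed, and stopping the container cannot clean the startup timer up because the state no longer holds its name.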
This fixes a regression added in commit 4fd8419. Because the name was overwritten by the createTimer() call, the removeTransientFiles() call removed the new timer and not the startup healthcheck. Then, when the container was stopped, we leaked it because the wrong unit name was in the state. A new test has been added to ensure the logic works and we never leak the system timers. Fixes containers#22884 Signed-off-by: Paul Holzinger <[email protected]>
Ugh, sorry, this has been an absolutely insane week. Really appreciate the fast fix/release, and I can confirm that
Issue Description
With the latest update of podman (v5.1.0) I noticed that in my quadlet definitions the HealthInterval is not being followed; the HealthStartupInterval is used instead. Moreover, the transient .timer files are left behind whenever the service is stopped/restarted, causing many error logs to be generated, since the container is no longer running but the healthcmd continues to be retried (in my case every few seconds for every container).
Quadlet def:
Transient logs persisting:
Example of a transient service/timer
Steps to reproduce the issue
HealthStartupCmd is run, sleep 5
2s the HealthCmd is being run, sleep 2
journalctl: every 2 seconds there is an error log for the startup timer/service since those containers no longer exist
/var/run/systemd/transient/ to see the old timers/services
OnUnitInactiveSec=2s is in the timers, which is the interval for the startup health check, not the normal one
Describe the results you received
Timers removed on service reset/stop
Describe the results you expected
The initial health cmd runs its cmd/interval; then, once healthy, the normal cmd runs its cmd/interval. Also, the checks should be removed whenever the container is restarted or shut down.
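One way to check for leftover timers is to list the .timer files in the transient directory together with their OnUnitInactiveSec value. The sketch below is only an illustration: the helper name `timer_intervals` is made up, the default path is the directory mentioned in this report, and it makes no attempt to tell Podman's timers apart from other transient units that may live there.

```python
from pathlib import Path


def timer_intervals(transient_dir="/var/run/systemd/transient"):
    """Map each *.timer file name to its OnUnitInactiveSec value (None if absent)."""
    directory = Path(transient_dir)
    if not directory.is_dir():
        return {}
    result = {}
    for path in sorted(directory.glob("*.timer")):
        interval = None
        for line in path.read_text().splitlines():
            if line.startswith("OnUnitInactiveSec="):
                interval = line.split("=", 1)[1].strip()
        result[path.name] = interval
    return result


if __name__ == "__main__":
    for name, interval in timer_intervals().items():
        print(name, interval)
```

Running it after stopping the service should show no timers belonging to the stopped containers; a timer that is still listed with the startup interval would match the leak described here.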
podman info output
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
No response