-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
systemd sd-notify freezes podman #6688
Comments
I'm fairly certain this is because |
First fix #6689 - Advertise MAINPID via sd-notify, this eliminates the need for a PIDFile. In addition, I think there should be a new option to podman run, something like:
With conmon-only, we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI runtime doesn't pass it into the container. We could also advertise "ready" when the OCI runtime finishes to advertise the service as ready. With container, we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI runtime passes it into the container for initialization, and let the container advertise further metadata. This removes the need for hardcoded CID and PID files in the command line, and the PIDFile directive, as the pid is advertised directly through sd-notify. |
I'm less concerned with operations on the container in question (i.e. exec into the actual container) than I am things completely unrelated like |
@vrothberg @giuseppe see --sdnotify proposal above I did MAINPID only since I thought that was least intrusive. With the above, "Container" would be the default as that's what it does now. We could add "none" as well, to tell podman not to do anything with the NOTIFY_SOCKET. I just didn't see how systemd would get the correct PID from the container itself... Looks like runc attempts to send a MAINPID, and if it overrides us there's nothing we can do about it anyway... but with testing, whatever runc is sending doesn't work. |
yes I confirm that. Both crun/runc stops on "$RUNTIME start" until the container notifies them. Why does the container take so long to be ready? Does it actually send any |
and also, any other info sent by the container is ignored, e.g. |
No, I'm purposely delaying it to test behavior. I don't think we can count on the container init being quick. For instance, if I were to use this for my tomcat based web apps, it may take 30s-2min to initialize, and then I'd run health checks... I wouldn't want all of podman to be locked while that's happening. |
@giuseppe Am I reading this wrong? It looks like at line 166 runc itself is sending the MAINPID. |
If runc does send it, it's not making it to systemd, or systemd is throwing it out - at least in my testing. |
yes correct, runc is sending the MAINPID. The MAINPID sent by runc is the pid for the container |
the assumption is that the container is not ready until it notifies to be so. We should probably fix the healthchecks to raise an error immediately if the container is not running. |
Except it doesn't seem to be....
One screen: Another screen All that socat shows is "READY=1" I don't know what the tinker.C part means, maybe it's a spec option or a command line option? It appears that systemd DOES accept MAINPID multiple times. Each time MAINPID is sent it switches, throwing a cgroups warning if need be. (runc start is apparently forked off, not surprisingly) I'm fine sending MAINPID at conmon launch, and again after runc start succeeds to reset it if the OCI runtime played tricks on us. |
I 100% agree with that. I also don't care if podman exec -it container_in_question blocks (except that ctrl-c doesn't work - that should probably be fixed). I DO have an issue with podman info blocking, and podman ps. Wasn't the point of this architecture vs. docker that containers were more independent and less susceptible to this sort of coupling between containers... |
We take and hold a lock on the container in question for all major operations. This provides strong safety against races (two |
@mheon If you don't like how runc handles it, then it could be taken out of the OCI runtime's responsibility.
I imagine that would do the trick... But it seems hacky to implement behavior in podman because the OCI runtime doesn't do what we want. It depends on what you think should happen - when you do a podman start, and the container isn't READY, has it started? I'd say yes. Because sd-notify is specifically meant to talk to SYSTEMD, not oci runtimes. The container started. If you want to verify it's READY, you should use systemctl status to verify it's ready. Or, monitor the unix datagram yourself. IMHO the runtime and podman shouldn't block waiting for READY=1. |
That is quite tempting, for more than one reason - we've had occasional problems with the implementation in I am a bit concerned about the complexity this would bring to conmon, and the difficulty of modifying the environment of the running container (going to be difficult doing that in a portable fashion). |
I imagine it would end up in specgen somewhere - injecting into the container's ENV and Mounts. This is very similar to what I was playing with w.r.t. syslog proxying. https://github.com/containers/conmon/pull/177/files It was much longer than I thought it should be mainly because it's C and it's glib and it's nonblocking... And everything in conmon so far was based on stream sockets. If conmon gained the ability to do dgram sockets, it could do specific proxying for NOTIFY_SOCKET and generic proxying for things like /dev/log. Ultimately I replaced it with an OCI hook shell script that was about 20 lines long, which seemed less complex. I have less understanding of how best to inject mounts and env in libpod - perhaps in specgen somewhere... It would just be another bind mount that libpod handles (/etc/hosts, /etc/resolv.conf, /run/.containerenv, /run/notify) and it would bind to the exact same directory - userdata. It could even always be /run/notify in the container. |
Huh. I was under the impression that the OCI runtime automatically hijacked SD_NOTIFY when it found it - if that's not the case, and we can do it ourselves, that's not bad at all. My main fear was that we'd have to trick the runtime into thinking there was no socket. |
Well, let's test. We all know what happens when runc and crun receives NOTIFY_SOCKET in its own environment. If runc and crun also intercepted the spec's environment... then this wouldn't happen:
Which means I can do things myself with my own bind mount....
That's runc... Crun:
Also runc and crun only ever pass READY=1, but I don't see any reason why we couldn't also pass STATUS,RELOADING, STOPPING, ERRNO, or WATCHDOG items.. I assume they're not passed now just because they aren't frequently implemented. I think we'd just need to decide which variables conmon owns (MAINPID being the obvious one. conmon could block MAINPID from the container and only ever send its own). Does it make sense for conmon to set up the watchdog for itself? etc. I can work up a PoC and we can go from there - probably the first task would be to just get the 3 modes I did in the PR working without the OCI runtime doing any proxying, and only dealing with MAINPID and READY=1. Future expansion could build off of that. Doing it that way should solve the podman locking "problem" introduced by the OCI runtime. |
@mheon Given containers/conmon#182, the only changes to libpod are goochjj#1 to handle SD-NOTIFY ourselves |
(assuming the other sd-notify work as a baseline) |
A friendly reminder that this issue had no activity for 30 days. |
Yes, it is still an issue. The OCI runtime blocks when using sd-notify (container), and podman freezes because the starting container holds the lock. I've provided a solution. |
Any progress on this @goochjj ? |
It's all in #7607 - which was pending conmon's changes making it into the test environments. |
Any idea if the new conmon is there now? |
Still waiting for this to get merged. |
A friendly reminder that this issue had no activity for 30 days. |
Still waiting on this one. |
A friendly reminder that this issue had no activity for 30 days. |
Since Conmon is merged, I am going to assume this is fixed. Reopen if I am mistaken. |
Are you sure this is fixed with conmon, only? Don't we need #8508 too?
I'm trying to get a sd_notify aware service in a podman container to play nicely as a |
Do you have a easy reproducer? We need an example of a program that uses sd-notify that we could try. |
I just threw my unpolished experiment into a repo: https://github.com/flyingflo/podman_sdnotify |
I am not sure how this is supposed to work, but the first example I tried got connection refused, with container, then I changed to conmon type and I get.
|
@haircommander Any idea what is going on here. It looks like conmon is not listening on the notify socket? |
gonna be honest, I do not understand the sd notify code haha @flyingflo are you able to test using that PR? the phrasing "blocked by oci runtime" in the PR implies there is a missing link that we may need... |
Actually @flyingflo your example is working for me. Although you python script is failing after the first sdnotify.
This looks successful, then systemd or conmon drops the connection
This notify fails, because systemd or conmon has dropped the connection. Does this test work differently if you don't run it in a container. Is conmon dropping the connection when it should not? |
With my test, I can clearly observe the behaviour in this actual issue. The second observation is that all writes to the socket fail, after the first write with |
Wait - #8508 isn't merged yet? @vrothberg Didn't we just swap |
I just assumed it is not merged because of the #8508 's state, no further investigation. For me Type=notify works with podman's |
I just updated an pushed it. This PR has been floundering for a while, I guess I will need to pay more attention to it. |
@rhatdan this is what you get if the env var NOTIFY_SOCKET is not set in the container. Which is the case with |
By default, it's setting --sdnotify=conmon where everything's handled on the Podman-side of things. |
OK Good. We're set then. |
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Creating a service using Type=notify freezes everything until the container is ready.
Setting a service as Type=notify sets the NOTIFY_SOCKET, which gets passed through podman properly. Runc and Crun then proxy that NOTIFY_SOCKET through to the container, indicating the container will signal when ready. The whole idea being that starting a container does not equal "a container is ready", this initialization could take seconds, 10's of seconds, or minutes. And it shouldn't matter how long it takes..
The problem is while it's in "starting" status, podman is frozen completely.
podman ps
doesn't return,podman exec
doesn't work, evenpodman info
won't return. One partially initialized container shouldn't freeze everything, and the lack of exec makes it hard to diagnose what's going on inside the container to resolve the sd-notify issue.podman stop
andpodman kill
appear to work, but the container is still stuck.In addition, the MAINPID isn't set right - but we'll come back to that.
Steps to reproduce the issue:
PIDFile. Add NotifyAccess=all.
SSCE
Run the service.
Note the status:
It never gets to started (as expected)
Status:
Try podman ps, podman exec -it -l sh, or podman info
Freezes.
Release the container
for i in /run/crun/*/notify/notify /run/runc/*/notify/notify.sock; do env NOTIFY_SOCKET=$i systemd-notify --ready; done
Everything releases, but the service STILL fails, because it didn't find MAINPID=conmon's pid.
Output of
podman version
:Output of
podman info --debug
:Package info (e.g. output of
rpm -q podman
orapt list podman
):Compiled from source.
Additional environment details (AWS, VirtualBox, physical, etc.):
The text was updated successfully, but these errors were encountered: