Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd sd-notify freezes podman #6688

Closed
goochjj opened this issue Jun 19, 2020 · 55 comments
Closed

systemd sd-notify freezes podman #6688

goochjj opened this issue Jun 19, 2020 · 55 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. stale-issue

Comments

@goochjj
Copy link
Contributor

goochjj commented Jun 19, 2020

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Creating a service using Type=notify freezes everything until the container is ready.

Setting a service as Type=notify sets the NOTIFY_SOCKET, which gets passed through podman properly. Runc and Crun then proxy that NOTIFY_SOCKET through to the container, indicating the container will signal when ready. The whole idea being that starting a container does not equal "a container is ready", this initialization could take seconds, 10's of seconds, or minutes. And it shouldn't matter how long it takes..

The problem is while it's in "starting" status, podman is frozen completely. podman ps doesn't return, podman exec doesn't work, even podman info won't return. One partially initialized container shouldn't freeze everything, and the lack of exec makes it hard to diagnose what's going on inside the container to resolve the sd-notify issue. podman stop and podman kill appear to work, but the container is still stuck.

In addition, the MAINPID isn't set right - but we'll come back to that.

Steps to reproduce the issue:

  1. Create a systemd service. Change type from Type=forking to Type=notify, and remove the
    PIDFile. Add NotifyAccess=all.
    SSCE
[Unit]
Description=NotifyTesting
Wants=network.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
SyslogIdentifier=%N
ExecStartPre=-/usr/bin/podman stop %N
ExecStartPre=-/usr/bin/podman rm %N
LogExtraFields=CONTAINER_NAME=%N
ExecStart=/usr/bin/podman --log-level=debug run \
  -d --log-driver=journald \
  --init \
  --cgroups no-conmon \
  --net=host \
  --name %N \
  alpine sleep infinity
ExecStop=/usr/bin/podman stop -t 20 %N
Type=notify
NotifyAccess=all
Restart=on-failure
#Restart=always
RestartSec=30s
TimeoutStartSec=20
TimeoutStopSec=25
#KillMode=none
#Type=forking
#PIDFile=/run/podman-pid-%n
Delegate=yes
Slice=machine.slice

[Install]
WantedBy=multi-user.target default.target
  1. Run the service.

  2. Note the status:

notifytest[8595]: time="2020-06-19T13:22:38Z" level=debug msg="Starting container e6043f58bcd610d1e448739f2120447f2880c9b498c65fc3c181e1f453a48ef7 with command

It never gets to started (as expected)

Status:

 Main PID: 12003 (podman)
    Tasks: 22 (limit: 4915)
   Memory: 27.9M
   CGroup: /machine.slice/notifytest.service
           ├─12003 /usr/share/gocode/src/github.com/containers/libpod/bin/podman --log-level=debug run -d --log-driver=journald --init --cgroups no-conmon --net=host --name notifytest alpine sleep infinity
           ├─12065 /usr/libexec/podman/conmon --api-version 1 -c a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -u a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -r /usr/bin/runc -b /va>
           └─12084 /usr/bin/runc start a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0
  1. Try podman ps, podman exec -it -l sh, or podman info

  2. Freezes.

  3. Release the container
    for i in /run/crun/*/notify/notify /run/runc/*/notify/notify.sock; do env NOTIFY_SOCKET=$i systemd-notify --ready; done

  4. Everything releases, but the service STILL fails, because it didn't find MAINPID=conmon's pid.

Output of podman version:

Version:      2.0.0-dev
API Version:  1
Go Version:   go1.13.3
Git Commit:   b27df834c18b08bb68172fa5bd5fd12a5cd54633
Built:        Thu Jun 18 12:19:01 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18-dev, commit: 954b05a7908c0aeeff007ebd19ff662e20e5f46f'
  cpus: 4
  distribution:
    distribution: photon
    version: "3.0"
  eventLogger: file
  hostname: photon-machine
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.19.115-6.ph3-esx
  linkmode: dynamic
  memFree: 5536968704
  memTotal: 8359960576
  ociRuntime:
    name: runc
    package: runc-1.0.0.rc9-2.ph3.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10+dev
      commit: 2a0466958d9af23af2ad12bd79d06ed0af4091e2
      spec: 1.0.2-dev
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 25h 9m 27.25s (Approximately 1.04 days)
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 0
    stopped: 4
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 16
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 1
  Built: 1592482741
  BuiltTime: Thu Jun 18 12:19:01 2020
  GitCommit: b27df834c18b08bb68172fa5bd5fd12a5cd54633
  GoVersion: go1.13.3
  OsArch: linux/amd64
  Version: 2.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

Compiled from source.

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 19, 2020
@mheon
Copy link
Member

mheon commented Jun 19, 2020

I'm fairly certain this is because runc/crun themselves basically freeze midway through container creation until they are notified by SD_NOTIFY. Podman is waiting for final confirmation from the runtime that the container has been successfully created, and it holding the container lock for the container in question while doing so. Changing that may be possible (it would unblock some things like podman ps) but the container is not considered running until the runtime finishes up, so we won't be able to podman exec into the container until after the notify issue has sorted itself.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

First fix #6689 - Advertise MAINPID via sd-notify, this eliminates the need for a PIDFile.

In addition, I think there should be a new option to podman run, something like:

--sdnotify conmon-only|container

With conmon-only, we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI runtime doesn't pass it into the container. We could also advertise "ready" when the OCI runtime finishes to advertise the service as ready.

With container, we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI runtime passes it into the container for initialization, and let the container advertise further metadata.

This removes the need for hardcoded CID and PID files in the command line, and the PIDFile directive, as the pid is advertised directly through sd-notify.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

I'm fairly certain this is because runc/crun themselves basically freeze midway through container creation until they are notified by SD_NOTIFY. Podman is waiting for final confirmation from the runtime that the container has been successfully created, and it holding the container lock for the container in question while doing so. Changing that may be possible (it would unblock some things like podman ps) but the container is not considered running until the runtime finishes up, so we won't be able to podman exec into the container until after the notify issue has sorted itself.

I'm less concerned with operations on the container in question (i.e. exec into the actual container) than I am things completely unrelated like podman info. Could always use nsenter to poke around the container and troubleshoot.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

@vrothberg @giuseppe see --sdnotify proposal above

I did MAINPID only since I thought that was least intrusive. With the above, "Container" would be the default as that's what it does now.

We could add "none" as well, to tell podman not to do anything with the NOTIFY_SOCKET. I just didn't see how systemd would get the correct PID from the container itself... Looks like runc attempts to send a MAINPID, and if it overrides us there's nothing we can do about it anyway... but with testing, whatever runc is sending doesn't work.

@vrothberg
Copy link
Member

Thanks for the ping, @goochjj!

@giuseppe knows these details of the runtimes and I guess he will know what to do :)

@giuseppe
Copy link
Member

I'm fairly certain this is because runc/crun themselves basically freeze midway through container creation until they are notified by SD_NOTIFY. Podman is waiting for final confirmation from the runtime that the container has been successfully created, and it holding the container lock for the container in question while doing so. Changing that may be possible (it would unblock some things like podman ps) but the container is not considered running until the runtime finishes up, so we won't be able to podman exec into the container until after the notify issue has sorted itself.

yes I confirm that. Both crun/runc stops on "$RUNTIME start" until the container notifies them.

Why does the container take so long to be ready? Does it actually send any READY=1 notification?

@giuseppe
Copy link
Member

and also, any other info sent by the container is ignored, e.g. MAINPID. The only notification accepted by the runtimes is READY=1

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

No, I'm purposely delaying it to test behavior. I don't think we can count on the container init being quick.

For instance, if I were to use this for my tomcat based web apps, it may take 30s-2min to initialize, and then I'd run health checks... I wouldn't want all of podman to be locked while that's happening.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

and also, any other info sent by the container is ignored, e.g. MAINPID. The only notification accepted by the runtimes is READY=1

@giuseppe Am I reading this wrong? It looks like at line 166 runc itself is sending the MAINPID.
https://github.com/opencontainers/runc/blob/9748b48742fad065666bcded00f2db5ded7e44c1/notify_socket.go#L140-L170

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

If runc does send it, it's not making it to systemd, or systemd is throwing it out - at least in my testing.

@giuseppe
Copy link
Member

``

@giuseppe Am I reading this wrong? It looks like at line 166 runc itself is sending the MAINPID.
https://github.com/opencontainers/runc/blob/9748b48742fad065666bcded00f2db5ded7e44c1/notify_socket.go#L140-L170

yes correct, runc is sending the MAINPID. The MAINPID sent by runc is the pid for the container

@giuseppe
Copy link
Member

For instance, if I were to use this for my tomcat based web apps, it may take 30s-2min to initialize, and then I'd run health checks... I wouldn't want all of podman to be locked while that's happening.

the assumption is that the container is not ready until it notifies to be so. We should probably fix the healthchecks to raise an error immediately if the container is not running.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

``

@giuseppe Am I reading this wrong? It looks like at line 166 runc itself is sending the MAINPID.
https://github.com/opencontainers/runc/blob/9748b48742fad065666bcded00f2db5ded7e44c1/notify_socket.go#L140-L170

yes correct, runc is sending the MAINPID. The MAINPID sent by runc is the pid for the container

Except it doesn't seem to be....

runc --version
runc version 1.0.0-rc10+dev
commit: 2a0466958d9af23af2ad12bd79d06ed0af4091e2
spec: 1.0.2-dev

One screen: socat unix-recvfrom:/run/nottest,fork -
Another screen:
env NOTIFY_SOCKET=/run/nottest podman --log-level debug run --rm -it alpine sh

Another screen for i in /run/crun/*/notify/notify /run/runc/*/notify/notify.sock; do env NOTIFY_SOCKET=$i systemd-notify --ready; done

All that socat shows is "READY=1"

I don't know what the tinker.C part means, maybe it's a spec option or a command line option?

It appears that systemd DOES accept MAINPID multiple times. Each time MAINPID is sent it switches, throwing a cgroups warning if need be. (runc start is apparently forked off, not surprisingly)

I'm fine sending MAINPID at conmon launch, and again after runc start succeeds to reset it if the OCI runtime played tricks on us.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

For instance, if I were to use this for my tomcat based web apps, it may take 30s-2min to initialize, and then I'd run health checks... I wouldn't want all of podman to be locked while that's happening.

the assumption is that the container is not ready until it notifies to be so. We should probably fix the healthchecks to raise an error immediately if the container is not running.

I 100% agree with that. I also don't care if podman exec -it container_in_question blocks (except that ctrl-c doesn't work - that should probably be fixed). I DO have an issue with podman info blocking, and podman ps. Wasn't the point of this architecture vs. docker that containers were more independent and less susceptible to this sort of coupling between containers...

@mheon
Copy link
Member

mheon commented Jun 19, 2020

We take and hold a lock on the container in question for all major operations. This provides strong safety against races (two podman start on the same container launched at the same time will not be an issue). podman ps also grabs the lock (to get the current status of the container) and it looks like info does the same? Normally this isn't a problem, because every one of the locked operations takes less than a second, and then the lock releases and goes to another process. Unfortunately, the decision by the OCI runtime to not exit until SD_NOTIFY has succeeded means that podman start can hang for a very long time in that case. I'm still of the opinion that this is an OCI runtime issue, but it doesn't seem like one that runc has any intention of addressing.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 23, 2020

@mheon If you don't like how runc handles it, then it could be taken out of the OCI runtime's responsibility.

  1. Have podman mask out NOTIFY_SOCKET from the OCI runtime, but pass it to conmon
  2. Have conmon create a notify socket, intercept READY and pass it on to the host's notify socket, like runc/crun do now.
  3. Have podman inject the bind mount to the socket conmon is proxying, and set NOTIFY_SOCKET properly in the container's env (not the runtime's env)

I imagine that would do the trick... But it seems hacky to implement behavior in podman because the OCI runtime doesn't do what we want.

It depends on what you think should happen - when you do a podman start, and the container isn't READY, has it started? I'd say yes. Because sd-notify is specifically meant to talk to SYSTEMD, not oci runtimes. The container started.

If you want to verify it's READY, you should use systemctl status to verify it's ready. Or, monitor the unix datagram yourself. IMHO the runtime and podman shouldn't block waiting for READY=1.

@mheon
Copy link
Member

mheon commented Jun 23, 2020

That is quite tempting, for more than one reason - we've had occasional problems with the implementation in runc having bugs and upstream not merging fixes for said bugs, so entirely removing this from the runtime lets us do this in a more sensible, nonblocking way and circumvent buggy runtimes.

I am a bit concerned about the complexity this would bring to conmon, and the difficulty of modifying the environment of the running container (going to be difficult doing that in a portable fashion).

@goochjj
Copy link
Contributor Author

goochjj commented Jun 23, 2020

That is quite tempting, for more than one reason - we've had occasional problems with the implementation in runc having bugs and upstream not merging fixes for said bugs, so entirely removing this from the runtime lets us do this in a more sensible, nonblocking way and circumvent buggy runtimes.

I am a bit concerned about the complexity this would bring to conmon, and the difficulty of modifying the environment of the running container (going to be difficult doing that in a portable fashion).

I imagine it would end up in specgen somewhere - injecting into the container's ENV and Mounts. This is very similar to what I was playing with w.r.t. syslog proxying.

https://github.com/containers/conmon/pull/177/files

It was much longer than I thought it should be mainly because it's C and it's glib and it's nonblocking... And everything in conmon so far was based on stream sockets. If conmon gained the ability to do dgram sockets, it could do specific proxying for NOTIFY_SOCKET and generic proxying for things like /dev/log.

Ultimately I replaced it with an OCI hook shell script that was about 20 lines long, which seemed less complex.

I have less understanding of how best to inject mounts and env in libpod - perhaps in specgen somewhere... It would just be another bind mount that libpod handles (/etc/hosts, /etc/resolv.conf, /run/.containerenv, /run/notify) and it would bind to the exact same directory - userdata. It could even always be /run/notify in the container.

@mheon
Copy link
Member

mheon commented Jun 23, 2020

Huh. I was under the impression that the OCI runtime automatically hijacked SD_NOTIFY when it found it - if that's not the case, and we can do it ourselves, that's not bad at all. My main fear was that we'd have to trick the runtime into thinking there was no socket.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 24, 2020

Well, let's test. We all know what happens when runc and crun receives NOTIFY_SOCKET in its own environment. If runc and crun also intercepted the spec's environment... then this wouldn't happen:

podman run --rm -it -e NOTIFY_SOCKET=/tmp/notify alpine sh
/ # printenv NOTIFY_SOCKET
/tmp/notify
/ # ls -l /tmp
total 0

Which means I can do things myself with my own bind mount....

socat unix-recvfrom:/tmp/notify - &
# podman run --rm -it -e NOTIFY_SOCKET=/tmp/notify -v /tmp/notify:/tmp/notify fedora sh
sh-5.0# systemd-notify --ready
READY=1sh-5.0#

That's runc...

Crun:

# podman --runtime /opt/podman/bin/crun run --rm -it -e NOTIFY_SOCKET=/tmp/notify alpine sh
/ # ls -l /tmp
total 0
/ # printenv NOTIFY_SOCKET
/tmp/notify
/ #
# podman --runtime /opt/podman/bin/crun run --rm -it -e NOTIFY_SOCKET=/tmp/notify -v /tmp/notify:/tmp/notify fedora sh
sh-5.0# systemd-notify --ready
READY=1sh-5.0# exit

Also runc and crun only ever pass READY=1, but I don't see any reason why we couldn't also pass STATUS,RELOADING, STOPPING, ERRNO, or WATCHDOG items.. I assume they're not passed now just because they aren't frequently implemented. I think we'd just need to decide which variables conmon owns (MAINPID being the obvious one. conmon could block MAINPID from the container and only ever send its own). Does it make sense for conmon to set up the watchdog for itself? etc.

I can work up a PoC and we can go from there - probably the first task would be to just get the 3 modes I did in the PR working without the OCI runtime doing any proxying, and only dealing with MAINPID and READY=1. Future expansion could build off of that. Doing it that way should solve the podman locking "problem" introduced by the OCI runtime.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 24, 2020

@mheon Given containers/conmon#182, the only changes to libpod are goochjj#1 to handle SD-NOTIFY ourselves

@goochjj
Copy link
Contributor Author

goochjj commented Jun 24, 2020

(assuming the other sd-notify work as a baseline)

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jul 25, 2020

@goochjj @mheon @giuseppe Is this still an issue?

@goochjj
Copy link
Contributor Author

goochjj commented Jul 25, 2020

Yes, it is still an issue. The OCI runtime blocks when using sd-notify (container), and podman freezes because the starting container holds the lock.

I've provided a solution.
containers/conmon#182
goochjj#1

@rhatdan
Copy link
Member

rhatdan commented Oct 7, 2020

Any progress on this @goochjj ?

@goochjj
Copy link
Contributor Author

goochjj commented Oct 26, 2020

It's all in #7607 - which was pending conmon's changes making it into the test environments.

@rhatdan
Copy link
Member

rhatdan commented Oct 27, 2020

Any idea if the new conmon is there now?

@rhatdan
Copy link
Member

rhatdan commented Dec 24, 2020

Still waiting for this to get merged.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jan 25, 2021

Still waiting on this one.

@github-actions
Copy link

github-actions bot commented Mar 8, 2021

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Mar 8, 2021

Since Conmon is merged, I am going to assume this is fixed. Reopen if I am mistaken.

@rhatdan rhatdan closed this as completed Mar 8, 2021
@flyingflo
Copy link

Are you sure this is fixed with conmon, only? Don't we need #8508 too?
I'm asking, because I still see this behaviour on F34 with

  • podman 3.2.2
  • conmon 2.0.27
  • crun 0.20.1

I'm trying to get a sd_notify aware service in a podman container to play nicely as a Type=notify systemd unit, preferably with watchdog and status text. I would love this kind of "perfect" integration.

@rhatdan
Copy link
Member

rhatdan commented Jul 16, 2021

Do you have a easy reproducer? We need an example of a program that uses sd-notify that we could try.

@flyingflo
Copy link

I just threw my unpolished experiment into a repo: https://github.com/flyingflo/podman_sdnotify
It also works with a bash script calling systemd-notify, but I used python later, because the PID stays the same.

@rhatdan
Copy link
Member

rhatdan commented Jul 16, 2021

I am not sure how this is supposed to work, but the first example I tried got connection refused, with container, then I changed to conmon type and I get.

× sdnotify_py.service - Podman container-sdnotify_py_ctr.service
     Loaded: loaded (/usr/lib/systemd/system/sdnotify_py.service; disabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Fri 2021-07-16 08:57:12 EDT; 28s ago
       Docs: man:podman-generate-systemd(1)
    Process: 186726 ExecStartPre=/bin/rm -f /run/container-sdnotify_py_ctr.pid /run/container-sdnotify_py_ctr.ctr-id (code=exited, status=0/SUCCESS)
    Process: 186897 ExecStart=/usr/bin/podman run --conmon-pidfile /run/container-sdnotify_py_ctr.pid --cidfile /run/container-sdnotify_py_ctr.ctr-id --cgroups=split --sdnotify=conmon -d --replace -a stderr -a stdout --log-driver=journal>
    Process: 187055 ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile /run/container-sdnotify_py_ctr.ctr-id (code=exited, status=219/CGROUP)
   Main PID: 186897 (code=exited, status=1/FAILURE)
        CPU: 566ms

Jul 16 08:57:12 localhost.localdomain conmon[186897]:   File "/app/sdnotify_py.py", line 15, in <module>
Jul 16 08:57:12 localhost.localdomain conmon[186897]:     n = sdnotify.SystemdNotifier(debug=True)
Jul 16 08:57:12 localhost.localdomain conmon[186897]:   File "/usr/local/lib/python3.9/site-packages/sdnotify/__init__.py", line 37, in __init__
Jul 16 08:57:12 localhost.localdomain conmon[186897]:     if addr[0] == '@':
Jul 16 08:57:12 localhost.localdomain conmon[186897]: TypeError: 'NoneType' object is not subscriptable
Jul 16 08:57:12 localhost.localdomain podman[186926]: 2021-07-16 08:57:12.67782836 -0400 EDT m=+0.058939985 container died ceee2ce16ccfd607d76ad7eecfcb487ed3cb84a990dbe5f2a04232b49ddf535c (image=localhost/sdnotify_py:latest, name=sdnotif>
Jul 16 08:57:12 localhost.localdomain podman[186926]: 2021-07-16 08:57:12.771301398 -0400 EDT m=+0.152413006 container cleanup ceee2ce16ccfd607d76ad7eecfcb487ed3cb84a990dbe5f2a04232b49ddf535c (image=localhost/sdnotify_py:latest, name=sdn>
Jul 16 08:57:12 localhost.localdomain systemd[1]: sdnotify_py.service: Main process exited, code=exited, status=1/FAILURE
Jul 16 08:57:12 localhost.localdomain systemd[1]: sdnotify_py.service: Control process exited, code=exited, status=219/CGROUP
Jul 16 08:57:12 localhost.localdomain systemd[1]: sdnotify_py.service: Failed with result 'exit-code'.

@rhatdan
Copy link
Member

rhatdan commented Jul 16, 2021

@haircommander Any idea what is going on here. It looks like conmon is not listening on the notify socket?

@haircommander
Copy link
Collaborator

gonna be honest, I do not understand the sd notify code haha

@flyingflo are you able to test using that PR? the phrasing "blocked by oci runtime" in the PR implies there is a missing link that we may need...

@rhatdan
Copy link
Member

rhatdan commented Jul 16, 2021

Actually @flyingflo your example is working for me. Although you python script is failing after the first sdnotify.

# Inform systemd that we've finished our startup sequence...
n = sdnotify.SystemdNotifier(debug=True)
n.notify("READY=1\nSTATUS=Started")

This looks successful, then systemd or conmon drops the connection

count = 1
while True:
    print("Running... {}".format(count))
    try:
        n.notify("STATUS=Count is {}".format(count))

This notify fails, because systemd or conmon has dropped the connection.

Does this test work differently if you don't run it in a container. Is conmon dropping the connection when it should not?

@flyingflo
Copy link

With my test, I can clearly observe the behaviour in this actual issue.
The crun process sticks around (observed with systemctl status) until the python script sends READY after 10s.
During this time podman is blocked completely (e.g. podman ps), which is what this issue is about, if I understood.

The second observation is that all writes to the socket fail, after the first write with READY=1, after which crun terminates.
I believe this is because what I see is crun proxying the socket, not conmon.
The intention of @goochjj in #8508 and containers/conmon#182 is that conmon proxies the socket and that the OCI runtime is circumvented.
I'm not sure if that is correct, because I'm looking into this the first time, but I think we need #8508 before containers/conmon#182 is in effect with podman, because it prepares the environments for conmon and the OCI runtime appropriately.

@mheon
Copy link
Member

mheon commented Jul 16, 2021

Wait - #8508 isn't merged yet?

@vrothberg Didn't we just swap generate systemd to use notify by default? If so, and #8508 is not merged, I think we need an immediate revert of that. The behavior of sdnotify when the OCI runtime is used is not sane.

@flyingflo
Copy link

I just assumed it is not merged because of the #8508 's state, no further investigation.

For me Type=notify works with podman's --sdnotify=container but only for the READY message, because all others don't get through after the runtime goes away, and --sdnotify=conmon works too, but there is no NOTIFY_SOCKET set in the container (I guess that is what it should do).

@rhatdan
Copy link
Member

rhatdan commented Jul 16, 2021

I just updated an pushed it. This PR has been floundering for a while, I guess I will need to pay more attention to it.

@flyingflo
Copy link

Jul 16 08:57:12 localhost.localdomain conmon[186897]: File "/app/sdnotify_py.py", line 15, in
Jul 16 08:57:12 localhost.localdomain conmon[186897]: n = sdnotify.SystemdNotifier(debug=True)
Jul 16 08:57:12 localhost.localdomain conmon[186897]: File "/usr/local/lib/python3.9/site-packages/sdnotify/init.py", line 37, in init
Jul 16 08:57:12 localhost.localdomain conmon[186897]: if addr[0] == '@':
Jul 16 08:57:12 localhost.localdomain conmon[186897]: TypeError: 'NoneType' object is not subscriptable

@rhatdan this is what you get if the env var NOTIFY_SOCKET is not set in the container. Which is the case with --sdnotify=conmon.

@vrothberg
Copy link
Member

Wait - #8508 isn't merged yet?

@vrothberg Didn't we just swap generate systemd to use notify by default? If so, and #8508 is not merged, I think we need an immediate revert of that. The behavior of sdnotify when the OCI runtime is used is not sane.

By default, it's setting --sdnotify=conmon where everything's handled on the Podman-side of things.

@mheon
Copy link
Member

mheon commented Aug 16, 2021

OK Good. We're set then.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. stale-issue
Projects
None yet
Development

No branches or pull requests

8 participants