
simplify podman systemd generate - remove cidfile #13236

Closed
grooverdan opened this issue Feb 14, 2022 · 33 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@grooverdan
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind feature

Description

podman generate systemd creates an ExecStartPre line that removes the cidfile, an ExecStart line that writes the cidfile, an ExecStop line that uses it, and an ExecStopPost line that removes it.

For a Type=notify service with the MAINPID of conmon pushed (per comment #12778 (comment) / #9642), none of these cidfile usages are needed.

With Type=notify and an accurately communicated PID, and with conmon acting on all shutdown signals to stop the container, all of the cidfile-related directives can be removed.
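For context, the "accurately communicated PID" is just a datagram sent on $NOTIFY_SOCKET, per sd_notify(3). A minimal Python sketch of that protocol (illustrative only, not podman/conmon code):

```python
import os
import socket

def sd_notify(message: str) -> bool:
    """Send a state notification to the systemd notify socket, if any.
    Implements the datagram protocol described in sd_notify(3)."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # not running under a Type=notify service
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract socket namespace
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.connect(addr)
        s.send(message.encode())
    return True

# e.g. conmon reports readiness and its PID so systemd tracks the MAINPID:
# sd_notify(f"MAINPID={os.getpid()}\nREADY=1")
```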

Steps to reproduce the issue:

  1. bin/podman generate systemd --name --new --template {container}

Describe the results you received:

$ bin/podman generate systemd --name --new --template quizzical_satoshi
# container-quizzical_satoshi@.service
# autogenerated by Podman 4.0.0-dev
# Tue Feb 15 10:04:40 AEDT 2022

[Unit]
Description=Podman container-quizzical_satoshi.service for %I
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n-%i
Restart=on-failure
TimeoutStopSec=70
ExecStartPre=/bin/rm -f %t/%n.ctr-id
ExecStart=/home/dan/repos/podman/bin/podman run --name=container-quizzical_satoshi-%i --cidfile=%t/%n.ctr-id --cgroups=no-conmon --rm --sdnotify=conmon -d -ti fedora:35
ExecStop=/home/dan/repos/podman/bin/podman stop --ignore --cidfile=%t/%n.ctr-id
ExecStopPost=/home/dan/repos/podman/bin/podman rm -f --ignore --cidfile=%t/%n.ctr-id
Type=notify
NotifyAccess=all

[Install]
WantedBy=default.target

Describe the results you expected:

$ bin/podman generate systemd --name --new --template quizzical_satoshi
# container-quizzical_satoshi@.service
# autogenerated by Podman 4.0.0-dev
# Tue Feb 15 10:04:40 AEDT 2022

[Unit]
Description=Podman container-quizzical_satoshi.service for %I
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n-%i
Restart=on-failure
TimeoutStopSec=70
ExecStart=/home/dan/repos/podman/bin/podman run --name=container-quizzical_satoshi-%i --cgroups=no-conmon --rm --sdnotify=conmon -d -ti fedora:35
Type=notify
NotifyAccess=all

[Install]
WantedBy=default.target

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

$ bin/podman version
Client:       Podman Engine
Version:      4.0.0-dev
API Version:  4.0.0-dev
Go Version:   go1.16.13
Git Commit:   5977fd509582d6dc8727ce8f8a78011888a1dc17-dirty
Built:        Tue Feb 15 10:03:25 2022
OS/Arch:      linux/amd64

Package info (e.g. output of rpm -q podman or apt list podman):

built from source (today)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Discovered investigating @eriksjolund's good use of systemd, podman and socket activation examples eriksjolund/mariadb-podman-socket-activation#1

@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 14, 2022
@grooverdan
Contributor Author

With ExecStop removed, TimeoutStopSec=70 is probably not needed either. The default KillMode=control-group should give the container and conmon a fair chance to cleanly terminate. A systemd-aware entry point can also send EXTEND_TIMEOUT_USEC if it so desires.

@vrothberg
Member

Thanks for opening the issue, @grooverdan! Your suggestion sounds good to me. One thing we'd need is the --replace flag in podman run to be sure that name-conflicting containers are removed.

@Luap99
Member

Luap99 commented Feb 15, 2022

Please be aware of the consequences. We had this before and it did not work correctly, #11315

There are a number of problems when we do not use podman stop:

  1. we have to set the correct stop signal (can be fixed in the unit)
  2. we need to use the correct stop timeout (can be fixed in the unit)
  3. This is the biggest problem. If conmon exits, systemd will think the unit is done and start killing everything, including the cleanup process, so we will leak mounts, network interfaces, etc...
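For points 1 and 2, the unit could in principle carry the container's stop signal and timeout directly (values here are illustrative; see systemd.kill(5) and systemd.service(5)):

```ini
[Service]
# mirror the image's STOPSIGNAL instead of relying on podman stop
KillSignal=SIGTERM
# mirror the container's configured stop timeout
TimeoutStopSec=10
```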

@vrothberg
Member

This is the biggest problem. If conmon exits, systemd will think the unit is done and start killing everything, including the cleanup process, so we will leak mounts, network interfaces, etc...

Very fair point, I hadn't considered that yet. The Podman clean-up process created by conmon should probably run outside the unit's cgroup.

@Luap99
Member

Luap99 commented Feb 15, 2022

Also see #11304 (comment)

@grooverdan
Contributor Author

Right so, https://github.com/containers/conmon/blob/e2215a1c4c01c25f2fc1206ad4df012d10374b99/src/ctr_exit.c#L222 as the SIGTERM handler should reap_children? I don't think the sdnotify setting should alter the behaviour here.

So addressing the list:

  1. This should be the handler for all signals that by default terminate(?)
  2. If we wait for the children, the timeout is handled too(?).
  3. SendSIGKILL=no to prevent the undue tear-down of conmon.

@vrothberg
Member

The Podman clean-up process created by conmon should probably run outside the unit's cgroup.

I think this should be enough. @giuseppe WDYT?

@Luap99
Member

Luap99 commented Feb 15, 2022

I don't think so. If you have a process that does not respond to SIGTERM, systemd will wait for the timeout and then send SIGKILL to the main PID, conmon. This will cause conmon to exit, but the container process will keep running AFAICT. The cleanup process will never be started.

We also can't use SendSIGKILL=no, because otherwise the process will never be terminated if SIGTERM is ignored.
Just test with this unit file:

[Unit]
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
TimeoutStopSec=10
ExecStart=podman run --name=testcon --cgroups=no-conmon --rm --sdnotify=conmon -d alpine sleep inf
Type=notify
NotifyAccess=all

[Install]
WantedBy=default.target

@vrothberg
Member

We can also not use SendSIGKILL=no because otherwise process will never be terminated if sigterm is ignored

I concur. We need systemd to be able to nuke when needed.

@giuseppe
Member

The Podman clean-up process created by conmon should probably run outside the unit's cgroup.

I think this should be enough. @giuseppe WDYT?

how can we do that? Create a new systemd scope?

@vrothberg
Member

how can we do that? Create a new systemd scope?

Do you think that could work?

@Luap99
Member

Luap99 commented Feb 15, 2022

But this would be too late, no? conmon has to spawn the cleanup process, but if conmon is killed with SIGKILL this is not possible.

@giuseppe
Member

could we move the cleanup to a ExecStopPost action?

@vrothberg
Member

But this would be too late, no? conmon has to spawn the cleanup process, but if conmon is killed with SIGKILL this is not possible.

If things go south, yes.

could we move the cleanup to a ExecStopPost action?

Then we need to communicate the ID of the container somewhere and would need the --cidfile again.

It seems that it's not as straight-forward as I'd wish it could be. In the end, it would "just" be a workaround for systemd. systemd would still reject the mainPID being sent by conmon.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Mar 19, 2022

@grooverdan @vrothberg What should we do with this issue?

@grooverdan
Contributor Author

But this would be too late, no? conmon has to spawn the cleanup process, but if conmon is killed with SIGKILL this is not possible.

If things go south, yes.

Is an early spawn of the cleanup process possible, one that activates when the parent process dies? I normally see cases of SIGCHLD but not the other way around.
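Linux does offer the reverse of SIGCHLD: a child can ask to be signalled when its parent dies, via prctl(PR_SET_PDEATHSIG). A hedged sketch of how an early-spawned cleanup process could arm this (shown via ctypes; whether conmon could actually use this here is an open question):

```python
import ctypes
import signal

PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>
libc = ctypes.CDLL(None, use_errno=True)

def die_with_parent(sig: int = signal.SIGTERM) -> None:
    """Ask the kernel to deliver `sig` to this process when its parent
    dies (Linux-only; the setting is cleared on fork and on exec of
    privileged binaries, so it must be armed in the child itself)."""
    if libc.prctl(PR_SET_PDEATHSIG, int(sig), 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")
```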

could we move the cleanup to a ExecStopPost action?

Then we need to communicate the ID of the container somewhere and would need the --cidfile again.

It seems that it's not as straight-forward as I'd wish it could be. In the end, it would "just" be a workaround for systemd. systemd would still reject the mainPID being sent by conmon.

Doesn't NotifyAccess=all resolve this? If so, the first conmon process does the cleanup and its child sends MAINPID to advertise itself as sacrificial? I don't know how that would work if the container process sends MAINPID too.

But if this is all too hard/messy for little gain, I guess we can just close this.

@vrothberg
Member

Doesn't NotifyAccess=all resolve this? If so, the first conmon process does the cleanup and its child sends MAINPID to advertise itself as sacrificial? I don't know how that would work if the container process sends MAINPID too.

I don't think we should change the mainPID for the clean-up process as it will very likely cause hiccups.

The main challenge is to find a way to prevent the clean-up process from being killed by systemd. @msekletar, do you have suggestions? The problem in a nutshell:

  • We communicate the container ID via a file
  • The file is then used in ExecStopPost to podman rm the container
  • We'd love to find an alternative that doesn't need a file
  • conmon will spawn a podman container cleanup process but we're afraid of the cleanup process being killed by systemd

Is there a way to wait for a child process of conmon until systemd would nuke everything?

@giuseppe
Member

But if this is all to hard/messy for little gain I guess we can just close this.

would the gain just be to not use an external file?

If so, I agree we should probably close this issue since it seems there are no valid alternatives

@eriksjolund
Contributor

Another mechanism that might be useful is to store information with memfd_create() and sd_pid_notify_with_fds(). I don't know if that mechanism could be helpful for this issue, but I think it's worth mentioning.

I tested it with a minimal C program and saw that file descriptors that were stored from ExecStart, ExecStop and ExecStopPost are all available to the next instance of ExecStart in case the service was explicitly restarted (systemctl --user restart my.service) or if a restart was triggered by Restart=on-failure. See Table 2 in man systemd.service. A failing Watchdog would also trigger a restart. I noticed that sd_listen_fds_with_names() unfortunately returns zero when run from a program that is executed in ExecStop or ExecStopPost (otherwise it could have been a way to pass the container ID to them).

A sketchy idea: In case conmon would like to perform a cleanup, conmon could store its intention (and the container ID) in a memfd_create file and have it stored by systemd. If the service is restarted before the container cleanup has completed, the new ExecStart podman instance could retrieve the stored information (via sd_listen_fds_with_names()) and try to complete the cleanup.

@grooverdan
Contributor Author

would the gain just be to not use an external file?

The gain of moving to Type=notify, as @eriksjolund's socket activation example shows, is that services that are aware of systemd can use its notify/API controls and run transparently both in a container and as a systemd service.

@msekletar
Contributor

I think what @grooverdan suggests in the issue description is doable, but it requires changes in conmon. conmon can't exit immediately after starting the cleanup process in the container cgroup. Instead it needs to wait for the cleanup process to finish, and only exit after that, to signal systemd that the unit is no longer active. To avoid sending the kill signal to other processes running in the cgroup, podman should generate unit files with KillMode=mixed.

For bonus points, conmon could initiate the cleanup process, wait for it, and even extend the originally configured stop timeout using EXTEND_TIMEOUT_USEC= notifications via the sd_notify API.
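That bonus could look roughly like this (a hedged Python sketch of the idea, not conmon code; the notify helper implements the sd_notify(3) datagram protocol and is a no-op outside a notify service):

```python
import os
import socket
import time

def _notify(msg: str) -> None:
    # minimal sd_notify(3) datagram; silently a no-op if not under systemd
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
            s.connect(addr)
            s.send(msg.encode())
    except OSError:
        pass

def wait_for_cleanup(pid: int, poll: float = 0.5) -> int:
    """Wait for the cleanup child to finish, periodically asking systemd
    for more time so it does not kill the cgroup while cleanup runs."""
    while True:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return status
        # request (poll * 4) more seconds, expressed in microseconds
        _notify(f"EXTEND_TIMEOUT_USEC={int(poll * 4 * 1_000_000)}")
        time.sleep(poll)
```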

@vrothberg
Member

Thanks for taking a look, @msekletar, and for the chat off GitHub.

I agree that this is the only way. For this to work, we'd have to update both

  • conmon to wait for the cleanup process to finish
  • podman to not nuke conmon during cleanup which probably requires an additional flag to make the behavior conditional

@mheon, what are your thoughts?

@mheon
Member

mheon commented Apr 4, 2022

Podman killing Conmon is a safety measure to ensure that we have a clean slate for restarting the container - Conmon holds the container's ports open, and if it's not gone attempting a container restart via the cleanup process (as happens with containers with restart policy) is not possible. Of course, Conmon could potentially be written to clean up all open FDs instead to clear that conflict, but I somewhat suspect it's not the only one.

Might be easier to add this to conmon-rs, which was always intended to be a longer-living service.

@github-actions

github-actions bot commented May 6, 2022

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented May 6, 2022

@haircommander @saschagrunert WDYT?

@github-actions

github-actions bot commented Jun 6, 2022

A friendly reminder that this issue had no activity for 30 days.

@mbach04

mbach04 commented Dec 19, 2022

Bump on this one. The current state of podman-generated systemd unit files is problematic in Fedora 37, podman 4.3.1.

@vrothberg
Member

@mbach04, can you elaborate on what's problematic?

@mbach04

mbach04 commented Dec 20, 2022

Fedora 37, podman 4.3.1, clean install
Start a simple pod with a yaml file, podman play kube myfile.yaml
Generate the systemd unit file, move it to the appropriate location, and the unit file is flagged as bad.

@vrothberg
Member

@mbach04, generate systemd on containers created with kube play is not supported (see the docs). Instead, I recommend using the systemd template for running K8s YAML in systemd.

@mbach04

mbach04 commented Dec 21, 2022

If the result of running podman play kube something.yaml, where the file defines a pod, is a pod running in Podman, where is the limitation that prevents podman from generating a valid systemd unit file for that running pod?

I'm seeking to understand this as it appears like a very sensible thing to me. What's the delta between what gets created with just podman pod create and podman play kube... ?

To expand on that, I believe podman generate kube is a thing...which would sort of close the loop in the development cycle here when bouncing between Podman and proper Kube.

@rhatdan
Member

rhatdan commented Jul 30, 2023

We now recommend that users use quadlet for running pods under systemd using Kubernetes YAML. We do not support them directly.
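For the record, a minimal quadlet unit for the Kubernetes YAML case looks roughly like this (file name and YAML path are illustrative; see podman-systemd.unit(5) for the full syntax):

```ini
# ~/.config/containers/systemd/myapp.kube
[Kube]
Yaml=%h/myapp/myfile.yaml

[Install]
WantedBy=default.target
```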

@rhatdan rhatdan closed this as completed Jul 30, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 29, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 29, 2023