
cmd/run: Ensure underlying container is stopped when toolbox is killed #1207

Open · wants to merge 1 commit into main from just-die-after-i-hang-up

Conversation

halfline

Right now "toolbox enter" creates a container on the fly, but then lets it linger after the foreground toolbox process is killed (for instance, from a terminal hangup).

Not killing the underlying container has the negative side effect of stalling shutdown if a toolbox shell is running.

This commit addresses that problem by detecting when the toolbox process is signaled, and then in response, kills off the entire cgroup associated with the underlying container.

Closes #1157
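
For illustration only, here is a minimal Go sketch of the general idea, not the actual patch: install a signal handler in the toolbox process and kill everything in the container's cgroup. The cgroup path is a hypothetical input here, and cgroup v2 with the cgroup.kill file (Linux >= 5.14) is assumed.

package main

import (
	"os"
	"os/signal"
	"path/filepath"
	"syscall"
)

// killCgroupOnSignal waits for a termination signal and then kills every
// process in the given cgroup.
func killCgroupOnSignal(cgroupPath string) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM)
	<-sigs

	// Writing "1" to cgroup.kill (cgroup v2, Linux >= 5.14) sends SIGKILL to
	// every member of the cgroup and all of its descendants.
	_ = os.WriteFile(filepath.Join(cgroupPath, "cgroup.kill"), []byte("1"), 0o644)
}

func main() {
	// Hypothetical cgroup path; real code would have to resolve it from a
	// PID that actually belongs to the container.
	killCgroupOnSignal("/sys/fs/cgroup/user.slice/libpod-example.scope")
}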

@halfline halfline force-pushed the just-die-after-i-hang-up branch from f28ca61 to af0fc8b Compare December 22, 2022 01:54
@halfline
Author

@matthiasclasen I think this is probably a fix for the shutdown issue you told me about yesterday

@softwarefactory-project-zuul

Build failed.

❌ unit-test FAILURE in 24m 23s
❌ unit-test-migration-path-for-coreos-toolbox FAILURE in 3m 12s
✔️ system-test-fedora-rawhide SUCCESS in 34m 09s
✔️ system-test-fedora-36 SUCCESS in 12m 45s
✔️ system-test-fedora-35 SUCCESS in 14m 19s

Right now "toolbox enter" creates a container on the fly, but
then lets it linger after the foreground toolbox process is
killed (for instance, from a terminal hangup).

Not killing the underlying container has the negative side effect of
stalling shutdown if a toolbox shell is running.

This commit addresses that problem by detecting when the toolbox process
is signaled, and then in response, kills off the entire cgroup associated
with the underlying container.

Closes containers#1157

Signed-off-by: Ray Strode <[email protected]>
@halfline halfline force-pushed the just-die-after-i-hang-up branch from db770e5 to 2c0ec31 Compare December 22, 2022 03:14
@softwarefactory-project-zuul

Build succeeded.

✔️ unit-test SUCCESS in 27m 38s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 09s
✔️ system-test-fedora-rawhide SUCCESS in 38m 38s
✔️ system-test-fedora-36 SUCCESS in 12m 23s
✔️ system-test-fedora-35 SUCCESS in 13m 53s

@debarshiray
Member

debarshiray commented Sep 30, 2023

Just for the sake of posterity ...

@matthiasclasen I think this is probably a fix for the shutdown issue you told me about yesterday

I think the blocked shutdown problem was reported as:

... and might have been fixed by:
containers/podman#17025

Note that while containers/podman#14531 points at containers/podman#16785 , I don't think it had anything to do with fixing blocked shutdowns caused by lingering podman exec sessions.

@debarshiray
Member

debarshiray left a comment

Thanks for working on this @halfline - my apologies for not looking at it sooner.

I don't think the changes in this pull request are working as expected, and the commit message suggests that there might be some misunderstanding.

Longer explanation follows ...

First, toolbox enter doesn't quite create a container on the fly. It is confusing because there's some syntactic sugar where toolbox enter will offer to create a container if you have none. You could think of it as an embedded toolbox create.

The canonical flow is that a container is created with toolbox create, which wraps podman create. Then that container is entered with toolbox enter, which wraps podman start and podman exec. The podman start launches the container's entry point process, which would have been PID 1 inside the container had we been using a PID namespace, but we aren't. For Toolbx containers, this entry point is toolbox init-container. The podman exec launches the process that the user actually interacts with, e.g., a CLI shell, Emacs, Vim, etc.

The same container can be entered multiple times from multiple terminals at the same time. When that happens the podman start is basically a NOP because a container can have only one entry point and it's already running.
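
For readers following along, this is a minimal Go sketch of that flow, not Toolbx's actual code; the container name and command in main are hypothetical.

package main

import (
	"os"
	"os/exec"
)

// enterContainer starts the container's entry point (a NOP if it is already
// running), then runs the interactive foreground process with podman exec.
func enterContainer(container string, command ...string) error {
	if err := exec.Command("podman", "start", container).Run(); err != nil {
		return err
	}

	args := append([]string{"exec", "--interactive", "--tty", container}, command...)
	cmd := exec.Command("podman", args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// Hypothetical container name and command.
	_ = enterContainer("fedora-toolbox-39", "bash", "-l")
}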

Neither the entry point nor the foreground processes are parented by podman start or podman exec. Each of them is parented by a separate conmon(8) process, which is double-forked from podman(1) and reparented to the systemd user instance.

The blocked shutdown problem was that when SIGTERM was sent to the container's entry point, either through podman stop or the shutdown sequence, the processes launched by podman exec wouldn't be terminated. This would end up blocking shutdown. My understanding is that this was caused by Toolbx containers not having a separate PID namespace and was fixed by containers/podman#17025

This uncovers another problem. How did those processes launched by podman exec manage to outlive their respective terminals? This is #1204. I suspect this happens because the signal sent by the terminal only reaches up to the wrapper podman exec and doesn't get to the corresponding conmon(8) process.

With the changes in this branch, if I have multiple graphical terminals open with toolbox enter against the same container, then closing any one of those terminals with the cross button terminates the container's entry point and the foreground container process in all the other terminals. This shouldn't happen. Closing a terminal should only terminate its own foreground container process.

I don't understand cgroups very well, so I don't know why this is happening.

Toolbx containers also use the host's cgroup namespace, and, as far as I know, each terminal created by GNOME Terminal has its own cgroup. For each of the foreground container processes, I get:

$ cat /proc/56151/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-2b4111a9-f4a9-4f2b-8acd-4adc9e673bac.scope
$
$ cat /proc/55919/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-9bf0516d-bd6f-4094-9504-0be37b1ce6bf.scope
$
$ cat /proc/56363/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-6ecb9e7e-aa06-42d6-b9b6-bb9d6cdb44f4.scope

So, I have no idea why terminating the processes in one cgroup is affecting the others.

@halfline
Author

halfline commented Oct 1, 2023

Sorry, it's been a while since I looked at this, and even when I was working on it, I only put it together in a few hours after Matthias visited and mentioned the problem offhand, so I may be off base here... I haven't done much deep diving and I may have made wrong assumptions.

Having said that, my impression from your message is you believe I'm sending the hangup signal to the terminal's cgroup, but that's not the intent. The idea was to send the signal to the cgroup of the bash running in the container.

I got (at least) two things wrong I think:

  • I had assumed each conmon would create a new cgroup for each podman-exec invocation. It seems that assumption was perhaps wrong. Indeed, every shell seems to be using a shared cgroup. I get:

╎❯ cat /proc/self/cgroup
0::/user.slice/user-4153.slice/user@4153.service/user.slice/libpod-4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528ab.scope/container

in all my terminals. I find this really surprising. I do see crun has a --cgroup option, but indeed looking in ps output, it's not getting used.

  • I was looking at the wrong pid to find the cgroup. I'm erroneously using the pid of the toolbox init-container process to find the cgroup.

This is #1204. I suspect this happens because the signal sent by the terminal only reaches up to the wrapper podman exec and doesn't get to the corresponding conmon(8) process.

Right, when a terminal is closed, a SIGHUP is sent to all processes in the terminal "session" (in the setsid() sense of the word). These sessions are inherited from their parents, unless a child creates a fresh session. There's no way in unix afaik to move a child into a pre-existing session; it either uses the one it started with, or it starts its own. conmon isn't part of the child hierarchy of podman-exec, so it won't inherit the terminal session. This means the hangup signal needs to be ferried either from podman-exec or toolbox to the process started by conmon (or if children had their own cgroup like I thought, we could just send the signal to the whole cgroup)
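
To illustrate the ferrying idea, a minimal Go sketch, assuming the PID of the conmon-started process were already known (obtaining it is exactly the open question here):

package main

import (
	"os"
	"os/signal"
	"syscall"
)

// relayHangup forwards SIGHUP received by this process to the process inside
// the container.
func relayHangup(containerPid int) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGHUP)

	go func() {
		for range sigs {
			_ = syscall.Kill(containerPid, syscall.SIGHUP)
		}
	}()
}

func main() {
	relayHangup(12345) // hypothetical PID of the conmon-started process
	select {}          // keep running so the relay can fire
}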

@debarshiray
Member

debarshiray commented Oct 3, 2023

Having said that, my impression from your message is you believe I'm sending the hangup signal to the terminal's cgroup, but that's not the intent. The idea was to send the signal to the cgroup of the bash running in the container.

I managed to confuse myself as well.

GNOME Terminal puts every VteTerminal into a separate systemd scope or cgroup. That's what I was showing here, because these process IDs are of the toolbox enter processes, and not of conmon(8) or the processes inside the container:

$ cat /proc/56151/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-2b4111a9-f4a9-4f2b-8acd-4adc9e673bac.scope
$
$ cat /proc/55919/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-9bf0516d-bd6f-4094-9504-0be37b1ce6bf.scope
$
$ cat /proc/56363/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-6ecb9e7e-aa06-42d6-b9b6-bb9d6cdb44f4.scope

That's why I didn't realize that all the conmon(8) processes for the same container share the same cgroup. However, they don't share it across different containers.

I had assumed each conmon would create a new cgroup for each podman-exec invocation. It seems that assumption was perhaps wrong. Indeed, every shell seems to be using a shared cgroup. I get:

╎❯ cat /proc/self/cgroup 0::/user.slice/user-4153.slice/user@4153.service/user.slice/libpod-4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528ab.scope/container

in all my terminals. I find this really surprising. I do see crun has a --cgroup option, but indeed looking in ps output, it's not getting used.

Right. Thanks for the hint, because I had missed it before.

I was looking at the wrong pid to find the cgroup. I'm erroneously using the pid of the toolbox init-container process to find the cgroup.

nod

This is #1204. I suspect this happens because the signal sent by the terminal only reaches up to the wrapper podman exec and doesn't get to the corresponding conmon(8) process.

Right, when a terminal is closed, a SIGHUP is sent to all processes in the terminal "session" (in the setsid() sense of the word). These sessions are inherited from their parents, unless a child creates a fresh session. There's no way in unix afaik to move a child into a pre-existing session; it either uses the one it started with, or it starts its own. conmon isn't part of the child hierarchy of podman-exec, so it won't inherit the terminal session. This means the hangup signal needs to be ferried either from podman-exec or toolbox to the process started by conmon (or if children had their own cgroup like I thought, we could just send the signal to the whole cgroup)

nod

I don't see an obvious way to get the PID of the conmon(8) process or the process inside the container started by it. That was my question in #1204

Otherwise, we can insert a process that we control between conmon(8) and the user-requested process inside the container that will relay the PID inside the container back to the toolbox enter process. Then we can ferry the SIGHUP from the terminal to the container process. I don't know what would be the best way to do it. Maybe we can send a file descriptor?

This intermediate process under our control sounds a lot like the toolbox shell idea that I had before. We don't have to call it toolbox shell, though. It's just that I am really bad at naming. So, suggestions welcome.
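
A minimal Go sketch of such a shim, with a hypothetical fd-passing convention; none of this is an agreed design:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// A hypothetical in-container shim: report our PID to the host-side
// toolbox enter process over an inherited file descriptor, then exec the
// user-requested command.
func main() {
	// Assume fd 3 is the write end of a pipe set up by the caller
	// (e.g. via podman exec --preserve-fds=1).
	if pidPipe := os.NewFile(3, "pid-pipe"); pidPipe != nil {
		fmt.Fprintf(pidPipe, "%d\n", os.Getpid())
		pidPipe.Close()
	}

	argv := os.Args[1:]
	if len(argv) == 0 {
		fmt.Fprintln(os.Stderr, "usage: shim COMMAND [ARGS...]")
		os.Exit(125)
	}

	path, err := exec.LookPath(argv[0])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(127)
	}

	// Replace the shim with the real command; signals sent to this PID now
	// reach the user's foreground process directly.
	if err := syscall.Exec(path, argv, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(126)
	}
}

Because the shim execs the user-requested command rather than forking it, the PID it reported stays valid for the lifetime of the foreground process.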

@halfline
Author

halfline commented Oct 3, 2023

I think it probably is possible to get the right pid. Just as an experiment:

╎❯ bundle_path=$(podman inspect 4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528a | jq -r '.[0].StaticDir')
╎❯ exec_id=$(podman exec -d 4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528a sleep 300)
╎❯ pid=$(podman exec 4e141e468f1a756b43407f64f0273b5c18fcf5b784ea2a13050d0a9f4e1528a cat $bundle_path/$exec_id/exec_pid)
╎❯ echo $pid
193097

There may be a more direct command for it, not sure. libpod has an API for it:

// Get PID file path for a container's exec session
func (c *Container) execPidPath(sessionID string) string {
        return filepath.Join(c.execBundlePath(sessionID), "exec_pid")
}

though it looks like Toolbx doesn't use libpod?
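
A rough Go equivalent of the experiment above, shelling out to podman; the exec_pid file layout is Podman's internal state rather than a stable interface, and the container and exec session IDs in main are hypothetical:

package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// execSessionPid looks up the container's StaticDir from podman inspect and
// then reads <StaticDir>/<execID>/exec_pid, mirroring the shell experiment.
func execSessionPid(container, execID string) (string, error) {
	out, err := exec.Command("podman", "inspect", container).Output()
	if err != nil {
		return "", err
	}

	var data []struct {
		StaticDir string
	}
	if err := json.Unmarshal(out, &data); err != nil {
		return "", err
	}
	if len(data) == 0 {
		return "", fmt.Errorf("container %s not found", container)
	}

	pid, err := exec.Command("podman", "exec", container, "cat",
		filepath.Join(data[0].StaticDir, execID, "exec_pid")).Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(pid)), nil
}

func main() {
	// Hypothetical container name and exec session ID.
	fmt.Println(execSessionPid("fedora-toolbox-39", "abc123"))
}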

@halfline
Author

halfline commented Oct 3, 2023

I guess the complication is that the exec_id is only printed for background exec invocations. You can fish it out of podman inspect output, but you would have to diff the list of exec IDs before and after to find the right one, which is a little racy.
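
A minimal Go sketch of that diffing approach, assuming the ExecIDs field of podman inspect output; as noted, it is racy if several exec sessions start concurrently, and the container name in main is hypothetical:

package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// execIDs returns the container's current exec session IDs as reported by
// podman inspect.
func execIDs(container string) (map[string]bool, error) {
	out, err := exec.Command("podman", "inspect", container).Output()
	if err != nil {
		return nil, err
	}

	var data []struct {
		ExecIDs []string
	}
	if err := json.Unmarshal(out, &data); err != nil {
		return nil, err
	}

	ids := make(map[string]bool)
	if len(data) > 0 {
		for _, id := range data[0].ExecIDs {
			ids[id] = true
		}
	}
	return ids, nil
}

// newExecID returns an ID present in after but not in before.
func newExecID(before, after map[string]bool) string {
	for id := range after {
		if !before[id] {
			return id
		}
	}
	return ""
}

func main() {
	before, _ := execIDs("fedora-toolbox-39")
	// ... start the interactive podman exec here ...
	after, _ := execIDs("fedora-toolbox-39")
	fmt.Println(newExecID(before, after))
}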

@debarshiray
Member

I think it probably is possible to get the right pid. Just as an experiment:

Cool. I didn't know that exec IDs are a thing. I think I never tried podman exec --detach until now.

though it looks like Toolbx doesn't use libpod?

When we rewrote Toolbx in Go, we wanted to use libpod and the other backend Go packages directly instead of going through podman(1). However, the Podman folks discouraged us from doing that, so we didn't.

These days, for other reasons, I am considering implementing some small things directly inside toolbox(1) to avoid going through podman(1), where we can rely on a stable interface. I have no idea if this is such a case.

If we can't get the PID inside the container through a Podman interface, do you think the alternative of inserting a shim is sane?

@debarshiray
Member

podman exec --interactive --tty, which is what toolbox enter does, creates a nested pseudo-terminal device and wires it up to the terminal device that the user is interacting with. If the outer terminal goes away, then shouldn't it break the connection and somehow trigger the inner terminal?

@halfline
Author

halfline commented Oct 3, 2023

podman exec --interactive --tty, which is what toolbox enter does, creates a nested pseudo-terminal device and wires it up to the terminal device that the user is interacting with. If the outer terminal goes away, then shouldn't it break the connection and somehow trigger the inner terminal?

Interesting. Right, when the terminal is closed, podman exec would be receiving a hangup signal (SIGHUP), except:

╎❯ echo $$
203855

╎❯ ps -eocomm,pid,ppid,tpgid,sess,pgrp,tty |grep -E 'podman|bash|toolbox|conmon|COMM'
COMMAND             PID    PPID   TPGID    SESS    PGRP TT
bash             203855    3069  207753  203855  203855 pts/18


╎❯ toolbox enter f39
╎❯ ps -eocomm,pid,ppid,tpgid,sess,pgrp,tty |grep -E 'podman|bash|toolbox|conmon|COMMAND|pts.18'
COMMAND             PID    PPID   TPGID    SESS    PGRP TT
bash             203855    3069  207768  203855  203855 ?
toolbox          207768  203855  207768  203855  207768 ?
podman           207888  207768  207768  203855  207768 ?
conmon           207903    1899      -1  207903  207903 ?
bash             207909  207903  207947  207909  207909 pts/3

You can see podman exec is in the toolbox process group and toolbox doesn't have a controlling tty. Neither does conmon. In fact, nothing has /dev/pts/18 as the controlling tty anymore. This means when the terminal is closed, the hangup signal doesn't get sent. I guess maybe something became the foreground process group and then exited? Or maybe there was a TIOCNOTTY ioctl?

Regardless, whatever is splicing input between the two ttys (podman? conmon?) should be getting a hangup in poll, so it should know to tear things down, I guess.
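
For illustration, a minimal sketch (using golang.org/x/sys/unix, not anything from Podman) of what getting a hangup in poll looks like on the outer terminal's file descriptor:

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// waitForHangup blocks until fd reports POLLHUP or POLLERR. poll(2) always
// reports POLLHUP and POLLERR, even when no events are requested.
func waitForHangup(fd int) error {
	for {
		fds := []unix.PollFd{{Fd: int32(fd), Events: 0}}
		if _, err := unix.Poll(fds, -1); err != nil {
			if err == unix.EINTR {
				continue
			}
			return err
		}
		if fds[0].Revents&(unix.POLLHUP|unix.POLLERR) != 0 {
			return nil
		}
	}
}

func main() {
	// e.g. wait for the terminal on stdin to go away.
	if err := waitForHangup(int(os.Stdin.Fd())); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Fprintln(os.Stderr, "terminal hung up; tear down the exec session here")
}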

@halfline
Author

halfline commented Oct 5, 2023

Just browsing around a little, I think the splicing happens here:

func setupStdioChannels(streams *define.AttachStreams, conn *net.UnixConn, detachKeys []byte) (chan error, chan error) {
        receiveStdoutError := make(chan error)
        go func() {
                receiveStdoutError <- redirectResponseToOutputStreams(streams.OutputStream, streams.ErrorStream, streams.AttachOutput, streams.AttachError, conn)
        }()

        stdinDone := make(chan error)
        go func() {
                var err error
                if streams.AttachInput {
                        _, err = util.CopyDetachable(conn, streams.InputStream, detachKeys)
                }
                stdinDone <- err
        }()

        return receiveStdoutError, stdinDone
}

So maybe something like this would fix it
(untested, I don't know Go, etc. etc.)
(EDIT: changed the idea a bit to not slurp up the stdinDone before it can be processed by readStdio)
(EDIT2: changed readStdio to be a method so it has access to the container object, still completely untested)

diff --git a/libpod/oci_conmon_attach_common.go b/libpod/oci_conmon_attach_common.go
index 8d5a7f8..fa1dbad 100644
--- a/libpod/oci_conmon_attach_common.go
+++ b/libpod/oci_conmon_attach_common.go
@@ -101,5 +101,5 @@ func (r *ConmonOCIRuntime) Attach(c *Container, params *AttachOptions) error {
                params.AttachReady <- true
        }
-       return readStdio(conn, params.Streams, receiveStdoutError, stdinDone)
+       return c.readStdio(conn, params.Streams, receiveStdoutError, stdinDone)
 }
 
@@ -193,5 +193,5 @@ func (c *Container) attachToExec(streams *define.AttachStreams, keys *string, se
        }
 
-       return readStdio(conn, streams, receiveStdoutError, stdinDone)
+       return c.readStdio(conn, streams, receiveStdoutError, stdinDone)
 }
 
@@ -288,5 +288,5 @@ func redirectResponseToOutputStreams(outputStream, errorStream io.Writer, writeO
 }
 
-func readStdio(conn *net.UnixConn, streams *define.AttachStreams, receiveStdoutError, stdinDone chan error) error {
+func (c *Container) readStdio(conn *net.UnixConn, streams *define.AttachStreams, receiveStdoutError, stdinDone chan error) error {
        var err error
        select {
@@ -309,4 +309,7 @@ func readStdio(conn *net.UnixConn, streams *define.AttachStreams, receiveStdoutE
                        }
                }
+               if streams.AttachInput {
+                       c.Kill(uint(syscall.SIGHUP))
+               }
                if streams.AttachOutput || streams.AttachError {
                        return <-receiveStdoutError

Successfully merging this pull request may close these issues:

toolbox run command isn't terminated properly when terminal emulator is exited