Containers remain running after exiting #114
In particular, this means it's impossible to remove a toolbox-created container without first stopping/killing it with podman.
But yes, I'd like to make this properly reference counted; sadly, I don't know of a way to implement that using the existing Podman command-line interface.
Perhaps what could be done is to detect whether the container is running when deleting it, and then give the user a prompt.
I am unable to delete a container created by toolbox even after using …
It will fail if you have currently active sessions. Otherwise, if that's not the case and you can reproduce it at will, then I suggest trying … Thanks for stopping by!
Considering it was reported almost 1.5 years ago, I'm wondering if there has been any progress since then.
Seems I can't stop the containers left behind by Toolbox: …
The reason for the error, it seems, is … What is it? Is it an error, or is it by design?
So far, I can think of two different ways to make Toolbox containers reference-counted so that they automatically stop once the last active session exits. (Note that stopping the container is the same thing as terminating the container's entry point process.)
The first can be implemented with Go channels, … The downside of this is that it's not resilient against crashes in the session processes.
The second is based on advisory file locks. The nice thing about this is that locks are automatically released by the kernel when a process terminates. So, even if a session crashes, its lock gets released.
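To make the file-lock idea concrete, here is a minimal shell sketch using flock(1); the lock file path is a made-up placeholder, and the real implementation would live in the toolbox Go code, so treat this purely as an illustration of the mechanism:

# Hypothetical lock file shared by the sessions and the entry point.
LOCKFILE=/run/toolboxes/my-toolbox.lock

# Each 'toolbox enter' / 'toolbox run' session would hold a shared lock
# for as long as it is alive:
#     flock --shared "$LOCKFILE" <interactive shell or command>

# The container's entry point opens the same file and blocks on an
# exclusive lock. It only gets the lock once no shared locks remain,
# i.e. once the last session has exited or crashed. (In practice it
# would also wait a little before doing this, so the first session has
# time to take its shared lock.)
exec 3>"$LOCKFILE"
flock --exclusive 3

# Reaching this point means no sessions are left; exiting terminates
# the entry point and therefore stops the container.
exit 0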
This sounds like a Podman bug. If you can repeatedly reproduce this, then I'd suggest filing a Podman bug. It would be even better if you can reproduce this with just Podman commands.
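Purely as an illustration, and not the reporter's actual commands (the image and container name below are placeholders), such a plain-Podman reproduction might look like:

# Run a long-lived container directly with Podman, bypassing toolbox.
podman run --detach --name refcount-test registry.fedoraproject.org/fedora:latest sleep infinity

# Start an interactive session in it, exit the shell, and then try to
# stop and remove the container to see whether the same error appears.
podman exec --interactive --tty refcount-test /bin/sh
podman stop refcount-test
podman rm refcount-test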
I think those are UIDs, not PIDs. Those UIDs look big because they are inside a user namespace.
Currently, once a toolbox container gets started with 'podman start', as part of the 'toolbox enter' command, it doesn't stop unless the host is shut down or someone explicitly calls 'podman stop'. This becomes annoying if someone tries to remove the container because commands like 'podman rm' and such don't work without the '--force' flag, even if all active 'toolbox enter' and 'toolbox run' sessions have terminated.

A system of reference counting based on advisory file locks has been used to automatically terminate the container's entry point once all the active sessions have died. The 'toolbox enter' and 'toolbox run' sessions acquire shared file locks, and the container's entry point blocks trying to acquire an exclusive lock on a common file. The entry point will be unblocked once all shared locks have been released by the active sessions, and then it terminates.

Once the container has been started, and the entry point has finished setting it up, the entry point waits for a while before trying to acquire its exclusive lock. This is meant to give some time to the first session to go ahead and acquire its shared lock. A duration of 25 seconds, the same interval as the default for D-Bus method calls, was chosen for this.

containers#114
Seems I'm not. Not sure if it's good or bad :)
Of course they are UIDs, sorry. Thanks!
I believe …
Another option, used by coreos/toolbox, is to call podman stop after every podman exec. It's less sophisticated than the other alternatives, but simpler to implement.
@debarshiray I would vote for the simple approach (just always stop after exec) as long as you suppress the spurious "Error: container ... has active exec sessions, refusing to clean up: container state improper" output when the container cannot be stopped.
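As a rough sketch of that approach (the container name is a placeholder; the real change would sit around the 'podman exec' invocation inside toolbox itself), each session would simply be followed by a silenced stop attempt:

# The interactive session itself.
podman exec --interactive --tty my-toolbox /bin/bash

# Always try to stop the container afterwards. As described above, the
# stop is refused while other exec sessions are still active, so the
# error is silenced; once the last session exits, the stop goes through.
podman stop my-toolbox >/dev/null 2>&1 || true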
Here's a simple fix that I'm trying out in https://github.com/castedo/cnest. So far it seems to be working well.
This is working; it finally kills all toolboxes running in the background.
Currently, once a toolbox container gets started with 'podman start', as part of the 'toolbox enter' command, it doesn't stop unless the host is shut down or someone explicitly calls 'podman stop'. This becomes annoying if someone tries to remove the container because commands like 'podman rm' and such don't work without the '--force' flag, even if all active 'toolbox enter' and 'toolbox run' sessions have terminated.

A crude form of reference counting has been set up that depends on 'podman stop' failing as long there's any active 'podman exec' session left. Every invocation of 'podman exec' in 'enter' and 'run' is followed by 'podman stop', so that the container gets stopped once the last session finishes.

While this approach looks very crude at first glance, it does have the advantage of being ridiculously simple to implement. Thus, it's a lot more robust and easier to verify than setting up some custom reference counting or synchronization using other means like POSIX signals or file locks.

Based on the implementation in github.com/coreos/toolbox.

containers#114
That's about cleaning up any active exec sessions? If so, then that's different from this issue. This issue is about stopping the container (i.e., killing the entry point) when the last session exits. I think the problem you were trying to address might have been fixed in Podman through containers/podman#17025.
It turns out that current implementations of …
Interesting. So you are doing: …

I am worried that there's a race. A new session could be started just as the previous one exits and the container is being taken down.
Good eye! You are correct, there is that possibility. It's fair to say that what I coded in cnest for this is a hack. I've been using it for more than a year now, and it's still working well. But I'm only using it for "nest" containers, which I enter and exit manually at the command line; I'm not fast enough to create the race condition. My hack might not be OK for more general uses of a container. Maybe there are single-user cases where someone has programs entering/starting the container in the background, and not only entering manually from the command line.
I've made a wrapper script to fix this issue (89luca89/distrobox#786) when using distrobox, and adapted it for toolbox. It's pretty simple: it gets the conmon PID when the container starts, then kills it afterwards using a background script. It could easily be adapted to also kill the container when there are no more open shells.

#!/usr/bin/env bash
#
# toolbox-enter-wrapper - wrapper to call the shell properly in toolbox
if [ -z "$1" ]; then
    echo "Please provide container name"
    exit 1
fi

PIDFILE_DIR="$HOME/.local/state/toolbox"
PIDFILE="$PIDFILE_DIR/$$"

mkdir -p "$PIDFILE_DIR"
touch "$PIDFILE"

nohup sh <<EOF >/dev/null 2>&1 &
# wait for the main script to end
while ps -p $$ >/dev/null; do
    sleep 1s
done

# get pid from the file
PID="\$(cat "$PIDFILE")"
rm -f "$PIDFILE"

# conmon already dead, quit
if ! ps -p "\$PID" >/dev/null; then
    exit 0
fi

# kill conmon
kill -1 "\$PID"

# quit
exit 0
EOF

toolbox run -c "$1" sh -c "echo \$PPID > $PIDFILE; exec ${2:-$SHELL}"
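For what it's worth, assuming the script above is saved as toolbox-enter-wrapper somewhere in PATH and made executable, it would be used along these lines (the container name here is just an example):

# Open the default shell in the named toolbox; the background job kills
# conmon once this invocation ends.
toolbox-enter-wrapper fedora-toolbox

# Optionally run a specific command instead of $SHELL.
toolbox-enter-wrapper fedora-toolbox htop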