Reduce toolbox enter start time #1070

Closed
CMGeldenhuys opened this issue Jun 30, 2022 · 6 comments
Labels
1. Feature request A request for a new feature

Comments

@CMGeldenhuys

Is your feature request related to a problem? Please describe.
I'm new to using toolbox, but toolbox enter seems a bit sluggish to launch (~1s). While that is still fast (I might just be nit-picking), my workflow is predominantly on the command line, so I'm constantly opening new terminal sessions, where the 1s is a significant pause before the prompt is visible.

$ time toolbox run true

real	0m1.009s
user	0m0.465s
sys	0m0.174s

$ time podman exec fedora-toolbox-36 true

real	0m0.257s
user	0m0.103s
sys	0m0.039s

Comparing the run times between toolbox run ... and podman exec ... makes me think the problem is similar to that of #654. Might be related to toolbox evaluating expensive paths to ensure safe fallback (speculation).

Describe the solution you'd like
Either a way to bypass the fallback sanity/safety checks with a flag, or perhaps narrow the gap between podman exec and toolbox enter run times.

Describe alternatives you've considered
Directly running podman exec ... instead of toolbox enter, on the launch of new terminal sessions.

Some reduction can be achieved by specifying the container:

$ time toolbox run --container fedora-toolbox-36 true

real	0m0.907s
user	0m0.469s
sys	0m0.183s

$ time toolbox run true

real	0m1.011s
user	0m0.519s
sys	0m0.173s

Additional context
I've tried to mark (>>) where the potential slow down might occur

$ toolbox run --verbose true |& ts '[%Y-%m-%d %H:%M:%.S]'
[2022-06-30 08:59:46.242384] level=debug msg="Running as real user ID 1000"
[2022-06-30 08:59:46.242466] level=debug msg="Resolved absolute path to the executable as /usr/bin/toolbox"
[2022-06-30 08:59:46.242519] level=debug msg="Running on a cgroups v2 host"
[2022-06-30 08:59:46.242543] level=debug msg="Checking if /etc/subgid and /etc/subuid have entries for user chrisg"
[2022-06-30 08:59:46.242564] level=debug msg="Validating sub-ID file /etc/subuid"
[2022-06-30 08:59:46.242582] level=debug msg="Validating sub-ID file /etc/subgid"
[2022-06-30 08:59:46.242599] level=debug msg="TOOLBOX_PATH is /usr/bin/toolbox"
[2022-06-30 08:59:46.242617] level=debug msg="Migrating to newer Podman"
[2022-06-30 08:59:46.242634] level=debug msg="Toolbox config directory is /var/home/chrisg/.config/toolbox"
[2022-06-30 08:59:46.362310] level=debug msg="Current Podman version is 4.1.1"
[2022-06-30 08:59:46.362541] level=debug msg="Creating runtime directory /run/user/1000/toolbox"
[2022-06-30 08:59:46.362641] level=debug msg="Old Podman version is 4.1.1"
[2022-06-30 08:59:46.362721] level=debug msg="Migration not needed: Podman version 4.1.1 is unchanged"
[2022-06-30 08:59:46.362782] level=debug msg="Setting up configuration"
[2022-06-30 08:59:46.362828] level=debug msg="Setting up configuration: file /var/home/chrisg/.config/containers/toolbox.conf not found"
[2022-06-30 08:59:46.362876] level=debug msg="Resolving image name"
[2022-06-30 08:59:46.362921] level=debug msg="Distribution (CLI): ''"
[2022-06-30 08:59:46.362962] level=debug msg="Image (CLI): ''"
[2022-06-30 08:59:46.363001] level=debug msg="Release (CLI): ''"
[2022-06-30 08:59:46.363039] level=debug msg="Resolved image name"
[2022-06-30 08:59:46.363080] level=debug msg="Image: 'fedora-toolbox:36'"
[2022-06-30 08:59:46.363104] level=debug msg="Release: '36'"
[2022-06-30 08:59:46.363122] level=debug msg="Resolving container name"
[2022-06-30 08:59:46.363140] level=debug msg="Container: ''"
[2022-06-30 08:59:46.363158] level=debug msg="Image: 'fedora-toolbox:36'"
[2022-06-30 08:59:46.363178] level=debug msg="Release: '36'"
[2022-06-30 08:59:46.363196] level=debug msg="Resolved container name"
[2022-06-30 08:59:46.363214] level=debug msg="Container: 'fedora-toolbox-36'"
[2022-06-30 08:59:46.363235] level=debug msg="Resolving image name"
[2022-06-30 08:59:46.363254] level=debug msg="Distribution (CLI): ''"
[2022-06-30 08:59:46.363272] level=debug msg="Image (CLI): ''"
[2022-06-30 08:59:46.363290] level=debug msg="Release (CLI): ''"
[2022-06-30 08:59:46.363309] level=debug msg="Resolved image name"
[2022-06-30 08:59:46.363332] level=debug msg="Image: 'fedora-toolbox:36'"
[2022-06-30 08:59:46.363349] level=debug msg="Release: '36'"
[2022-06-30 08:59:46.363367] level=debug msg="Resolving container name"
[2022-06-30 08:59:46.363384] level=debug msg="Container: ''"
[2022-06-30 08:59:46.363402] level=debug msg="Image: 'fedora-toolbox:36'"
[2022-06-30 08:59:46.363419] level=debug msg="Release: '36'"
[2022-06-30 08:59:46.363439] level=debug msg="Resolved container name"
[2022-06-30 08:59:46.363458] level=debug msg="Container: 'fedora-toolbox-36'"
>> [2022-06-30 08:59:46.363492] level=debug msg="Checking if container fedora-toolbox-36 exists"
>> [2022-06-30 08:59:46.515334] level=debug msg="Inspecting mounts of container fedora-toolbox-36"
>> [2022-06-30 08:59:46.662872] level=debug msg="Starting container fedora-toolbox-36"
>> [2022-06-30 08:59:46.813661] level=debug msg="Inspecting entry point of container fedora-toolbox-36"
>> [2022-06-30 08:59:46.967533] level=debug msg="Entry point PID is a float64"
[2022-06-30 08:59:46.967652] level=debug msg="Entry point of container fedora-toolbox-36 is toolbox (PID=1904)"
[2022-06-30 08:59:46.967689] level=debug msg="Waiting for container fedora-toolbox-36 to finish initializing"
[2022-06-30 08:59:46.967718] level=debug msg="Creating runtime directory /run/user/1000/toolbox"
[2022-06-30 08:59:46.967744] level=debug msg="Checking if initialization stamp /run/user/1000/toolbox/container-initialized-1904 exists"
[2022-06-30 08:59:46.967770] level=debug msg="Container fedora-toolbox-36 is initialized"
[2022-06-30 08:59:46.967796] level=debug msg="Checking if 'podman exec' supports disabling the detach keys"
[2022-06-30 08:59:46.967990] level=debug msg="'podman exec' supports disabling the detach keys"
[2022-06-30 08:59:46.968028] level=debug msg="Creating list of environment variables to forward"
[2022-06-30 08:59:46.968057] level=debug msg="COLORTERM=truecolor"
[2022-06-30 08:59:46.968083] level=debug msg="DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus"
[2022-06-30 08:59:46.968112] level=debug msg="DBUS_SYSTEM_BUS_ADDRESS is unset"
[2022-06-30 08:59:46.968139] level=debug msg="DESKTOP_SESSION=i3"
[2022-06-30 08:59:46.968164] level=debug msg="DISPLAY=:0"
[2022-06-30 08:59:46.968190] level=debug msg="LANG=en_ZA.UTF-8"
[2022-06-30 08:59:46.968216] level=debug msg="SHELL=/bin/bash"
[2022-06-30 08:59:46.968242] level=debug msg="SSH_AUTH_SOCK=/tmp/ssh-XXXXXXgidfae/agent.1704"
[2022-06-30 08:59:46.968268] level=debug msg="TERM=xterm-256color"
[2022-06-30 08:59:46.968294] level=debug msg="TOOLBOX_PATH=/usr/bin/toolbox"
[2022-06-30 08:59:46.968320] level=debug msg="USER=chrisg"
[2022-06-30 08:59:46.968349] level=debug msg="VTE_VERSION is unset"
[2022-06-30 08:59:46.968376] level=debug msg="WAYLAND_DISPLAY is unset"
[2022-06-30 08:59:46.968402] level=debug msg="XAUTHORITY=/run/user/1000/gdm/Xauthority"
[2022-06-30 08:59:46.968428] level=debug msg="XDG_CURRENT_DESKTOP=i3"
[2022-06-30 08:59:46.968454] level=debug msg="XDG_DATA_DIRS=/var/home/chrisg/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share/:/usr/share/"
[2022-06-30 08:59:46.968500] level=debug msg="XDG_MENU_PREFIX is unset"
[2022-06-30 08:59:46.968539] level=debug msg="XDG_RUNTIME_DIR=/run/user/1000"
[2022-06-30 08:59:46.968566] level=debug msg="XDG_SEAT=seat0"
[2022-06-30 08:59:46.968594] level=debug msg="XDG_SESSION_DESKTOP=i3"
[2022-06-30 08:59:46.968619] level=debug msg="XDG_SESSION_ID=2"
[2022-06-30 08:59:46.968645] level=debug msg="XDG_SESSION_TYPE=x11"
[2022-06-30 08:59:46.968670] level=debug msg="XDG_VTNR=2"
[2022-06-30 08:59:46.968701] level=debug msg="Running in container fedora-toolbox-36:"
[2022-06-30 08:59:46.968727] level=debug msg=podman
[2022-06-30 08:59:46.968753] level=debug msg=--log-level
[2022-06-30 08:59:46.968778] level=debug msg=error
[2022-06-30 08:59:46.968804] level=debug msg=exec
[2022-06-30 08:59:46.968830] level=debug msg=--detach-keys
[2022-06-30 08:59:46.968856] level=debug
[2022-06-30 08:59:46.968881] level=debug msg=--interactive
[2022-06-30 08:59:46.968907] level=debug msg=--tty
[2022-06-30 08:59:46.968932] level=debug msg=--user
[2022-06-30 08:59:46.968958] level=debug msg=chrisg
[2022-06-30 08:59:46.968983] level=debug msg=--workdir
[2022-06-30 08:59:46.969009] level=debug msg=/var/home/chrisg
[2022-06-30 08:59:46.969035] level=debug msg="--env=COLORTERM=truecolor"
[2022-06-30 08:59:46.969061] level=debug msg="--env=DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus"
[2022-06-30 08:59:46.969086] level=debug msg="--env=DESKTOP_SESSION=i3"
[2022-06-30 08:59:46.969112] level=debug msg="--env=DISPLAY=:0"
[2022-06-30 08:59:46.969138] level=debug msg="--env=LANG=en_ZA.UTF-8"
[2022-06-30 08:59:46.969164] level=debug msg="--env=SHELL=/bin/bash"
[2022-06-30 08:59:46.969190] level=debug msg="--env=SSH_AUTH_SOCK=/tmp/ssh-XXXXXXgidfae/agent.1704"
[2022-06-30 08:59:46.969222] level=debug msg="--env=TERM=xterm-256color"
[2022-06-30 08:59:46.969251] level=debug msg="--env=TOOLBOX_PATH=/usr/bin/toolbox"
[2022-06-30 08:59:46.969278] level=debug msg="--env=USER=chrisg"
[2022-06-30 08:59:46.969304] level=debug msg="--env=XAUTHORITY=/run/user/1000/gdm/Xauthority"
[2022-06-30 08:59:46.969330] level=debug msg="--env=XDG_CURRENT_DESKTOP=i3"
[2022-06-30 08:59:46.969356] level=debug msg="--env=XDG_DATA_DIRS=/var/home/chrisg/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share/:/usr/share/"
[2022-06-30 08:59:46.969387] level=debug msg="--env=XDG_RUNTIME_DIR=/run/user/1000"
[2022-06-30 08:59:46.969415] level=debug msg="--env=XDG_SEAT=seat0"
[2022-06-30 08:59:46.969441] level=debug msg="--env=XDG_SESSION_DESKTOP=i3"
[2022-06-30 08:59:46.969467] level=debug msg="--env=XDG_SESSION_ID=2"
[2022-06-30 08:59:46.969507] level=debug msg="--env=XDG_SESSION_TYPE=x11"
[2022-06-30 08:59:46.969535] level=debug msg="--env=XDG_VTNR=2"
[2022-06-30 08:59:46.969561] level=debug msg=fedora-toolbox-36
[2022-06-30 08:59:46.969587] level=debug msg=capsh
[2022-06-30 08:59:46.969612] level=debug msg="--caps="
[2022-06-30 08:59:46.969638] level=debug msg=--
[2022-06-30 08:59:46.969664] level=debug msg=-c
[2022-06-30 08:59:46.969689] level=debug msg="exec \"$@\""
[2022-06-30 08:59:46.969717] level=debug msg=/bin/sh
[2022-06-30 08:59:46.969745] level=debug msg=true
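
To put numbers on the marked lines, the gaps between consecutive timestamps can be computed mechanically. The following is a minimal Go sketch, not part of toolbox: the names parseLine and slowestSteps are made up for illustration, and it assumes that each debug message is printed just before the step it describes, so a gap is attributed to the earlier of two lines.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// layout matches the timestamps printed by ts '[%Y-%m-%d %H:%M:%.S]'.
const layout = "2006-01-02 15:04:05.000000"

// step records how long the log stayed silent after a message was printed.
type step struct {
	msg string
	gap time.Duration
}

// parseLine extracts the timestamp and the rest of one log line.
// A leading ">> " marker, as used in the log above, is tolerated.
func parseLine(line string) (time.Time, string, bool) {
	open := strings.Index(line, "[")
	end := strings.Index(line, "]")
	if open < 0 || end < open {
		return time.Time{}, "", false
	}
	ts, err := time.Parse(layout, line[open+1:end])
	if err != nil {
		return time.Time{}, "", false
	}
	return ts, strings.TrimSpace(line[end+1:]), true
}

// slowestSteps returns the n largest gaps between consecutive log lines,
// attributing each gap to the earlier of the two lines (an assumption).
func slowestSteps(log string, n int) []step {
	var steps []step
	var prevTime time.Time
	var prevMsg string
	havePrev := false
	for _, line := range strings.Split(log, "\n") {
		ts, msg, ok := parseLine(line)
		if !ok {
			continue // skip lines without a parsable timestamp
		}
		if havePrev {
			steps = append(steps, step{msg: prevMsg, gap: ts.Sub(prevTime)})
		}
		prevTime, prevMsg, havePrev = ts, msg, true
	}
	sort.SliceStable(steps, func(i, j int) bool { return steps[i].gap > steps[j].gap })
	if len(steps) > n {
		steps = steps[:n]
	}
	return steps
}

func main() {
	// Excerpt copied from the verbose log above.
	excerpt := `>> [2022-06-30 08:59:46.363492] level=debug msg="Checking if container fedora-toolbox-36 exists"
>> [2022-06-30 08:59:46.515334] level=debug msg="Inspecting mounts of container fedora-toolbox-36"
>> [2022-06-30 08:59:46.662872] level=debug msg="Starting container fedora-toolbox-36"
>> [2022-06-30 08:59:46.813661] level=debug msg="Inspecting entry point of container fedora-toolbox-36"
>> [2022-06-30 08:59:46.967533] level=debug msg="Entry point PID is a float64"
[2022-06-30 08:59:46.967652] level=debug msg="Entry point of container fedora-toolbox-36 is toolbox (PID=1904)"`
	for _, s := range slowestSteps(excerpt, 3) {
		fmt.Printf("%v\t%s\n", s.gap, s.msg)
	}
}
```

Going by the timestamps in the log, each of the four marked steps is followed by a gap of roughly 150 ms, and a further 120 ms or so passes between the "Toolbox config directory" and "Current Podman version" lines. Together that is in the same range as the difference between the toolbox run and podman exec timings above.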
CMGeldenhuys added the "1. Feature request" label Jun 30, 2022
@debarshiray
Member

Yes, it will be good to optimize run and enter even further for the most common cases, if possible.

@debarshiray
Member

Might be related to toolbox evaluating expensive paths to ensure safe fallback (speculation).

Yes, you are right. We do spawn podman(1) a few times to evaluate various possibilities. Cutting down on the number of times an external command is spawned can help.

I've tried to mark (>>) where the potential slow down might occur

You might be onto something. Looking at the lines that you pointed out:

>> [2022-06-30 08:59:46.363492] level=debug msg="Checking if container fedora-toolbox-36 exists"
>> [2022-06-30 08:59:46.515334] level=debug msg="Inspecting mounts of container fedora-toolbox-36"
>> [2022-06-30 08:59:46.662872] level=debug msg="Starting container fedora-toolbox-36"
>> [2022-06-30 08:59:46.813661] level=debug msg="Inspecting entry point of container fedora-toolbox-36"
>> [2022-06-30 08:59:46.967533] level=debug msg="Entry point PID is a float64"

The first one is an invocation of podman container exists. Might be good if we could do it in-process inside toolbox(1) without involving podman(1).

Then there are two instances of podman inspect. Both of them are for handling old containers that might have been set up differently and detecting if podman start managed to start the container's entry point process. This part can be improved. Especially because right now there's no way to ferry errors from the entry point back to the parent toolbox(1) process that the user is interacting with.

@debarshiray
Member

I've tried to mark (>>) where the potential slow down might occur

You might be onto something. Looking at the lines that you pointed out:

>> [2022-06-30 08:59:46.363492] level=debug msg="Checking if container fedora-toolbox-36 exists"
>> [2022-06-30 08:59:46.515334] level=debug msg="Inspecting mounts of container fedora-toolbox-36"
>> [2022-06-30 08:59:46.662872] level=debug msg="Starting container fedora-toolbox-36"
>> [2022-06-30 08:59:46.813661] level=debug msg="Inspecting entry point of container fedora-toolbox-36"
>> [2022-06-30 08:59:46.967533] level=debug msg="Entry point PID is a float64"

The first one is an invocation of podman container exists. Might be good if we could do it in-process inside toolbox(1) without involving podman(1).

Then there are two instances of podman inspect. Both of them are for handling old containers that might have been set up differently and detecting if podman start managed to start the container's entry point process. This part can be improved. Especially because right now there's no way to ferry errors from the entry point back to the parent toolbox(1) process that the user is interacting with.

I am happy to report that I found a way to remove one of those podman inspect invocations for the common case where an already running Toolbx container is being entered again. I will submit a pull request soon.

debarshiray added a commit to debarshiray/toolbox that referenced this issue May 16, 2024
Currently, the 'enter' and 'run' commands always invoke 'podman start'
even if the Toolbx container's entry point is already running.  There's
no need for that.  The commands already invoke 'podman inspect' to find
out if the org.freedesktop.Flatpak.SessionHelper D-Bus service needs to
be started.  Thus, they already have what is needed to find out if the
container is stopped and 'podman start' is necessary before it can be
used with 'podman exec', or if it's already running.

The unconditional 'podman start' invocation was followed by a second
'podman inspect' invocation to find out if the 'podman start' managed to
start the container's entry point.  There's no need for this second
'podman inspect' either, just like the 'podman start', when it's already
known from the first 'podman inspect' that the container is running.

The extra 'podman start' and 'podman inspect' invocations are
sufficiently expensive to add a noticeable overhead to the 'enter' and
'run' commands.  It's common to use a container that's already running,
just like having multiple terminals with the same working directory, and
terminal emulation applications like Ptyxis try to make it easier [1].
Therefore, it's worth optimizing this code path.

[1] https://gitlab.gnome.org/chergert/ptyxis
    https://flathub.org/apps/app.devsuite.Ptyxis

containers#1070
debarshiray added a commit to debarshiray/toolbox that referenced this issue May 17, 2024
Currently, the 'enter' and 'run' commands always invoke 'podman start'
even if the Toolbx container's entry point is already running.  There's
no need for that.  The commands already invoke 'podman inspect' to find
out if the org.freedesktop.Flatpak.SessionHelper D-Bus service needs to
be started.  Thus, they already have what is needed to find out if the
container is stopped and 'podman start' is necessary before it can be
used with 'podman exec', or if it's already running.

The unconditional 'podman start' invocation was followed by a second
'podman inspect' invocation to find out if the 'podman start' managed to
start the container's entry point.  There's no need for this second
'podman inspect' either, just like the 'podman start', when it's already
known from the first 'podman inspect' that the container is running.

The extra 'podman start' and 'podman inspect' invocations are
sufficiently expensive to add a noticeable overhead to the 'enter' and
'run' commands.  It's common to use a container that's already running,
just like having multiple terminals within the same working directory,
and terminal emulation applications like Ptyxis try to make it easier to
do so [1].  Therefore, it's worth optimizing this code path.

[1] https://gitlab.gnome.org/chergert/ptyxis
    https://flathub.org/apps/app.devsuite.Ptyxis

containers#1070
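
The decision described in the commit message can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the code from the pull request: needsStart and nextCommands are hypothetical names, the JSON in main is a hand-written and heavily trimmed stand-in for real 'podman inspect' output, and only the State.Running field is read from it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// inspectEntry holds the only field this sketch reads from the JSON array
// printed by 'podman inspect'.
type inspectEntry struct {
	State struct {
		Running bool `json:"Running"`
	} `json:"State"`
}

// needsStart reports whether 'podman start' is required before 'podman exec',
// based on the output of the first (already necessary) 'podman inspect'.
func needsStart(inspectJSON []byte) (bool, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(inspectJSON, &entries); err != nil {
		return false, fmt.Errorf("parsing inspect output: %w", err)
	}
	if len(entries) != 1 {
		return false, errors.New("expected exactly one container in inspect output")
	}
	return !entries[0].State.Running, nil
}

// nextCommands lists the podman subcommands still to be spawned after the
// first inspect, following the reasoning in the commit message above.
func nextCommands(start bool) []string {
	if start {
		// Stopped container: start it, then inspect again to see whether
		// the entry point came up, then exec.
		return []string{"start", "inspect", "exec"}
	}
	// Already running: go straight to exec.
	return []string{"exec"}
}

func main() {
	// Hypothetical, trimmed sample; real output has many more fields.
	sample := []byte(`[{"State": {"Status": "running", "Running": true}}]`)
	start, err := needsStart(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("needs start:", start, "remaining:", nextCommands(start))
}
```

For a container that is already running, this leaves a single further podman invocation instead of three, which is where the saving in the common case comes from.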
@debarshiray
Member

I am happy to report that I found a way to remove one of those podman inspect invocations for the common case where an already running Toolbx container is being entered again. I will submit a pull request soon.

Done: #1491

The next step is to explore if we can get rid of the podman container exists by using the podman inspect invocation for it. It would require neatly separating out the container doesn't exist error from other potential errors thrown by podman inspect.
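
For comparison, 'podman container exists' already makes this separation through its exit status: its manual page documents 0 for a container that was found, 1 for one that was not, and 125 for any other problem. A replacement built on 'podman inspect' would need an equally clean distinction. Below is a minimal Go sketch of how a caller might interpret those statuses; the names classify and containerExists are hypothetical and this is not toolbox's code.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// classify maps the documented exit statuses of 'podman container exists'
// to (exists, error): 0 means found, 1 means not found, anything else
// (125 in particular) is treated as a genuine failure.
func classify(code int) (bool, error) {
	switch code {
	case 0:
		return true, nil
	case 1:
		return false, nil
	default:
		return false, fmt.Errorf("podman failed with exit status %d", code)
	}
}

// containerExists spawns podman and interprets its exit status.
// It is shown for context and is not called from main, so the sketch
// runs without podman installed.
func containerExists(name string) (bool, error) {
	err := exec.Command("podman", "container", "exists", name).Run()
	if err == nil {
		return classify(0)
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return classify(exitErr.ExitCode())
	}
	return false, err // podman could not be spawned at all
}

func main() {
	for _, code := range []int{0, 1, 125} {
		exists, err := classify(code)
		fmt.Println(code, exists, err)
	}
}
```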

debarshiray added a commit to debarshiray/toolbox that referenced this issue May 19, 2024
debarshiray added a commit to debarshiray/toolbox that referenced this issue May 20, 2024
Currently, the 'enter' and 'run' commands poll at one second intervals
to check if the Toolbx container's entry point has created the
initialization stamp file to indicate that the container has been
initialized.  This came from the POSIX shell implementation [1], where
it was relatively easier to poll than to use inotify(7) to monitor the
file system.

The problem with polling is that the interval is always going to be
either too short and waste resources or too long and cause delays.  The
current one second interval is sufficiently long to add a noticeable
overhead to the 'enter' and 'run' commands.

It will be better to use inotify(7) to monitor the file system, so that
the commands can proceed as soon as the initialization stamp file is
available, instead of waiting for the polling interval to pass.

There's a fallback to polling, as before, when the operating system is
suffering from a shortage of resources needed for inotify(7).  This code
path can be forced through the TOOLBOX_RUN_USE_POLLING environment
variable for testing.

[1] Commit d3e0f3d
    containers@d3e0f3df06d3f5ac
    containers#305

containers#1070
@debarshiray
Member

One thing that I failed to notice before is the possibility of making things faster when a container is used (i.e., started) for the first time with enter or run.

We currently poll at one second intervals to check if the Toolbx container's entry point has created the initialization stamp file to indicate that the container has been initialized. This came from the POSIX shell implementation, where it was relatively easier to poll than to use inotify(7) to monitor the file system.

These days, with the Go implementation, we should use inotify(7), so that the commands can proceed as soon as the initialization stamp file is available, instead of waiting for the polling interval to pass.

See: #1495
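
The idea of waiting on inotify(7) with a polling fallback can be sketched as below. This is a Linux-only illustration that uses the inotify wrappers in Go's standard syscall package; it is not the implementation from #1495, and the names waitForStamp, pollForStamp and demo, the stamp file name and the timeouts are all made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
	"time"
)

var errTimeout = errors.New("timed out waiting for initialization stamp")

func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// pollForStamp is the fallback: look for the file at a fixed interval.
func pollForStamp(path string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for !exists(path) {
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
	return nil
}

// waitForStamp returns as soon as path exists. It watches the parent
// directory with inotify and falls back to polling if that cannot be set up.
func waitForStamp(path string, timeout time.Duration) error {
	fallback := func() error { return pollForStamp(path, time.Second, timeout) }

	fd, err := syscall.InotifyInit1(syscall.IN_NONBLOCK | syscall.IN_CLOEXEC)
	if err != nil {
		return fallback() // for example, the inotify instance limit was hit
	}
	// A non-blocking descriptor lets os.NewFile return a File with deadlines.
	watcher := os.NewFile(uintptr(fd), "inotify")
	defer watcher.Close()

	mask := uint32(syscall.IN_CREATE | syscall.IN_MOVED_TO)
	if _, err := syscall.InotifyAddWatch(fd, filepath.Dir(path), mask); err != nil {
		return fallback()
	}
	if err := watcher.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return fallback()
	}

	buf := make([]byte, 4096)
	for {
		// Check only once the watch is in place, so a stamp created in the
		// meantime is not missed; re-check after every batch of events.
		if exists(path) {
			return nil
		}
		if _, err := watcher.Read(buf); err != nil {
			if errors.Is(err, os.ErrDeadlineExceeded) {
				return errTimeout
			}
			return err
		}
	}
}

// demo creates a stamp file from another goroutine after a short delay and
// waits for it in a temporary directory.
func demo() error {
	dir, err := os.MkdirTemp("", "stamp-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	stamp := filepath.Join(dir, "container-initialized-demo")
	go func() {
		time.Sleep(50 * time.Millisecond)
		_ = os.WriteFile(stamp, nil, 0o644)
	}()
	return waitForStamp(stamp, 5*time.Second)
}

func main() {
	start := time.Now()
	if err := demo(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("stamp seen after", time.Since(start).Round(time.Millisecond))
}
```

Re-checking the file after every wake-up, rather than decoding individual events, keeps the sketch short at the cost of a few extra stat calls when unrelated files appear in the same directory.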

debarshiray added a commit to debarshiray/toolbox that referenced this issue May 20, 2024
Currently, the 'enter' and 'run' commands poll at one second intervals
to check if the Toolbx container's entry point has created the
initialization stamp file to indicate that the container has been
initialized.  This came from the POSIX shell implementation [1], where
it was relatively easier to poll than to use inotify(7) to monitor the
file system.

The problem with polling is that the interval is always going to be
either too short and waste resources or too long and cause delays.  The
current one second interval is sufficiently long to add a noticeable
delay to the 'enter' and 'run' commands.

It will be better to use inotify(7) to monitor the file system, which is
quite easy to do with the Go implementation, so that the commands can
proceed as soon as the initialization stamp file is available, instead
of waiting for the polling interval to pass.

There's a fallback to polling, as before, when the operating system is
suffering from a shortage of resources needed for inotify(7).  This code
path can be forced through the TOOLBOX_RUN_USE_POLLING environment
variable for testing.

[1] Commit d3e0f3d
    containers@d3e0f3df06d3f5ac
    containers#305

containers#1070
debarshiray added a commit to debarshiray/toolbox that referenced this issue May 22, 2024
Currently, the 'enter' and 'run' commands poll at one second intervals
to check if the Toolbx container's entry point has created the
initialization stamp file to indicate that the container has been
initialized.  This came from the POSIX shell implementation [1], where
it was relatively easier to poll than to use inotify(7) to monitor the
file system.

The problem with polling is that the interval is always going to be
either too short and waste resources or too long and cause delays.  The
current one second interval is sufficiently long to add a noticeable
delay to the 'enter' and 'run' commands.

It will be better to use inotify(7) to monitor the file system, which is
quite easy to do with the Go implementation, so that the commands can
proceed as soon as the initialization stamp file is available, instead
of waiting for the polling interval to pass.

There's a fallback to polling, as before, when the operating system is
suffering from a shortage of resources needed for inotify(7).  This code
path can be forced through the TOOLBX_RUN_USE_POLLING environment
variable for testing.

[1] Commit d3e0f3d
    containers@d3e0f3df06d3f5ac
    containers#305

containers#1070
debarshiray added a commit to debarshiray/toolbox that referenced this issue May 22, 2024
Currently, the 'enter' and 'run' commands poll at one second intervals
to check if the Toolbx container's entry point has created the
initialization stamp file to indicate that the container has been
initialized.  This came from the POSIX shell implementation [1], where
it was relatively easier to poll than to use inotify(7) to monitor the
file system.

The problem with polling is that the interval is always going to be
either too short and waste resources or too long and cause delays.  The
current one second interval is sufficiently long to add a noticeable
delay to the 'enter' and 'run' commands.

It will be better to use inotify(7) to monitor the file system, which is
quite easy to do with the Go implementation, so that the commands can
proceed as soon as the initialization stamp file is available, instead
of waiting for the polling interval to pass.

There's a fallback to polling, as before, when the operating system is
suffering from a shortage of resources needed for inotify(7).  This code
path can be forced through the TOOLBX_RUN_USE_POLLING environment
variable for testing.  Setting this environment variable disables some
code to ensure that the polling ticker is actually used, because,
otherwise, the race between the creation and detection of the
initialization stamp file makes it difficult to test the fallback.

[1] Commit d3e0f3d
    containers@d3e0f3df06d3f5ac
    containers#305

containers#1070
@debarshiray
Member

The next step is to explore if we can get rid of the podman container exists by using the podman inspect invocation for it. It would require neatly separating out the container doesn't exist error from other potential errors thrown by podman inspect.

This is still left to be done or explored.

However, I am inclined to close this issue for the time being because we did land the optimizations in #1491 and #1495. We can revisit this if people still find the enter and run commands to be slow.
