Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: Migration path for users of coreos/toolbox #689

Closed
miabbott opened this issue Feb 9, 2021 · 27 comments · Fixed by #935
Closed

RFE: Migration path for users of coreos/toolbox #689

miabbott opened this issue Feb 9, 2021 · 27 comments · Fixed by #935
Labels
1. Feature request A request for a new feature 2. Design Design discussion 5. Help Wanted Extra attention is needed

Comments

@miabbott
Copy link

miabbott commented Feb 9, 2021

Is your feature request related to a problem? Please describe.
In RHCOS, we are shipping https://github.com/coreos/toolbox in the OS and we want to move to shipping containers/toolbox in its place. In order to do that, we need to ensure that the upgrade path is smooth, as toolbox is heavily utilized by the Red Hat Support organization for gathering debug information.

Describe the solution you'd like
I see two requirements for RHCOS users:

  • support for .toolboxrc files
  • retention of the same behavior from coreos/toolbox

Describe alternatives you've considered
The CoreOS team is currently maintaining the bash script implementation over at coreos/toolbox and we could continue to do so.

Additional context
FWIW, I don't expect the containers/toolbox team to work on this alone. Please reach out to the CoreOS team to help with the work.

@miabbott miabbott added the 1. Feature request A request for a new feature label Feb 9, 2021
@debarshiray
Copy link
Member

We have from time to time discussed some sort of configuration file for Toolbox. It could either be used to tweak the behaviour of toolbox(1) itself, or to customize the internal set-up of the container by offering some extra bind mounts, etc.. That's related to ~/.toolboxrc.

I see that ~/.toolboxrc is only about changing the name of the default container and the image, and is a shell snippet that's easily sourced into another shell script.

Even if we come up with a different format for the configuration file going forward, I could imagine some migration path where if only ~/.toolboxrc is present, then it's used, but if the new-style file is present then it's preferred. Maybe we could also add a deprecation warning to urge users to migrate.

@debarshiray
Copy link
Member

What did you mean by retaining the same behaviour as coreos/toolbox? I suppose, it's about the command line UI, right? Our solution might depend on how smooth we want the migration to be.

It will be helpful if someone who had been extensively using coreos/toolbox so far, tried out containers/toolbox to see if there's something that's actively preventing any of the previous use-cases. It will help us understand the scope of the work better.

There could be different ways in which we can ensure compatibility with the command line UI. We could either teach containers/toolbox to accept a different UI, or we could ship a wrapper shell script in RHCOS that accepts the old command line UI and translates it into container/toolbox calls. Or something else.

@HarryMichal
Copy link
Member

I second the questions raised by @debarshiray. I'm just wondering if adding support for ~/.toolboxrc is really needed. The config file is used to set the registry, from which the image is pulled, the image itself and the name of the container, right? All these options can be set through the Toolbox CLI. Example:

$ toolbox create -i registry.redhat.io/rhel8/support-tools my-toolbox-name

I'm aware of the fact that it is not wise to just remove a feature (even though it is replaced by an adequate substitution) without giving any deprecation warning to users.

I just remembered one thing that will most likely need a mention in the documentation. I believe coreos/toolbox calls podman with sudo under the hood by default, right? Well, Toolbox works by default unprivileged. It can be run in a "privileged mode" when executed with sudo. This needs to be clear to the users because the difference is quite big when it comes to debugging of the host system.

I'll return again to the first paragraph. Even though I don't think we don't need a configuration file for setting those 3 environmental variables, I think we do need a configuration file. I already have an idea how can that be implemented but that is for a different ticket.

@miabbott
Copy link
Author

What did you mean by retaining the same behaviour as coreos/toolbox? I suppose, it's about the command line UI, right? Our solution might depend on how smooth we want the migration to be.

There could be different ways in which we can ensure compatibility with the command line UI. We could either teach containers/toolbox to accept a different UI, or we could ship a wrapper shell script in RHCOS that accepts the old command line UI and translates it into container/toolbox calls. Or something else.

Maybe we could also add a deprecation warning to urge users to migrate.

My main thought was around the various volume mounts and options that are passed to podman by the coreos/toolbox script - https://github.com/coreos/toolbox/blob/master/rhcos-toolbox#L88-L111. Though retaining a familiar CLI UX would be nice.

I think the idea of a wrapper script that calls out to containers/toobox with the correct options might be the smoothest path forward; we could even have it print a message about deprecation.

An example user story:

I am a customer that installed OCP 4.5. During the operation of my cluster, I have customized my toolbox experience via the .toolboxrc file. I've upgraded my OCP cluster to 4.8, where containers/toolbox is now provided. I want my experience with toolbox to seem as familiar as possible and reduce any intervention that is needed to preserve my existing experience.

I just remembered one thing that will most likely need a mention in the documentation. I believe coreos/toolbox calls podman with sudo under the hood by default, right? Well, Toolbox works by default unprivileged. It can be run in a "privileged mode" when executed with sudo. This needs to be clear to the users because the difference is quite big when it comes to debugging of the host system.

Right, this gets to the crux of my concern about moving RHCOS to include containers/toolbox. Users of coreos/toolbox are typically running things like strace, tcpdump, sosreport, etc trying to gather debug information about processes running on the host. I want to try to make the transition to containers/toolbox as smooth as possible for these existing users.

@miabbott
Copy link
Author

cc: @travier @control-d

@bgilbert
Copy link

It's worth noting that containers/toolbox has been broken on FCOS for some time; see coreos/fedora-coreos-tracker#693 and coreos/fedora-coreos-tracker#734.

@HarryMichal HarryMichal added 2. Design Design discussion 5. Help Wanted Extra attention is needed labels Feb 17, 2021
@travier
Copy link
Member

travier commented Feb 25, 2021

Also submitted #713

@debarshiray
Copy link
Member

debarshiray commented Jul 4, 2021

debarshiray pushed a commit to HarryMichal/toolbox that referenced this issue Jul 4, 2021
This is meant to get a better understanding of a failed 'podman pull'
invocation to understand whether pulling an image requires logging into
the registry or not. Currently, 'podman pull' doesn't have a dedicated
exit code to denote authorization errors, so this is meant to be a
temporary workaround for that.

Parsing the error stream is inherently fragile and tricky because
there's no guarantee that the structure of the messages won't change,
and there's no clear definition of how the messages are laid out.
Therefore, this approach can't be treated as a generic solution for
getting detailed information about failed Podman invocations.

The error stream is used not only for dumping error messages, but also
for showing progress bars. Therefore, all lines are skipped until one
that starts with "Error: " is found. This is a heuristic based on how
Go programs written using the Cobra [1] library tend to report errors.

All subsequent lines are taken together and split around the ": "
sub-string, on the assumption that the ": " sub-string is used when a
new error message is prefixed to an inner error. Each sub-string
created from the split is treated as a potential member of the chain of
errors reported within Podman.

Some real world examples of the 'podman pull' error stream in the case
of authorization errors are:

  * With Docker Hub (https://hub.docker.com/):
    Trying to pull docker.io/library/foobar:latest...
    Error: Error initializing source docker://foobar:latest: Error reading manifest latest in docker.io/library/foobar: errors:
    denied: requested access to the resource is denied
    unauthorized: authentication required

  * With registry.redhat.io:
    Trying to pull registry.redhat.io/foobar:latest...
    Error: Error initializing source docker://registry.redhat.io/foobar:latest: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication

[1] https://github.com/spf13/cobra/
    https://pkg.go.dev/github.com/spf13/cobra

containers#689
containers#786
containers#787
debarshiray pushed a commit to HarryMichal/toolbox that referenced this issue Jul 4, 2021
This way the standard error stream of the spawned binaries can be
inspected to get a better understanding of the failure, while still
being shown to the user when run with the '--verbose' flag.

Unfortunately, this breaks the progress bar in 'podman pull' because
the standard error stream is no longer connected to a file descriptor
that's a terminal device.

containers#689
containers#787
containers#823
debarshiray pushed a commit to HarryMichal/toolbox that referenced this issue Jul 4, 2021
debarshiray added a commit to HarryMichal/toolbox that referenced this issue Jul 4, 2021
A subsequent commit will use this when retrying the 'podman pull'
after logging the user into the registry for images that require
authorization.

containers#689
containers#787
debarshiray pushed a commit to HarryMichal/toolbox that referenced this issue Jul 4, 2021
Some registries contain private repositories of images and require the
user to log in first to gain access. With this Toolbox tries to
recognize errors when pulling images and offers the user the means to
log in.

Some changes by Debarshi Ray.

containers#689
containers#787
debarshiray pushed a commit to HarryMichal/toolbox that referenced this issue Jul 4, 2021
Testing of this feature is tricky as it relies on network which is
inherently flaky. So, as part of the setup two local image registries
are created and these features can be tested against them. To prevent
the removal of the registries during testing a different root for Podman
is used. The only tested registry implementation is Docker[0].

[0] https://hub.docker.com/_/registry/

containers#689
containers#787
debarshiray pushed a commit to HarryMichal/toolbox that referenced this issue Jul 4, 2021
The new additions to the system tests makes them take longer to run,
and was causing them to timeout.

containers#689
containers#787
@travier
Copy link
Member

travier commented Jul 12, 2021

From 6c86cab, will this still ask for container creation first time?

@travier
Copy link
Member

travier commented Jul 12, 2021

For the configuration file, having the classic pattern of: /usr/lib/toolbox.conf -> /etc/toolbox.conf -> $XDG_CONFIG_DIR/toolbox.conf (all previous 3 in a good format (TOML?)) -> ~/.toolbox.rc (compatibility fallback with support for only a restricted set of options) would enable shipping the /usr config from a package, having installer specified option in /etc, and user ones in $HOME.

@travier
Copy link
Member

travier commented Jul 12, 2021

The only supported options from the legacy toolbox.rc file should be:

REGISTRY=my.special.registry.example.com
IMAGE=debug:latest
TOOLBOX_NAME=special-debug-container"

From https://github.com/coreos/toolbox/blob/main/rhcos-toolbox#L157..L159

@travier
Copy link
Member

travier commented Jul 12, 2021

The new format could have the following options/format:

[root]
interactive = false
default_image = registry.foo/name/image:tag
credentials = /var/lib/foo.conf

[user]
interactive = true
default_image = registry.foo/name/image_user:tag

@cgwalters
Copy link
Collaborator

It seems nice if we support all of that but it doesn't seem like a hard blocker to me, we could just carry a bit of code that does e.g.:
if test -f ~/.toolbox.rc; then echo "note: The implementation of this program is replaced and ~/.toolbox.rc is no longer honored"; fi or so.

@debarshiray
Copy link
Member

From 6c86cab, will this still ask for container creation first time?

Yes, it will.

It can be overridden by the --assumeyes or -y command line option. Or we can add a configuration file option for it.

@debarshiray
Copy link
Member

Support for configuration files was added through PRs #828 and #851

For the configuration file, having the classic pattern of:
/usr/lib/toolbox.conf -> /etc/toolbox.conf -> $XDG_CONFIG_DIR/toolbox.conf
(all previous 3 in a good format (TOML?)) -> ~/.toolbox.rc
(compatibility fallback with support for only a restricted set of options) would
enable shipping the /usr config from a package, having installer specified
option in /etc, and user ones in $HOME.

Currently, it does /etc/containers/toolbox.conf -> $XDG_CONFIG_DIR/containers/toolbox.conf for no other reason than to stick with the pattern used by the rest of the code from the GitHub Containers organization.

However, I do like the idea of putting OS controlled configuration in /usr/lib for stateless operating systems, and I found myself wondering why /usr/lib/containers isn't already a thing today. Do you know why? Just curious. Unless there's a good reason not to, I think we can start using /usr/lib/containers.

The format is TOML, yes. :)

However, it doesn't support the ~/.toolboxrc used by coreos/toolbox. We were told that exact 1:1 backwards compatibility isn't necessary, at least not for Red Hat Enterprise Linux.

@debarshiray
Copy link
Member

#827 bind mounts the entire / from the host under /run/host, instead of separately putting /etc, /run, /tmp and /var in `/run/host.

However, we don't set the HOST environment variable inside the Toolbox container because it's such a terribly common name that it doesn't seem sane to use it without a namespace. However, that can be trivially patched in downstream, if necessary.

I think a better long-term option would be to teach tools (eg., sos) to natively work with Toolbox containers by looking for the /run/.toolboxenv file.

@debarshiray
Copy link
Member

Is there anything else? :)

HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Nov 25, 2021
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

The commit repurposes the test work I did for a rejected feature on
registry authentication. A registry is started locally and the use of
auth file is tested against it.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Nov 30, 2021
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Nov 30, 2021
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Jan 9, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Jan 13, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
debarshiray pushed a commit to HarryMichal/toolbox that referenced this issue Jan 14, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
@debarshiray
Copy link
Member

Whoops indeed, that's it (from the older toolbox:
https://github.com/coreos/toolbox/blob/main/rhcos-toolbox#L75).

#935 adds support for toolbox create --authfile .... Does it work for you?

Next step would be to add a configuration file knob.

HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Jan 14, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Jan 14, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Jan 14, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Jan 15, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Jan 15, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Feb 21, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Feb 21, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Slightly adjusts the format of system tests README.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Feb 21, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Slightly adjusts the format of system tests README.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Feb 21, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Slightly adjusts the format of system tests README.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Feb 22, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Slightly adjusts the format of system tests README.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Mar 18, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Slightly adjusts the format of system tests README.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Mar 18, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Slightly adjusts the format of system tests README.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Mar 20, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Slightly adjusts the format of system tests README.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Mar 20, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit to HarryMichal/toolbox that referenced this issue Mar 20, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes containers#689

containers#935
HarryMichal added a commit that referenced this issue Mar 20, 2022
The option accepts a path to a file that is passed to an internal call
to 'podman pull' via the '--authfile' option. This will make it easier
to pull images from registries with authentication in-place.

Fixes #689

#935
@travier
Copy link
Member

travier commented Mar 29, 2023

Do we have a way to specify the default image to pull in toolbox via a configuration file?

If we don't have that yet, we'll need to re-open this issue as we need that for OCP/OKD where we use a different image than the default.

@debarshiray
Copy link
Member

Do we have a way to specify the default image to pull in toolbox via a configuration file?

Yes, we do! We added it two years ago because RHEL 8 and 9 also overrides the default image. :)

See:
https://github.com/containers/toolbox/blob/main/data/config/toolbox.conf
https://github.com/containers/toolbox/blob/main/doc/toolbox.conf.5.md
https://src.fedoraproject.org/rpms/toolbox/blob/rawhide/f/toolbox.conf

If we don't have that yet, we'll need to re-open this issue as we need that for OCP/OKD where we use a different image than the default.

Interesting. Is that image public? I am curious to see how well Toolbx works with it.

@control-d
Copy link

Interesting. Is that image public? I am curious to see how well Toolbx works with it.

The image is registry.redhat.io/rhel9/support-tools
It expects to be run with the following podman options:

    --hostname toolbox \
    --name "${TOOLBOX_NAME}" \
    --privileged \
    --net=host \
    --pid=host \
    --ipc=host \
    --tty \
    --interactive \
    --env HOST=/host \
    --env NAME="${TOOLBOX_NAME}" \
    --env IMAGE="${IMAGE}" \
    --security-opt label=disable \
    --volume /run:/run \
    --volume /var/log:/var/log \
    --volume /etc/machine-id:/etc/machine-id \
    --volume /etc/localtime:/etc/localtime \
    --volume /:/host \

There would also need to be some way to force it to run as root when someone accidentally runs it as the core user.

@debarshiray
Copy link
Member

debarshiray commented Oct 2, 2023

The image is registry.redhat.io/rhel9/support-tools It expects
to be run with the following podman options:

Speaking from the Red Hat Enterprise Linux perspective, the support-tools image will need to be updated for RHEL 10 because the toolbox RPM will no longer set the HOST environment variable or try to replicate the command line interface from http://github.com/coreos/toolbox

For details, see:
https://src.fedoraproject.org/rpms/toolbox/c/b6101bf73fdcdc7e0e8ff2d75175a52e

Just to be sure, these changes are only for RHEL 10 onwards. RHELs 8 and 9 will continue to set the HOST environment variable and replicate the command line interface from http://github.com/coreos/toolbox

I have filed a SoS pull request to remove the need for setting the HOST environment variable inside Toolbx containers: sosreport/sos#3370

I don't know how you want to handle this for Red Hat OpenShift Container Platform. Currently, toolbox RPMs in RHEL and RHOCP are totally different. So, I suppose there's some flexibility.

    --hostname toolbox \
    --name "${TOOLBOX_NAME}" \
    --privileged \
    --net=host \
    --pid=host \
    --ipc=host \
    --tty \
    --interactive \
    --env HOST=/host \

Hopefully it won't be necessary to set HOST in future. See above.

--env NAME="${TOOLBOX_NAME}" \
--env IMAGE="${IMAGE}" \

Is it really necessary to set the NAME and IMAGE environment variables? It's not good for Toolbx to set such generically named variables without any namespacing. We used to have a few such variables that would leak into the containers and they were a constant source of complaints until @travier worked out a fix.

Toolbx containers expose the same information in the /run/.containerenv file. The format is a newline-separated list of environment-like shell-compatible variable assignments. You can source the file and look at the name and image variables.

--security-opt label=disable \
--volume /run:/run \

Bind mounting the entire /run like this is a bad idea because it will leak files like /run/.containerenv into the host.

--volume /var/log:/var/log \

Toolbx containers have /var/log/journal bind mounted from the host. Do you really need the entire /var/log from the host? Or a few more sub-directories?

--volume /etc/machine-id:/etc/machine-id \
--volume /etc/localtime:/etc/localtime \

We can't bind mount /etc/localtime like this. Otherwise, it won't stay synchronized with timezone changes on the host. Toolbx does this with a symbolic link that's kept updated.

--volume /:/host \

Toolbx puts this in /run/host like Flatpak.

There would also need to be some way to force it to run as root when someone accidentally runs it as the core user.

This sounds like something that needs to be handled outside Toolbx proper? The easiest solution might be to make the toolbox(1) binary setuid root on RHOCP? Sounds scary. :)

@debarshiray
Copy link
Member

Anyway, at this point, I think we need to let this issue be. :)

It started out as somewhat generic, and was later scoped down to the toolbox create --authfile option and the toolbox.conf(5) configuration file. Both have been in place since March 2022.

Let's continue to discuss downstream specific issues in their respective venues, and feel free to file new issues or pull requests for any new upstream work that needs to be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1. Feature request A request for a new feature 2. Design Design discussion 5. Help Wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants