Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uid/gid drift possible when switching between different host builds #673

Open
ggiguash opened this issue Jul 11, 2024 · 26 comments
Open

uid/gid drift possible when switching between different host builds #673

ggiguash opened this issue Jul 11, 2024 · 26 comments
Labels
area/osintegration Relates to an external OS/distro base image

Comments

@ggiguash
Copy link

ggiguash commented Jul 11, 2024

When upgrading from an ostree to a bootc image, the SSH key file permissions may get invalidated in the /etc/ssh directory. This is causing SSH service to fail its startup, effectively preventing the user from remotely accessing the host following the upgrade.

image

(edit by @cgwalters) Originally: https://issues.redhat.com/browse/USHIFT-3749

@ggiguash
Copy link
Author

The following work around can be implemented in a bootc container file to address this issue.

# Implement workarounds necessary for the successful upgrade from an ostree to a bootc image.
# - The permissions of SSH key files at /etc/ssh directory need to be fixed
#   so that they are owned by the 'ssh_keys' group.
RUN mkdir -p /etc/systemd/system/sshd.service.d && \
    printf "\
[Service]\n\
ExecStartPre=/bin/sh -c 'chgrp ssh_keys /etc/ssh/ssh*key*'\n" > "/etc/systemd/system/sshd.service.d/ssh-key-permissions.conf"

@travier
Copy link
Member

travier commented Jul 11, 2024

@cgwalters cgwalters changed the title SSH key permissions are broken when upgrading from an ostree to a bootc image uid/gid drift possible when switching between different host builds Jul 11, 2024
@cgwalters
Copy link
Collaborator

( I've added a link to https://issues.redhat.com/browse/USHIFT-3749 where this was discussed originally to avoid duplicate discussion where relevant)

But again, this issue is not about "ostree vs bootc", it's specially about "ostree commits as built by RHEL for Edge" vs "fedora/c9s/rhel derived bootc" which happen to allocate different uids/gids due to buildsystem drift.

It could also occur when switching between bootc images that have different defined ssh_key gids, and it could also occur when switching between ostree images (again if the buildsystem generating those hasn't ensured they're the same).

Because of this, I don't think this is really a bootc issue at a technical level. It's more a higher level buildsystem issue.

I would accept though that in practice, it may make sense for us to at least track the overall issue here, but...if we were to do something like that ssh key chown hack, I would probably say the code would logically be owned by e.g. https://gitlab.com/fedora/bootc/base-images

Except for the fact that the world does not need more load bearing bash scripts run as unrestricted root, and maybe we actually implement it in Rust here as an opt-in...similarly to coreos/rpm-ostree#4913 in many ways.

@ggiguash
Copy link
Author

ggiguash commented Jul 11, 2024

It could also occur when switching between bootc images that have different defined ssh_key gids, and it could also
occur when switching between ostree images (again if the buildsystem generating those hasn't ensured they're the
same).

The problem can certainly occur when switching between images of the same type, but only in the case when different image parents are used for the base image vs update image. To mitigate this issue, the way we usually build our image hierarchies is to have a base (parent) image and then derive all the updates from it.

The ostree-to-bootc upgrade is a special case because we're guaranteed to have different parents, so it's a 100% failure.
Thus, I thought to emphasize the need to provide a solution for it.

While I prefer this to be handled by the base images we produce, the fallback would be to document all the known pitfalls, which goes back to the #671 issue.

P.S. The title of this issue was changed to be more generic and we may want to go back to the specific problem of ostree-to-bootc upgrade because the generic one is a "known" issue.

@cgwalters cgwalters added the area/osintegration Relates to an external OS/distro base image label Jul 11, 2024
@cgwalters
Copy link
Collaborator

The ostree-to-bootc upgrade is a special case because we're guaranteed to have different parents

No - it's equally possible to build a custom base image using the same buildsystem that produces ostree commits today. In fact, there's significant tooling to streamline exactly that process, such as e.g. rpm-ostree compose container-encapsulate. It's exactly how Fedora CoreOS works today...an ostree commit is generated, and then turned into a container image.

@ggiguash
Copy link
Author

it's equally possible to build a custom base image using the same buildsystem that produces ostree commits today.

I did not know this tooling was in place. I will investigate this direction asap.
Based on this discussion, I agree that we are facing a generic uid/gid mismatch issue, thank you for clarifying it.

@ggiguash
Copy link
Author

ggiguash commented Jul 15, 2024

Further investigation on installing bootc images on top of an unrelated ostree commit led me to a dead end. The problem with the uid/gid is serious. In my case, I got openvswitch and hugetlbfs users/groups overwritten after deploying a bootc image. This makes quite a few services fail on boot and it practically renders the system unusable with no easy way of remediation.

Next, I followed up on using the rpm-ostree compose container-encapsulate command to create a base container image that can be used for bootc container builds and deployment. I completed a full cycle of creating ostree-base-container, derived bootc-container and installing the latter on top of the ostree system. This worked very well - no surprises.

I did encounter a few small issues that might be worth mentioning in the upgrade instructions:

  • The base ostree image should include bootc, dnf and subscription-manager RPM packages
    • bootc is required for switching the image on an ostree system using bootc toolchain
    • dnf is required because many container build procedures use it and it does not come packaged by default with ostree image
    • subscription-manager is required to query/configure RHSM during the container builds or in runtime
  • The following workaround was required in the derived bootc container image build to enable host subscriptions and create missing /opt directory
# Work around the missing ostree-container functionality:
# - Enable librhsm which enables host subscriptions to work in containers
# - Image has /opt symlink pointing to a non-existent /var/opt, so create the latter
RUN ln -sr /run/secrets/etc-pki-entitlement /etc/pki/entitlement-host && \
    ln -sr /run/secrets/rhsm /etc/rhsm-host && \
    mkdir -p /var/opt

@ggiguash
Copy link
Author

@cgwalters , based on my recent experience described in the previous comment, I wonder if we should be stricter in our instructions for ostree-to-bootc upgrade? I mean, should we state that the upgrade is likely to fail unless the bootc container image is derived from the base ostree commit?

@cgwalters
Copy link
Collaborator

The problem with the uid/gid is serious. In my case, I got openvswitch and hugetlbfs users/groups overwritten after deploying a bootc image.

Yeah, it is a really annoying problem. See also openshift/os#1318 where we worked around this just in RHCOS...and that code isn't inherited elsewhere right.

IMO openvswitch especially is a picture perfect use case for systemd DynamicUser=yes too...

cgwalters added a commit to cgwalters/bootc that referenced this issue Nov 5, 2024
cgwalters added a commit to cgwalters/bootc that referenced this issue Nov 6, 2024
@cgwalters
Copy link
Collaborator

Since the ssh_key group is by far the most visible instance of this, it's worth noting that for fedora-bootc its definition is here:
https://gitlab.com/fedora/bootc/base-images/-/blame/main/tier-0/group?ref_type=heads#L26
(Which honestly, is just a fork of FCOS, which is itself a fork of Atomic Host...)

But what travier said is also really important to emphasize - in current Fedora and C10S I think we can just drop the ssh_keys group from the base image.

@cgwalters
Copy link
Collaborator

Let's again narrow in on openvswitch specifically, which has a floating user. Now, it does today have a sysusers.d fragment, but when installed in a container layer, its RPM script ends up creating the user.

However, openvswitch also has a specific problem in that /etc/openvswitch is owned by that uid+gid - so we cannot create it via sysusers because that's "static" content in the image today.

The most direct hack for the openvswitch case today is (at build time) is basically adding a systemd unit that does ExecStartPre=chown -R openvswitch:openvswitch /etc/openvswitch.

But what would be better here...is to not have the user at build time! We should have tooling to "flush" /etc/passwd and /etc/group - basically for each user/group that has a sysusers.d fragment, we undo those changes. This tooling needs to error out if there's any files in the image owned by that uid/gid.

However, even better again than all of that for openvswitch would be DynamicUser=yes - the default config file for openvswitch /etc/openvswitch/default.conf could be owned by root. It looks to me like the rest of the stuff could live in StateDirectory=.

@cgwalters
Copy link
Collaborator

Related to this, cockpit just switched to DynamicUser=yes for similar reasons.

@cgwalters
Copy link
Collaborator

I filed https://issues.redhat.com/browse/RHEL-68655 to engage the openvswitch maintainers.

@runcom
Copy link
Member

runcom commented Nov 25, 2024

in current Fedora and C10S I think we can just drop the ssh_keys group from the base image.

@cgwalters to go ahead with this, at least from the ssh pov, should we just remove the group on the base image and rebuild it then?

UPDATE: see https://gitlab.com/fedora/bootc/base-images/-/merge_requests/66

For openvswitch maybe we should engage with them on the topic of using DynamicUser

@jlebon
Copy link
Contributor

jlebon commented Nov 26, 2024

Hitting this also in the context of openshift/enhancements#1637, where I'm converting the OCP node image to become a layered image on top of an RHCOS base (instead of being part of the initial compose).

What's interesting here is that drift doesn't just affect users/groups defined in the layered portion, but any dynamically allocated uids/gids in the base compose. Look at e.g. the diff for /usr/lib/group:

diff --git a/unlayered/usr/lib/group b/layered/usr/lib/group
index ac34a57..7db8e7b 100644
--- a/unlayered/usr/lib/group
+++ b/layered/usr/lib/group
@@ -43,14 +43,11 @@ sshd:x:74:
 chrony:x:992:
-clevis:x:793:
+clevis:x:795:
-containers:x:792:
-dnsmasq:x:791:
+dnsmasq:x:794:
-gluster:x:794:
+gluster:x:796:
 hugetlbfs:x:801:openvswitch
-input:x:798:
 kvm:x:36:
 openvswitch:x:800:
-printadmin:x:795:
+printadmin:x:797:
-render:x:797:
+render:x:799:
-systemd-coredump:x:796:
+systemd-coredump:x:798:
-systemd-journal-remote:x:790:
+systemd-journal-remote:x:793:
-unbound:x:799:

E.g. clevis, dnsmasq, etc... these are all from base packages (which yeah, /usr/lib/group is populated by rpm-ostree and would normally not be changed during layering operations). But because OCP packages are no longer part of the base compose, the way the GIDs are distributed are different even in the base image. This I think is a surprising gotcha.

We basically need a way to (1) at base compose time, freeze those GIDs to what they were in previous composes, and (2) at layering time, force the GIDs of layered packages to match those they previously had when they were part of the base compose. Similarly for UIDs.

@cgwalters
Copy link
Collaborator

Can you link the code for this?

@jlebon
Copy link
Contributor

jlebon commented Nov 26, 2024

Not much code. Just https://github.com/openshift/os/blob/master/Containerfile, which runs https://github.com/openshift/os/blob/master/scripts/apply-manifest, which applies https://github.com/openshift/os/blob/master/packages-openshift.yaml. Basically just a wrapper around rpm-ostree install (though locally I've already switched that over to dnf install) and running the postprocess scripts.

@cgwalters
Copy link
Collaborator

@say-paul
Copy link

I have PR: https://gitlab.com/fedora/bootc/base-images/-/merge_requests/67
to propose the idea of tackling uid/gid shift post-upgrade. Wanted to share this quickly here to get some early feedback and if this approach makes sense.

@ggiguash
Copy link
Author

Re: https://gitlab.com/fedora/bootc/base-images/-/merge_requests/67 PR

This approach makes sense. Can you explain how it addresses the case when the users / groups "disappear" in the new commit? I think we might want to extend the script to check for users / groups in 2 places: current commit and new commit.

@say-paul
Copy link

say-paul commented Nov 28, 2024

Can you explain how it addresses the case when the users / groups "disappear" in the new commit?

I considered only the shift in uids/gids, but as per the disappearance use-case- I think that while scanning through all the files we can create them when we encounter files owned by user/group not present in the list.

extend the script to check for users/groups in 2 places: current commit and new commit.

Comparing the files from two deployments and setting merge logic can work too, I guess it needs a bit of thought on how to
implement rollback. I will try that out, will keep posted.

jlebon added a commit to jlebon/os that referenced this issue Nov 29, 2024
User and group handling is a very messy topic and the split RHCOS effort
runs right into some of the intricacies.

In the layered node image model, a bunch of packages that previously
were part of the base compose are now layered in a separate build step.
Some of those packages also want to bring their own users/groups, such
as `openvswitch`, `containers`, and `unbound`.

Because they're no longer part of the base compose, the way UIDs and
GIDs are allocated to dynamic system users changes, possibly shifting
the IDs of multiple system users.

Even for system users that don't actually have e.g. data in `/var`, we
pretty much have to reserve their IDs they historically had so as to
not create a "hole" in the range that could be filled by something which
_does_ have data.

This issue is in fact relevant even without the split RHCOS effort. Any
system user dropped (or e.g. package that switches to `DynamicUser`)
from the base compose can also create a hole, causing drift to occur for
other system users.

Anyway, this is obviously not a great position to be in, but we
can't really have IDs drifting on client systems. So just pin all the
currently dynamically allocated entries.

Cross fingers on `DynamicUser` and systemd sysusers to save us before we
run out of IDs...

See also: coreos/fedora-coreos-tracker#155
See also: https://gitlab.com/fedora/bootc/tracker/-/issues/31
See also: containers/bootc#673
@jlebon
Copy link
Contributor

jlebon commented Nov 29, 2024

Ended up just doing openshift/os@f202927 (#1661) for the RHCOS/OCP case.

@say-paul
Copy link

say-paul commented Dec 2, 2024

@ggiguash I have modified the approach which can restore deleted user/groups. I have identified 3 usecases that needs to be handled, if there are other-cases in you mind please let us know.

@travier
Copy link
Member

travier commented Dec 2, 2024

For me the long term solution for this is coreos/fedora-coreos-tracker#1599 with the transition path in coreos/fedora-coreos-tracker#155 (comment).

In the meantime, I think what Jonathan did is the best approach: freeze the IDs in the image to a "known state".

@ggiguash
Copy link
Author

ggiguash commented Dec 2, 2024

For me the long term solution for this is coreos/fedora-coreos-tracker#1599 with the transition path in coreos/fedora-coreos-tracker#155 (comment).

In the meantime, I think what Jonathan did is the best approach: freeze the IDs in the image to a "known state".

@travier , do the above proposals address the issue of upgrading existing systems?

@travier
Copy link
Member

travier commented Dec 3, 2024

It does in coreos/fedora-coreos-tracker#155 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/osintegration Relates to an external OS/distro base image
Projects
None yet
Development

No branches or pull requests

6 participants