uid/gid drift possible when switching between different host builds #673

ggiguash · 2024-07-11T07:32:32Z

When upgrading from an ostree to a bootc image, the SSH key file permissions may get invalidated in the /etc/ssh directory. This is causing SSH service to fail its startup, effectively preventing the user from remotely accessing the host following the upgrade.

(edit by @cgwalters) Originally: https://issues.redhat.com/browse/USHIFT-3749

The text was updated successfully, but these errors were encountered:

ggiguash · 2024-07-11T10:49:21Z

The following work around can be implemented in a bootc container file to address this issue.

# Implement workarounds necessary for the successful upgrade from an ostree to a bootc image.
# - The permissions of SSH key files at /etc/ssh directory need to be fixed
#   so that they are owned by the 'ssh_keys' group.
RUN mkdir -p /etc/systemd/system/sshd.service.d && \
    printf "\
[Service]\n\
ExecStartPre=/bin/sh -c 'chgrp ssh_keys /etc/ssh/ssh*key*'\n" > "/etc/systemd/system/sshd.service.d/ssh-key-permissions.conf"

travier · 2024-07-11T13:46:13Z

You will likely hit the following once you rebase to a CentOS Stream/RHEL version with this openssh change:

cgwalters · 2024-07-11T15:18:07Z

( I've added a link to https://issues.redhat.com/browse/USHIFT-3749 where this was discussed originally to avoid duplicate discussion where relevant)

But again, this issue is not about "ostree vs bootc", it's specially about "ostree commits as built by RHEL for Edge" vs "fedora/c9s/rhel derived bootc" which happen to allocate different uids/gids due to buildsystem drift.

It could also occur when switching between bootc images that have different defined ssh_key gids, and it could also occur when switching between ostree images (again if the buildsystem generating those hasn't ensured they're the same).

Because of this, I don't think this is really a bootc issue at a technical level. It's more a higher level buildsystem issue.

I would accept though that in practice, it may make sense for us to at least track the overall issue here, but...if we were to do something like that ssh key chown hack, I would probably say the code would logically be owned by e.g. https://gitlab.com/fedora/bootc/base-images

Except for the fact that the world does not need more load bearing bash scripts run as unrestricted root, and maybe we actually implement it in Rust here as an opt-in...similarly to coreos/rpm-ostree#4913 in many ways.

ggiguash · 2024-07-11T15:48:01Z

It could also occur when switching between bootc images that have different defined ssh_key gids, and it could also
occur when switching between ostree images (again if the buildsystem generating those hasn't ensured they're the
same).

The problem can certainly occur when switching between images of the same type, but only in the case when different image parents are used for the base image vs update image. To mitigate this issue, the way we usually build our image hierarchies is to have a base (parent) image and then derive all the updates from it.

The ostree-to-bootc upgrade is a special case because we're guaranteed to have different parents, so it's a 100% failure.
Thus, I thought to emphasize the need to provide a solution for it.

While I prefer this to be handled by the base images we produce, the fallback would be to document all the known pitfalls, which goes back to the #671 issue.

P.S. The title of this issue was changed to be more generic and we may want to go back to the specific problem of ostree-to-bootc upgrade because the generic one is a "known" issue.

cgwalters · 2024-07-11T16:32:06Z

The ostree-to-bootc upgrade is a special case because we're guaranteed to have different parents

No - it's equally possible to build a custom base image using the same buildsystem that produces ostree commits today. In fact, there's significant tooling to streamline exactly that process, such as e.g. rpm-ostree compose container-encapsulate. It's exactly how Fedora CoreOS works today...an ostree commit is generated, and then turned into a container image.

ggiguash · 2024-07-11T19:40:39Z

it's equally possible to build a custom base image using the same buildsystem that produces ostree commits today.

I did not know this tooling was in place. I will investigate this direction asap.
Based on this discussion, I agree that we are facing a generic uid/gid mismatch issue, thank you for clarifying it.

ggiguash · 2024-07-15T16:56:23Z

Further investigation on installing bootc images on top of an unrelated ostree commit led me to a dead end. The problem with the uid/gid is serious. In my case, I got openvswitch and hugetlbfs users/groups overwritten after deploying a bootc image. This makes quite a few services fail on boot and it practically renders the system unusable with no easy way of remediation.

Next, I followed up on using the rpm-ostree compose container-encapsulate command to create a base container image that can be used for bootc container builds and deployment. I completed a full cycle of creating ostree-base-container, derived bootc-container and installing the latter on top of the ostree system. This worked very well - no surprises.

I did encounter a few small issues that might be worth mentioning in the upgrade instructions:

The base ostree image should include bootc, dnf and subscription-manager RPM packages
- bootc is required for switching the image on an ostree system using bootc toolchain
- dnf is required because many container build procedures use it and it does not come packaged by default with ostree image
- subscription-manager is required to query/configure RHSM during the container builds or in runtime
The following workaround was required in the derived bootc container image build to enable host subscriptions and create missing /opt directory

# Work around the missing ostree-container functionality:
# - Enable librhsm which enables host subscriptions to work in containers
# - Image has /opt symlink pointing to a non-existent /var/opt, so create the latter
RUN ln -sr /run/secrets/etc-pki-entitlement /etc/pki/entitlement-host && \
    ln -sr /run/secrets/rhsm /etc/rhsm-host && \
    mkdir -p /var/opt

ggiguash · 2024-07-15T16:58:32Z

@cgwalters , based on my recent experience described in the previous comment, I wonder if we should be stricter in our instructions for ostree-to-bootc upgrade? I mean, should we state that the upgrade is likely to fail unless the bootc container image is derived from the base ostree commit?

cgwalters · 2024-07-15T21:50:34Z

The problem with the uid/gid is serious. In my case, I got openvswitch and hugetlbfs users/groups overwritten after deploying a bootc image.

Yeah, it is a really annoying problem. See also openshift/os#1318 where we worked around this just in RHCOS...and that code isn't inherited elsewhere right.

IMO openvswitch especially is a picture perfect use case for systemd DynamicUser=yes too...

ci: Add cross-arch builds

Import generic prep

cgwalters · 2024-11-22T14:03:08Z

Since the ssh_key group is by far the most visible instance of this, it's worth noting that for fedora-bootc its definition is here:
https://gitlab.com/fedora/bootc/base-images/-/blame/main/tier-0/group?ref_type=heads#L26
(Which honestly, is just a fork of FCOS, which is itself a fork of Atomic Host...)

But what travier said is also really important to emphasize - in current Fedora and C10S I think we can just drop the ssh_keys group from the base image.

cgwalters · 2024-11-22T14:39:12Z

Let's again narrow in on openvswitch specifically, which has a floating user. Now, it does today have a sysusers.d fragment, but when installed in a container layer, its RPM script ends up creating the user.

However, openvswitch also has a specific problem in that /etc/openvswitch is owned by that uid+gid - so we cannot create it via sysusers because that's "static" content in the image today.

The most direct hack for the openvswitch case today is (at build time) is basically adding a systemd unit that does ExecStartPre=chown -R openvswitch:openvswitch /etc/openvswitch.

But what would be better here...is to not have the user at build time! We should have tooling to "flush" /etc/passwd and /etc/group - basically for each user/group that has a sysusers.d fragment, we undo those changes. This tooling needs to error out if there's any files in the image owned by that uid/gid.

However, even better again than all of that for openvswitch would be DynamicUser=yes - the default config file for openvswitch /etc/openvswitch/default.conf could be owned by root. It looks to me like the rest of the stuff could live in StateDirectory=.

cgwalters · 2024-11-22T14:40:27Z

Related to this, cockpit just switched to DynamicUser=yes for similar reasons.

cgwalters · 2024-11-22T14:55:56Z

I filed https://issues.redhat.com/browse/RHEL-68655 to engage the openvswitch maintainers.

runcom · 2024-11-25T11:22:25Z

in current Fedora and C10S I think we can just drop the ssh_keys group from the base image.

@cgwalters to go ahead with this, at least from the ssh pov, should we just remove the group on the base image and rebuild it then?

UPDATE: see https://gitlab.com/fedora/bootc/base-images/-/merge_requests/66

~~For openvswitch maybe we should engage with them on the topic of using DynamicUser~~

jlebon · 2024-11-26T20:37:54Z

Hitting this also in the context of openshift/enhancements#1637, where I'm converting the OCP node image to become a layered image on top of an RHCOS base (instead of being part of the initial compose).

What's interesting here is that drift doesn't just affect users/groups defined in the layered portion, but any dynamically allocated uids/gids in the base compose. Look at e.g. the diff for /usr/lib/group:

diff --git a/unlayered/usr/lib/group b/layered/usr/lib/group
index ac34a57..7db8e7b 100644
--- a/unlayered/usr/lib/group
+++ b/layered/usr/lib/group
@@ -43,14 +43,11 @@ sshd:x:74:
 chrony:x:992:
-clevis:x:793:
+clevis:x:795:
-containers:x:792:
-dnsmasq:x:791:
+dnsmasq:x:794:
-gluster:x:794:
+gluster:x:796:
 hugetlbfs:x:801:openvswitch
-input:x:798:
 kvm:x:36:
 openvswitch:x:800:
-printadmin:x:795:
+printadmin:x:797:
-render:x:797:
+render:x:799:
-systemd-coredump:x:796:
+systemd-coredump:x:798:
-systemd-journal-remote:x:790:
+systemd-journal-remote:x:793:
-unbound:x:799:

E.g. clevis, dnsmasq, etc... these are all from base packages (which yeah, /usr/lib/group is populated by rpm-ostree and would normally not be changed during layering operations). But because OCP packages are no longer part of the base compose, the way the GIDs are distributed are different even in the base image. This I think is a surprising gotcha.

We basically need a way to (1) at base compose time, freeze those GIDs to what they were in previous composes, and (2) at layering time, force the GIDs of layered packages to match those they previously had when they were part of the base compose. Similarly for UIDs.

cgwalters · 2024-11-26T21:17:48Z

Can you link the code for this?

jlebon · 2024-11-26T22:23:38Z

Not much code. Just https://github.com/openshift/os/blob/master/Containerfile, which runs https://github.com/openshift/os/blob/master/scripts/apply-manifest, which applies https://github.com/openshift/os/blob/master/packages-openshift.yaml. Basically just a wrapper around rpm-ostree install (though locally I've already switched that over to dnf install) and running the postprocess scripts.

cgwalters · 2024-11-27T13:28:05Z

Hmmm, https://github.com/openshift/os/blob/ce4da959eb3fc9477a0a96b69e6efd084a69b9ac/packages-openshift.yaml#L61 is in the mix here too.

say-paul · 2024-11-27T14:32:30Z

I have PR: https://gitlab.com/fedora/bootc/base-images/-/merge_requests/67
to propose the idea of tackling uid/gid shift post-upgrade. Wanted to share this quickly here to get some early feedback and if this approach makes sense.

ggiguash · 2024-11-28T06:18:22Z

Re: https://gitlab.com/fedora/bootc/base-images/-/merge_requests/67 PR

This approach makes sense. Can you explain how it addresses the case when the users / groups "disappear" in the new commit? I think we might want to extend the script to check for users / groups in 2 places: current commit and new commit.

say-paul · 2024-11-28T14:30:41Z

Can you explain how it addresses the case when the users / groups "disappear" in the new commit?

I considered only the shift in uids/gids, but as per the disappearance use-case- I think that while scanning through all the files we can create them when we encounter files owned by user/group not present in the list.

extend the script to check for users/groups in 2 places: current commit and new commit.

Comparing the files from two deployments and setting merge logic can work too, I guess it needs a bit of thought on how to
implement rollback. I will try that out, will keep posted.

User and group handling is a very messy topic and the split RHCOS effort runs right into some of the intricacies. In the layered node image model, a bunch of packages that previously were part of the base compose are now layered in a separate build step. Some of those packages also want to bring their own users/groups, such as `openvswitch`, `containers`, and `unbound`. Because they're no longer part of the base compose, the way UIDs and GIDs are allocated to dynamic system users changes, possibly shifting the IDs of multiple system users. Even for system users that don't actually have e.g. data in `/var`, we pretty much have to reserve their IDs they historically had so as to not create a "hole" in the range that could be filled by something which _does_ have data. This issue is in fact relevant even without the split RHCOS effort. Any system user dropped (or e.g. package that switches to `DynamicUser`) from the base compose can also create a hole, causing drift to occur for other system users. Anyway, this is obviously not a great position to be in, but we can't really have IDs drifting on client systems. So just pin all the currently dynamically allocated entries. Cross fingers on `DynamicUser` and systemd sysusers to save us before we run out of IDs... See also: coreos/fedora-coreos-tracker#155 See also: https://gitlab.com/fedora/bootc/tracker/-/issues/31 See also: containers/bootc#673

jlebon · 2024-11-29T18:30:11Z

Ended up just doing openshift/os@f202927 (#1661) for the RHCOS/OCP case.

say-paul · 2024-12-02T16:17:29Z

@ggiguash I have modified the approach which can restore deleted user/groups. I have identified 3 usecases that needs to be handled, if there are other-cases in you mind please let us know.

travier · 2024-12-02T18:13:43Z

For me the long term solution for this is coreos/fedora-coreos-tracker#1599 with the transition path in coreos/fedora-coreos-tracker#155 (comment).

In the meantime, I think what Jonathan did is the best approach: freeze the IDs in the image to a "known state".

ggiguash · 2024-12-02T20:20:51Z

For me the long term solution for this is coreos/fedora-coreos-tracker#1599 with the transition path in coreos/fedora-coreos-tracker#155 (comment).

In the meantime, I think what Jonathan did is the best approach: freeze the IDs in the image to a "known state".

@travier , do the above proposals address the issue of upgrading existing systems?

travier · 2024-12-03T09:19:20Z

It does in coreos/fedora-coreos-tracker#155 (comment).

cgwalters changed the title ~~SSH key permissions are broken when upgrading from an ostree to a bootc image~~ uid/gid drift possible when switching between different host builds Jul 11, 2024

cgwalters added the area/osintegration Relates to an external OS/distro base image label Jul 11, 2024

cgwalters added a commit to cgwalters/bootc that referenced this issue Nov 5, 2024

Merge pull request containers#673 from cgwalters/ci-arch

128bf86

ci: Add cross-arch builds

cgwalters added a commit to cgwalters/bootc that referenced this issue Nov 6, 2024

Merge pull request containers#673 from cgwalters/import-generic-prep

0c118cb

Import generic prep

LorbusChris mentioned this issue Nov 22, 2024

MCO-1331: Install extensions via Containerfile for OCL openshift/machine-config-operator#4705

Merged

travier mentioned this issue Dec 3, 2024

Move away from nss-altfiles (was: Messed up permissions in /var) fedora-silverblue/issue-tracker#362

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

uid/gid drift possible when switching between different host builds #673

uid/gid drift possible when switching between different host builds #673

ggiguash commented Jul 11, 2024 •

edited by cgwalters

Loading

ggiguash commented Jul 11, 2024

travier commented Jul 11, 2024

cgwalters commented Jul 11, 2024

ggiguash commented Jul 11, 2024 •

edited

Loading

cgwalters commented Jul 11, 2024

ggiguash commented Jul 11, 2024

ggiguash commented Jul 15, 2024 •

edited

Loading

ggiguash commented Jul 15, 2024

cgwalters commented Jul 15, 2024

cgwalters commented Nov 22, 2024

cgwalters commented Nov 22, 2024

cgwalters commented Nov 22, 2024

cgwalters commented Nov 22, 2024

runcom commented Nov 25, 2024 •

edited

Loading

jlebon commented Nov 26, 2024

cgwalters commented Nov 26, 2024

jlebon commented Nov 26, 2024

cgwalters commented Nov 27, 2024

say-paul commented Nov 27, 2024

ggiguash commented Nov 28, 2024

say-paul commented Nov 28, 2024 •

edited

Loading

jlebon commented Nov 29, 2024

say-paul commented Dec 2, 2024

travier commented Dec 2, 2024 •

edited

Loading

ggiguash commented Dec 2, 2024

travier commented Dec 3, 2024

uid/gid drift possible when switching between different host builds #673

uid/gid drift possible when switching between different host builds #673

Comments

ggiguash commented Jul 11, 2024 • edited by cgwalters Loading

ggiguash commented Jul 11, 2024

travier commented Jul 11, 2024

cgwalters commented Jul 11, 2024

ggiguash commented Jul 11, 2024 • edited Loading

cgwalters commented Jul 11, 2024

ggiguash commented Jul 11, 2024

ggiguash commented Jul 15, 2024 • edited Loading

ggiguash commented Jul 15, 2024

cgwalters commented Jul 15, 2024

cgwalters commented Nov 22, 2024

cgwalters commented Nov 22, 2024

cgwalters commented Nov 22, 2024

cgwalters commented Nov 22, 2024

runcom commented Nov 25, 2024 • edited Loading

jlebon commented Nov 26, 2024

cgwalters commented Nov 26, 2024

jlebon commented Nov 26, 2024

cgwalters commented Nov 27, 2024

say-paul commented Nov 27, 2024

ggiguash commented Nov 28, 2024

say-paul commented Nov 28, 2024 • edited Loading

jlebon commented Nov 29, 2024

say-paul commented Dec 2, 2024

travier commented Dec 2, 2024 • edited Loading

ggiguash commented Dec 2, 2024

travier commented Dec 3, 2024

ggiguash commented Jul 11, 2024 •

edited by cgwalters

Loading

ggiguash commented Jul 11, 2024 •

edited

Loading

ggiguash commented Jul 15, 2024 •

edited

Loading

runcom commented Nov 25, 2024 •

edited

Loading

say-paul commented Nov 28, 2024 •

edited

Loading

travier commented Dec 2, 2024 •

edited

Loading