Provision node fails with fcos image 33.20210301.3.1 - error setting value of extended attribute #566
Comments
Hitting the same on GCP using FCOS 33.20210301.3.1. Looks like something regressed in rpm-ostree:
The MCD remains the same, so I don't think an MCD change is causing this. cc'ing @cgwalters
Exciting. So... that xattr comes from librepo; this code was recently changed in rpm-software-management/librepo@193c4fd. I think rpm-ostree should indeed strip these xattrs when generating the extensions. But the container stack should also... well, it should probably ignore failures to set I think the problem here is that the release image download script is trying to write to |
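A minimal Python sketch of the "ignore failures to set" idea, assuming a Linux host where `os.setxattr` is available: each xattr is applied best-effort, and only the "operation not supported" errno (the error in the original report's log) is swallowed. This is an illustration, not buildah's copier code, although it is similar in spirit to the `IgnoreXattrErrors` field visible in the `copier.PutOptions` dump. The name `set_xattrs_best_effort` is hypothetical, and the `setter` parameter exists only so the logic can be exercised without a real filesystem.

```python
import errno
import os
from typing import Callable, Dict, List, Optional

# errno values meaning "this filesystem cannot store this xattr".
# On Linux ENOTSUP and EOPNOTSUPP are the same number; the set covers both.
_UNSUPPORTED = {errno.ENOTSUP, errno.EOPNOTSUPP}

Setter = Callable[[str, str, bytes], None]


def set_xattrs_best_effort(
    path: str,
    attrs: Dict[str, bytes],
    setter: Optional[Setter] = None,
) -> List[str]:
    """Apply each xattr to `path`, skipping ones the filesystem rejects.

    Returns the names that were skipped. Any other OSError (for example
    EACCES) is re-raised, so real permission problems are not hidden.
    """
    if setter is None:
        setter = os.setxattr  # Linux-only; looked up lazily on purpose
    skipped: List[str] = []
    for name, value in attrs.items():
        try:
            setter(path, name, value)
        except OSError as exc:
            if exc.errno in _UNSUPPORTED:
                skipped.append(name)
            else:
                raise
    return skipped
```

Returning the skipped names, rather than silently dropping them, leaves the caller free to log them.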
A quick short-term fix is probably a patch to coreos-assembler to strip all xattrs from the content we write to the container image; there shouldn't be any.
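A rough sketch of that short-term fix, assuming the extension RPMs sit in an ordinary directory tree on Linux before being written into the container image. The hypothetical `strip_xattrs` helper removes only the `user.` namespace by default (where librepo's `user.Zif.MdChecksum[...]` lands); passing `prefixes=("",)` would strip every xattr as the comment suggests, but would also drop labels such as `security.selinux` if any are present.

```python
import os
from typing import Iterable, List, Tuple


def names_to_strip(names: Iterable[str], prefixes: Tuple[str, ...] = ("user.",)) -> List[str]:
    """Return the xattr names that start with any of `prefixes`."""
    return [name for name in names if name.startswith(prefixes)]


def strip_xattrs(root: str, prefixes: Tuple[str, ...] = ("user.",)) -> int:
    """Remove matching xattrs from every directory and file under `root`.

    Symlinks are not followed. Returns the number of attributes removed.
    """
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in paths:
            for name in names_to_strip(os.listxattr(path, follow_symlinks=False), prefixes):
                os.removexattr(path, name, follow_symlinks=False)
                removed += 1
    return removed
```

Something like `strip_xattrs("extensions/okd")` would run just before the directory is copied into the image; the path here is only an example.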
I am almost certain this relates to the version of rpm-ostree in coreos-assembler, not the version on the host. I'm not reproducing this locally, but I have a newer librepo. And so should the latest coreos-assembler. |
Hum...in the MCO the |
Right, that's expected - during initial install we start with plain FCOS with just |
OK yeah I'm pretty sure the bug here is around
So...try something like this in your build script:
|
There should probably be |
Blocked by another issue in this image - |
Is there any hope of a workaround for this? |
Workaround: start with 33.20210217.3.0. (see https://builds.coreos.fedoraproject.org/browser?stream=stable) |
I'm still seeing this error with both the stable (33.20210301.3.1) and "next" (33.20210315.1.0) releases.
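For reference when pinning versions: a small parser for the version strings used in this thread, assuming the FCOS versioning scheme `<fedora major>.<snapshot date>.<stream>.<build>`, where the stream field is 1 for next, 2 for testing and 3 for stable. `FcosVersion` and `parse_fcos_version` are hypothetical names.

```python
from typing import NamedTuple

# Stream field of an FCOS version string (assumed mapping, see above).
STREAMS = {1: "next", 2: "testing", 3: "stable"}


class FcosVersion(NamedTuple):
    fedora_major: int
    snapshot_date: str  # YYYYMMDD
    stream: str
    build: int


def parse_fcos_version(version: str) -> FcosVersion:
    """Split e.g. '33.20210301.3.1' into its four fields."""
    major, date, stream, build = version.split(".")  # ValueError if not 4 parts
    stream_name = STREAMS.get(int(stream), f"other({stream})")
    return FcosVersion(int(major), date, stream_name, int(build))
```

With this, 33.20210301.3.1 reads as the second stable build from the 2021-03-01 snapshot, and 33.20210315.1.0 as a next-stream build.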
Bootstrap fails to complete during the |
Manually adding
However, this only allows bootstrap to complete; the other nodes have the same issue and do not pivot. |
This is expected - the node starts with plain FCOS which has no |
🤔 I'm seeing some odd behavior then, very similar to the OP. These errors:
Eventually lead to this:
Having manually stepped through the Is there some additional information I can provide which will be helpful? |
Yup, you'd need to fall back to |
How do I start with a specific FCOS image version? In machine-config I can see osImageURL, but it points to quay.io. How do I match the version? Or have I gone the wrong way? |
Still no luck on @cgwalters any ideas we could try? |
(ideally I'd replace OS extensions with a fully-baked OKD image, but CI build system limitations are restricting us to osExtensions solution) |
Answering my own question: you need to edit the MachineSet and replace spec.template.spec.disks.image from okd....rhos-image to projects/fedora-coreos-cloud/global/images/fedora-coreos-33-20210217-3-0-gcp-x86-64 |
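The same edit expressed as a Python sketch over a MachineSet loaded as JSON (for example from `oc get machineset <name> -n openshift-machine-api -o json`). It assumes the GCP provider layout, in which the disks list lives under `spec.template.spec.providerSpec.value` (the shorthand path above leaves that part out), and it replaces the first disk's image. `pin_machineset_image` is a hypothetical helper name.

```python
import copy
from typing import Any, Dict

# Image name taken from the comment above.
PINNED_IMAGE = (
    "projects/fedora-coreos-cloud/global/images/"
    "fedora-coreos-33-20210217-3-0-gcp-x86-64"
)


def pin_machineset_image(machineset: Dict[str, Any], image: str = PINNED_IMAGE) -> Dict[str, Any]:
    """Return a copy of `machineset` with its first disk's image replaced."""
    updated = copy.deepcopy(machineset)  # leave the caller's object untouched
    # Assumed GCP layout: disks are under providerSpec.value.
    disks = updated["spec"]["template"]["spec"]["providerSpec"]["value"]["disks"]
    disks[0]["image"] = image
    return updated
```

The result could then be written back with `oc replace -f`; as far as I know, only machines created after the change pick up the new image, so existing machines would need to be recreated.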
How does one prevent the nodes from pivoting to a later version? I am installing from scratch on UPI bare metal, and I successfully boot the nodes to 33.20210217.3.0, but they always pivot to 33.20210328.3.0 and subsequently fail. The logs show an error in bulk copy, when Is there someplace where I can specify that I want the nodes to pivot to 33.20210217.3.0, rather than to the latest stable FCOS33? |
one way to avoid podman issues is to |
@tnozicka thanks for the comment, yes, crocking in But I'm especially asking about @vrutkovs's comment, because it suggests to me that there is a general mechanism for controlling pivots that I don't know about! |
The nodes are expected to update to this version.
After nodes are updated to the expected version they should no longer use podman and use |
@vrutkovs thanks for your help. I'm using a pull secret from cloud.openshift.com and openshift-install 4.7.0-0.okd-2021-04-11-124433. My nodes initially boot 33.20210217.3.0 and then pivot to 33.20210328.3.0 as you describe. According to the logs for machine-config-daemon-firstboot the 3 attempts to use
The pull of quay.io/openshift/okd-content@sha256:16da4074 succeeds but the 6 attempts to use
|
Was there a workaround for this issue? I'm hitting this on an AWS GovCloud deployment using the latest FCOS.
Please do not use the latest image. More information is in our blog: https://www.okd.io/blog/2021/03/19/please-avoid-using-fcos-33-20210301-3-1.html
Need to edit MachineSet and replace spec.template.spec.disks.[0].image from |
same issue on |
This is resolved in https://github.com/openshift/okd/releases/tag/4.7.0-0.okd-2021-06-13-090745 - |
Node provisioning fails on an AWS-"compatible" (but not AWS!) provider with FCOS image 33.20210301.3.1 (the latest in the stable stream).
Good on 33.20210217.3.0.
/run/bin/machine-config-daemon firstboot-complete-machineconfig
W0317 21:12:43.594924 3009 run.go:44] nice failed: running nice -- ionice -c 3 podman cp cd0b412099e22f6a0d227b667c098f38fc8b11047eed222940b179bf4efaf98b:/ /run/mco-machine-os-content/os-content-870423771 failed: Error: 2 errors occurred:
error copying to host: error during bulk transfer for copier.request{Request:"PUT", Root:"/", preservedRoot:"/run/mco-machine-os-content", rootPrefix:"/run/mco-machine-os-content", Directory:"/", preservedDirectory:"/run/mco-machine-os-content", Globs:[]string{}, preservedGlobs:[]string{}, StatOptions:copier.StatOptions{CheckForArchives:false, Excludes:[]string(nil)}, GetOptions:copier.GetOptions{UIDMap:[]idtools.IDMap(nil), GIDMap:[]idtools.IDMap(nil), Excludes:[]string(nil), ExpandArchives:false, ChownDirs:(*idtools.IDPair)(nil), ChmodDirs:(*os.FileMode)(nil), ChownFiles:(*idtools.IDPair)(nil), ChmodFiles:(*os.FileMode)(nil), StripSetuidBit:false, StripSetgidBit:false, StripStickyBit:false, StripXattrs:false, KeepDirectoryNames:false, Rename:map[string]string(nil)}, PutOptions:copier.PutOptions{UIDMap:[]idtools.IDMap(nil), GIDMap:[]idtools.IDMap(nil), DefaultDirOwner:(*idtools.IDPair)(nil), DefaultDirMode:(*os.FileMode)(nil), ChownDirs:(*idtools.IDPair)(0xc0005bf8f0), ChmodDirs:(*os.FileMode)(nil), ChownFiles:(*idtools.IDPair)(0xc0005bf900), ChmodFiles:(*os.FileMode)(nil), StripXattrs:false, IgnoreXattrErrors:false, IgnoreDevices:false, NoOverwriteDirNonDir:false, Rename:map[string]string(nil)}, MkdirOptions:copier.MkdirOptions{UIDMap:[]idtools.IDMap(nil), GIDMap:[]idtools.IDMap(nil), ChownNew:(*idtools.IDPair)(nil), ChmodNew:(*os.FileMode)(nil)}}: copier: put: error setting extended attributes on "/extensions/okd/NetworkManager-ovs-1.26.6-1.fc33.x86_64.rpm": error setting value of extended attribute "user.Zif.MdChecksum[1614897854]" on "/extensions/okd/NetworkManager-ovs-1.26.6-1.fc33.x86_64.rpm": operation not supported
Version: 4.7.0-0.okd-2021-03-07-090821 (UPI)