aarch64 failing to upgrade with /boot filesystem full #1637
Comments
Since we're getting pretty far into the holiday season, rather than try to debug this further I think the simplest thing to do is to pin the kernel (which should cause no new space in /boot to get used) and figure out where the bug is after the new year.
Well, I think I was wrong on this front. Even with the same kernel, I'm seeing two entries in the /boot filesystem and 244M (2 × 122M) of space getting used.
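For anyone wanting to reproduce the per-entry numbers above, here is a minimal sketch that sums file sizes for each directory under /boot/ostree. The function name `boot_usage_by_entry` is hypothetical and not part of any ostree tooling, and it reports apparent file sizes rather than allocated blocks, so it can differ slightly from what `du` or `df` show.

```python
import os


def boot_usage_by_entry(ostree_boot_dir):
    """Return {entry_name: total_bytes} for each directory under ostree_boot_dir.

    Hypothetical helper for illustration only; sizes are apparent file
    sizes (st_size), not blocks actually allocated on the filesystem.
    """
    usage = {}
    for entry in sorted(os.listdir(ostree_boot_dir)):
        entry_path = os.path.join(ostree_boot_dir, entry)
        # Only count real directories, not symlinks pointing at them.
        if os.path.islink(entry_path) or not os.path.isdir(entry_path):
            continue
        total = 0
        for root, _dirs, files in os.walk(entry_path):
            for name in files:
                # lstat so that symlinked files are not followed.
                total += os.lstat(os.path.join(root, name)).st_size
        usage[entry] = total
    return usage
```

On an affected system, calling `boot_usage_by_entry("/boot/ostree")` before and after staging an update should show whether a second entry of roughly the same size appeared.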
On a test system, this is the only difference between the two:
|
Interesting. If I just install a package instead of rebasing, there is no new entry in /boot/ostree/ and the usage remains low:
|
Just a few days ago @jbtrystram was also hitting an issue with the Ostree-finalize-staged |
Arg, this makes me realize that I've added the |
The package description does mention ARM SEV?
|
Oh, never mind, Jonathan correctly fixed it in coreos/fedora-coreos-config#2760 |
I think it's this SEV: https://www.amd.com/en/developer/sev.html and it's only on x86_64 AFAIK |
Hmm, it looks like the way we gather journals with kola, we lose all journal messages between the reboot request and the next boot. And that's where we'd potentially see more information about this error beyond what's captured by |
The test almost fully fills the disk: https://github.com/ostreedev/ostree/pull/2847/files#diff-2122c6b56458bc3c16e273279584cd76af1974e19060bfa87eb1217f8a67b82bR30, which might explain why we missed the case where the partition is not full enough to fail right away but fails after the first kernel/initrd copy.
That would be nice to fix somehow, but I don't think that would help us much here. I reproduced this on a system and the full journal didn't give much more than this:
|
To aid debugging issues like coreos/fedora-coreos-tracker#1637. If we're hitting this path where we think we have enough space, let's log what we calculated here to aid in diagnosing why we may later fail with ENOSPC.
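The idea in that commit message can be sketched roughly as follows. This is not ostree's code: the function names, the logger name, and the log wording are all assumptions made for illustration. The point is only that when the estimate says the new boot files fit, both numbers get logged so a later ENOSPC can be compared against what was calculated.

```python
import logging
import os

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("bootfs-space-sketch")


def bootfs_free_bytes(path):
    """Bytes available to unprivileged writers on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def has_room_for_new_boot_files(needed_bytes, free_bytes):
    """Return True if the estimate says the new boot files fit.

    When the check passes, log the inputs: if the copy later fails with
    ENOSPC anyway, the logged numbers show what the estimate assumed.
    """
    fits = needed_bytes <= free_bytes
    if fits:
        log.info(
            "bootfs space check passed: estimated %d bytes needed, %d bytes free",
            needed_bytes,
            free_bytes,
        )
    return fits
```

A caller would pass its own size estimate plus `bootfs_free_bytes("/boot")`; a passing check followed by ENOSPC then points at the estimate rather than at the copy.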
Hmm, so we didn't see any of the other log messages there? I just did ostreedev/ostree#3123 which would be interesting to see the output of here. (It'd still be handy to have a "continuous" stream tracking git main that we do some CI on like this, then we could just merge that PR and get relatively quick feedback too) |
Thanks! |
We discussed this in the community meeting today:
|
Contains the fix for the corner case issue described in coreos/fedora-coreos-tracker#1637
We need this hack again to work around a new corner case in the /boot ENOSPC wars. See coreos/fedora-coreos-tracker#1637 (cherry picked from commit 09fbb20)
This was discussed in the meeting today: see below: |
ostreedev/ostree#3130 merged and made it into … As an extra mitigation, we also removed some dtb files in coreos/fedora-coreos-config#2788 and got that into the three most recent releases (stable 39.20231204.3.3, testing 39.20240104.2.0, next 39.20240104.1.0).
The fix for this went into |
The fix for this went into |
The fix for this went into |
Add the barrier to stable in coreos/fedora-coreos-streams@a01649c |
The software fixing coreos/fedora-coreos-tracker#1637 is now in all streams and barriers have been added. We can drop this.
We've seen this before but our mitigation of autopruning old deployments has been working well. For some reason I think we've hit a corner case in the autopruning code and it's not kicking in on certain upgrade paths. This was caught in our extended upgrade tests: kola-upgrade#2487.
Here is what the failure looks like:
I think this is a corner case where somehow the autopruning logic doesn't kick in. For example, if I run the update on a system of mine that has been following and auto-updating every week, then somehow the logic does kick in:
Notice the "Insufficient space left in bootfs; updating bootloader in two steps" message.
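As a rough mental model of that decision (a simplified sketch, not ostree's actual logic), the update can be thought of as choosing between writing the new boot files directly and pruning an old deployment first. The function name, the step labels, and the idea that exactly one old entry is prunable are assumptions made for illustration; the sizes in the usage note reuse the 122M per-entry figure from this thread.

```python
import errno


def plan_bootfs_update(free_bytes, new_entry_bytes, prunable_entry_bytes):
    """Return the ordered list of steps for a simplified bootfs update.

    Hypothetical model: one new entry to write, and at most one old
    entry that may be pruned to make room for it.
    """
    if new_entry_bytes <= free_bytes:
        # Enough room to keep the old entry around while writing the new one.
        return ["write new deployment"]
    if new_entry_bytes <= free_bytes + prunable_entry_bytes:
        # Not enough room as-is: free the old entry first, then write.
        return ["prune old deployment", "write new deployment"]
    raise OSError(errno.ENOSPC, "bootfs too small even after pruning")
```

With 100M free and 122M entries, `plan_bootfs_update(100, 122, 122)` returns the two-step plan. The failure in this issue corresponds to an upgrade path where the first branch was taken even though the write later ran out of space.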