Low memory machines fail to initialize/boot fcos #1540
Comments
I mentioned this in matrix, but I'll say it again here: I'm surprised it's been working this whole time (I certainly never test on a machine that small). Did you happen to find which version in the
FWIW I just tried booting a 512M qemu qcow image (
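Not from the thread itself, just a hedged sketch of the kind of local QEMU test mentioned above; the image and Ignition config paths are placeholders:

```bash
# Placeholders: fetch a current FCOS QCOW2 image and supply your own Ignition
# config. -m 512 gives the guest 512M of RAM, the size under discussion.
qemu-system-x86_64 \
  -m 512 \
  -machine accel=kvm -cpu host -smp 2 \
  -drive if=virtio,file=fedora-coreos.qcow2 \
  -fw_cfg name=opt/com.coreos/config,file=config.ign \
  -nographic
```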
Yeah, it’s
Interesting. Maybe assuming it’s a RAM issue was the wrong idea? For sure: I can consistently make the boot fail with a
Yes! I just want to spark a broader discussion, because it's a usage we have that does not work anymore, as well as because of the other "things to consider" I mentioned.
The
Here are my findings:
Just to confirm it continued to fail for another testing version along the way:
The difference there was:
So maybe the size increased a lot for Ignition. You can investigate further by grabbing the RPMs from koji.
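As an illustration (not from the original comment), grabbing a specific Ignition build from koji to compare sizes could look roughly like this; the NVR is a placeholder:

```bash
# Hypothetical NVR: substitute the build you want to compare.
koji download-build --arch=x86_64 ignition-2.15.0-3.fc38
rpm -qlpv ignition-2.15.0-3.fc38.x86_64.rpm   # list files and their sizes in the package
```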
Apparently that's another issue.
We discussed this topic in the community meeting today. It was pointed out that Fedora does have some documentation on minimum system requirements here. That guidance currently recommends 2G+. While it would be nice if 512M continued to work, I don't think it's worth us spending time on it. You could use the Fedora Cloud image, but I don't even think that would work as
Even with all of that said, if someone were to find the root cause of the change in behavior and propose a patch, it would be considered.
Too bad I couldn’t join the meeting this morning. :(
I’d like to point out this is misguided. Recommending 2G+ for a user system is a low bar nowadays; that documentation even emphasizes that GUI desktops and services tend to consume a lot. On the other side, requiring 2GB+ in any cloud environment is unreasonable. In the pursuit of very high availability, engineers in the field tend to scale horizontally rather than vertically, meaning they actually seek low-spec machines but prefer having many of them. I mean it: a big portion of the work is actually making sure that a service can run on the smallest spec possible: very small, very low-footprint containers.

If Fedora CoreOS chooses a 2GB+ RAM minimum, I believe it consequently becomes a bad choice for cloud computing. Imagine if the smallest machine in any horizontally scaled system had to be 2GB minimum… that would be a waste of energy and money. I'm aware container orchestrators add another layer to circumvent that issue, but still: orchestrators themselves need services outside of them to work properly: key-value stores, secret vaults, VPNs, service meshes…

This is especially a big problem for me because of the lack of guarantee. While I understand fcos might run fine on a 1GB machine (or lower, if the original "problem" of this ticket is ever "fixed"), deciding on a 2GB minimal spec means I would receive no help or support if I ever hit a problem related to fcos RAM consumption on a machine under 2GB. (Just to clarify: I do not expect anyone to solve the problem, but there is a difference between recognizing there is a problem at all versus "no problem here".)
Sadly I cannot offer much in terms of debugging besides testing on AWS.
Come join us same time next week.
Like Dusty did, I tried running a QEMU image with 512MB memory and it booted:
Anything under 500MB of memory failed to boot for me, likely because the initrd does not have enough space in RAM to be extracted to, leading to files missing from the initramfs and the boot process failing. If kernels running on AWS / Xen instances reserve just slightly more memory for themselves or during boot, then we end up with AWS systems not booting with 512MB RAM. I suspect that with the size of the initrd growing, low-memory systems will be less and less supported as time goes on. Related discussions in #1465 & #1247.

Fixing this would require a significant amount of effort, but is not out of reach. So while we very much want to support as many configurations and platforms as possible, we have to be honest upfront with our users that systems below a minimum bar might encounter issues at some point. Everyone is free to ignore those recommendations. There is obviously no "good value" for this as everyone has a different use case. The "best we can do" values are the ones we run our tests with, because we have fairly good confidence that that configuration will work. If I'm not mistaken, the current default is 1GB.
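A rough, hedged way to gauge how much RAM extraction needs is to compare the initramfs size on disk with the sum of the unpacked file sizes via dracut's lsinitrd; the image path is a placeholder and the result is only an approximation:

```bash
IMG=/path/to/initramfs.img        # placeholder path
du -h "$IMG"                      # compressed size the bootloader loads into RAM
# Sum the file sizes in lsinitrd's ls-style listing (size is the 5th column);
# the compressed archive and the unpacked tree both live in memory while the
# initramfs is being extracted.
lsinitrd "$IMG" | awk '$5 ~ /^[0-9]+$/ {sum += $5} END {printf "%.0f MiB unpacked (approx.)\n", sum/1024/1024}'
```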
Hey travier, thanks for the answer! I'm worried regarding my second point. See, from my perspective this created some downtime and cost significant manpower on our end. It used to work, it does not anymore; hence the discussion about what the minimal supported amount of RAM is. I wish you could tell me 512M is "officially" the minimum supported, but I understand the effort that requires is high, so it's more of an "if it works, it works; if it does not, it does not" stance.

So now I'm left wondering: if I have an issue with 1GB RAM machines in the next months, is it going to be considered a bug or not? (Maybe yes for now, because it's your test machine size, but that's subject to change.) The answer directly impacts my ability to offer stability in the system I create and maintain, as well as to provide the correct tool for our end goals. Of course, I do not expect you to jump in and solve bugs and issues unrelated to Fedora/CoreOS itself, but what if it happens? Let's say an afterburn unit leaks a lot of memory but it's hard to figure out why: what will your stance be if the machine has 512M, 1GB, 2GB?

For example, microOS is clear: they support 1GB with some caveats. I didn't test that, but if I have an issue with microOS not booting on a 1GB machine, I'm going to assume they will fix it. The difference here is that I can tell the money holders: "we use a system that officially runs on this type of machine, with these specs" (and therefore, if it does not anymore, everyone would expect the problem to be fixed). So, the discussion is: can coreOS take an official stance that 1GB is the minimum supported for the time being?
MicroOS docs say:
I think I read that to say: we need 1G for microOS and you add whatever memory you need (in addition to 1G) for your application. I think we typically fit fine within those constraints. The problem you are running into right now is that the initramfs won't unpack into 512M on that instance type. However, once the system is booted (gets past the initramfs) it runs fine with no apps in less than 512M of memory. If you don't layer any packages then 512M of memory would probably continue to update fine. I think what I'm trying to say is:
I don't think we are going to take an official stance on this beyond the docs that were already linked. As @travier mentioned, we already run most of our tests in VMs with
Is the

I have also had a look at what takes space. Ignition looks bad: I am struggling to believe that 30MB cannot be reduced for a program that technically does little (i.e. reading JSON and spawning external programs to do the "hard work"). Same thing, to a lesser extent, for afterburn. NetworkManager looks pretty bad too: it's 10MB of binaries redundant with the included systemd libraries: adding the

Another way of looking at the problem is at installation time.
And/or it could provide a flag to uncompress
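Tangentially to the installation-time idea: the kind of size audit described above can be reproduced roughly as follows (a sketch, not the exact commands used; the image path is a placeholder):

```bash
mkdir /tmp/initrd && cd /tmp/initrd
lsinitrd --unpack /path/to/initramfs.img      # placeholder path
# Largest individual files in the unpacked tree:
find . -type f -printf '%s\t%p\n' | sort -rn | head -n 20
```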
Fedora CoreOS is an open source project. It does not come with any guarantee of support. We try to fix as many issues as we can, but there is no guarantee that any specific issue will be fixed. We're not special here: every open source project is like that; it's written in the license. I'm not saying that this will never be fixed or that we won't accept a PR to fix it. As I wrote in #1540 (comment), fixing this is not easy (otherwise we would likely be doing it). Instead, we're suggesting workarounds. One of those (lost to chat) is:
As Ignition is the largest binary, we could consider stripping it and removing debug info, as we don't really expect users to debug Ignition in the initramfs: https://gophercoding.com/reduce-go-binary-size/
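For illustration only, the generic advice in that article boils down to link-time flags; the package path below is a placeholder, and (as the next comment notes) the Fedora-built binary may already be stripped:

```bash
# -s drops the symbol table, -w drops DWARF debug info, -trimpath removes
# build-path strings. Generic Go size-reduction advice, not necessarily
# applicable to the ignition RPM as Fedora builds it.
go build -trimpath -ldflags="-s -w" -o ignition ./path/to/main   # placeholder package path
```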
IIUC our binary as delivered by the RPM is already stripped and without debug_info:
Indeed, I have just checked. The swap "trick" does not work (who would have thunk).

Would you consider using upx? More generally, compressing ignition, afterburn, nmcli, bash and NetworkManager that way reduces the uncompressed size from 156MB to 131MB, keeping the same features!

Is this where compression is set for the initramfs? For what it's worth, recompressing with

With the few individually compressed binaries, both

That's a compound saving of 35MB at worst.

Tangentially, this is a strong case against Go on constrained systems (I would argue

There is no clear path to binary reduction in Go. There is no
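A hedged sketch of the upx approach being suggested (binary path is a placeholder; a upx-packed binary self-decompresses in RAM at exec time, which is part of the trade-off discussed below):

```bash
# Work on a copy so the original stays intact; upx -t test-runs the result.
cp /path/to/ignition ./ignition    # placeholder path
upx --best --lzma ./ignition
upx -t ./ignition
```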
NOTE: I wrote this response last Friday, but realized just today that I never clicked to post the comment (it was in an open tab). I'm submitting it now, but some of the info may be outdated or the conversation could have moved on.
Yes. Using

If you were to make the compress=cat change locally I imagine you'd hit some trouble eventually. Though you could experiment with using one of the other compression algorithms, which may be less memory intensive during decompression.
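If someone wanted to experiment locally, dracut's compression is controlled by the compress= setting; a sketch, with example values rather than a recommendation:

```bash
# /etc/dracut.conf.d/compress.conf
# Any compressor (optionally with arguments) or "cat" is accepted, e.g.:
#   compress="cat"        # no compression: largest image, cheapest decompression
#   compress="zstd -10"   # middle ground
#   compress="xz -9"      # smallest image, most memory-hungry to decompress
compress="zstd -10"
# Then rebuild the initramfs (on a traditional, non-ostree system: dracut --force).
```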
This is one of the downsides of the Go and Rust programming languages. I would love to make those binaries smaller, but I don't have any ideas other than a rewrite of the software, which would represent a significant investment.
We chose NM for the networking stack a long time ago. The media that we ship will continue to do so unless something significant changes.
Honestly this stuff is happening so early in boot I doubt a swap file would matter at all.
Interesting. TIL about upx. Honestly I'm not really sure of the drawbacks but I feel like the reward/risk ratio might be pretty low here. Has anyone else following this thread used it?
Yes that should be the place it's controlled. See coreos/fedora-coreos-config#1844 and #1247 (comment). It reduced the size and reduced the amount of time to decompress.
I'm not sure exactly what you're advocating for here. The problem we are running into is running out of memory when decompressing and extracting the initramfs. So what we need to do is make sure that the decompression and extraction (both happening in memory) don't step over 512M. It's a combination of things, not just the compressed initramfs size, that dictates whether we fail here. For example, maybe the
Apparently though, the kernel will make a copy whatever happens, so the possibly "cat-compressed" archive will be put in memory first by the bootloader and the kernel will then copy it over to the
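One hedged way to observe this copy-then-free behaviour is in the kernel log; the exact message wording varies by kernel version:

```bash
# The kernel logs when it unpacks the initramfs and when it frees the copy
# handed over by the bootloader; the reported size hints at how much RAM the
# compressed archive alone occupied.
dmesg | grep -iE 'initramfs|initrd'
```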
That's the spirit of my last ("tangent") comment: I know this won't be rewritten and I know there is no trivial, nor even not-so-trivial, way to reduce Go binary size. I've had a look: we are in the same boat. All I'm saying is that when a new feature is discussed for implementation in Fedora, if that thing must make its way into the initramfs, I would greatly appreciate it if the issue of binary size were raised with the implementers so that they consider the choice of language thoroughly.
Thanks!
Excellent point: I was just thinking in terms of what is in memory at any given time, which would be {bootloader + compressed kernel/initramfs}, then {kernel + compressed initramfs + tmpfs with uncompressed initramfs}, then {kernel + tmpfs}. At the moment stage 2 seems to be the blocking one (hence my little calculation), and I had assumed decompression occurred on (very) small chunks of memory, but that was a baseless assumption on my part! I need to run some tests.

I would like to see how the kernel handles a multi-layered initramfs (i.e. with multiple cpio archives, like we currently have for the microcode), especially with respect to memory allocation. Alternatively, there could be a way to extract all the first-boot bits into their own image that is brought up by a thinner initramfs on first boot and potentially removed once first boot has succeeded. E.g. ignition and afterburn could literally be dropped into the ESP by
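For the multi-archive idea: the kernel accepts concatenated cpio archives (that is how early microcode is shipped), so a first-boot-only layer could, very roughly, be built and appended like this. Entirely hypothetical paths and layout, and it assumes the kernel's decompressor copes with the chosen compression in a trailing archive:

```bash
# Pack the first-boot-only binaries into their own compressed cpio...
mkdir -p firstboot/usr/bin
cp ignition afterburn firstboot/usr/bin/          # hypothetical source binaries
( cd firstboot && find . | cpio -o -H newc ) | zstd -19 > firstboot.img
# ...and append it to a thinner base initramfs; the kernel unpacks the
# concatenated archives in order.
cat base-initramfs.img firstboot.img > initramfs-with-firstboot.img
```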
I wouldn't conflate Go and Rust in this respect. It very much depends on the specifics, and rewriting (in what?) isn't necessarily going to make things smaller! One concrete drawback of Go specifically is called out in u-root/u-root#1477 (comment), and Ignition is a heavy user of
I did do some tests with |
Describe the bug
Hi 👋
Since the last stable version, coreOS does not boot anymore on the AWS nano instance type. These machines have 512M of RAM.
Reproduction steps
Boot Fedora CoreOS stable on a t3a.nano (see the sketch below).
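For reference, a hedged sketch of launching such an instance with the AWS CLI; the AMI ID, key pair, and Ignition config are placeholders (FCOS stable AMI IDs are listed on the download page):

```bash
# Placeholders: substitute a Fedora CoreOS stable AMI for your region, your
# key pair, and your own Ignition config.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3a.nano \
  --key-name my-key \
  --user-data fileb://config.ign
```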
Expected behavior
Either fix the problem by lowering the footprint of the first fcos initialization, or direct me to ways to avoid shadowing things in RAM during initialization, or be clear about the expected specs for coreOS.
Things to consider:
Actual behavior
Relevant errors in log:
Bigger-spec machines boot with the same configuration.
System details
AWS t3a.nano
Fedora CoreOS stable 38.20230722.3.0 x86_64
Butane or Ignition config
No response
Additional information
No response