File descriptor limit change in AMI release v20231220 #1551

Closed
mmerkes opened this issue Dec 22, 2023 · 18 comments

Comments

@mmerkes
Member

mmerkes commented Dec 22, 2023

What happened:
Customers are reporting hitting ulimits as a result of PR #1535, which lowered containerd's file descriptor soft limit to 1024.
What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • AWS Region:
  • Instance Type(s):
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion):
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version):
  • AMI Version:
  • Kernel (e.g. uname -a):
  • Release information (run cat /etc/eks/release on a node):
@johnkeates

We have hit this issue too; we have roughly 1,700 pods crashlooping in each cluster. I wonder if the CI doesn't test with a large enough workload?

@tzneal tzneal pinned this issue Dec 22, 2023
@mmerkes
Member Author

mmerkes commented Dec 22, 2023

We have already reverted the change that caused this issue (#1535), we're rolling back the v20231220 release, and we're preparing to release new AMIs without the change ASAP. More guidance to come.

EDIT: We're not rolling back v20231220. We're focusing on rolling forward the next release with the change reverted.

@maksim-paskal

This helped us restore our pods on new nodes; we are using Karpenter:

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
...
spec:
  ...
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash

    rm -rf /etc/systemd/system/containerd.service.d/20-limitnofile.conf

    --BOUNDARY--

and then drain all the new nodes from the cluster.
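
To confirm the override took effect on a node, something like the following should work (a rough check; it assumes containerd runs under systemd, as it does on these AMIs):

# Effective limits of the running containerd process
grep 'Max open files' /proc/$(pidof containerd)/limits

# What systemd thinks the unit's NOFILE limits are
systemctl show containerd -p LimitNOFILE -p LimitNOFILESoft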

@jpedrobf

@mmerkes Can you please update us when the AMI is ready for use?

@adwittumuluri

☝️ Adding to that, an ETA would be much appreciated as well. Is it on the order of hours or days?

@atishpatel

atishpatel commented Dec 22, 2023

I'm using this setup for now in the Karpenter userData, bumping the soft limit from 1024 to 102400.

Adding this to our bootstrap for now to raise the soft limit 100x.

- /usr/bin/sed -i 's/^LimitNOFILE.*$/LimitNOFILE=102400:524288/' /etc/systemd/system/containerd.service.d/20-limitnofile.conf || true
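
If you apply that sed to a node that's already running (rather than at bootstrap), the change only takes effect after a daemon-reload and a containerd restart; roughly like this (note the restart briefly disrupts the node):

/usr/bin/sed -i 's/^LimitNOFILE.*$/LimitNOFILE=102400:524288/' /etc/systemd/system/containerd.service.d/20-limitnofile.conf
systemctl daemon-reload
systemctl restart containerd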

@pkoraca

pkoraca commented Dec 22, 2023

If anyone needs it, we fixed it in Karpenter by hardcoding the older AMI in the AWSNodeTemplate CRD:

spec:
  amiSelector:
    aws::ids: <OLD_AMI_ID>

@cartermckinnon
Member

cartermckinnon commented Dec 22, 2023

A Go runtime change in 1.19 automatically raises the process's NOFILE soft limit to the hard limit, so I would only expect to see this problem with Go binaries built with earlier versions: golang/go#46279

Has anyone run into this problem with a workload that isn’t a go program?
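
If you're not sure whether a given workload raises its own limit, a rough way to check from the node (assuming shell access; envoy here is just an example process name) is to read its limits and current FD usage:

PID=$(pgrep -n envoy)                    # substitute your workload's process name
grep 'Max open files' /proc/$PID/limits  # soft and hard NOFILE limits it actually runs with
ls /proc/$PID/fd | wc -l                 # how many FDs it currently has open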

@mmerkes
Member Author

mmerkes commented Dec 22, 2023

an ETA would be much appreciated as well. Is it on the order of hours or days?

We are working on releasing a new set of AMIs ASAP. I will post another update in 3-5 hours on the status. We should have a better idea then.

@1lann

1lann commented Dec 22, 2023

Has anyone run into this problem with a workload that isn’t a go program?

People have mentioned running into this problem on envoy proxy, which is a C++ program.

@cartermckinnon
Member

cartermckinnon commented Dec 22, 2023

People have mentioned running into this problem on envoy proxy

Yes, I've been looking into that. Envoy doesn't seem to bump its own soft limit, and it also seems to crash hard when the limit is hit (on purpose): aws/aws-app-mesh-roadmap#181

Other things I've noticed:

  1. The soft limit of 1024 is the default on ECS: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_Ulimit.html
  2. Java's hotspot VM has bumped the limit by default for ~20 years; the point being there's wide variety in how the nofile limit is handled (see the sketch below): https://github.com/openjdk/jdk/blob/93fedc12db95d1e61c17537652cac3d4e27ddf2c/src/hotspot/os/linux/os_linux.cpp#L4575-L4589
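
The pattern Go (>= 1.19) and HotSpot follow is simply "raise your own soft limit up to the hard limit". A shell-level equivalent, as a sketch that only affects the current shell:

# Raise this shell's soft NOFILE limit to the hard limit,
# mirroring what Go 1.19+ and HotSpot do for themselves at startup
ulimit -Sn "$(ulimit -Hn)"
ulimit -Sn   # confirm the new soft limit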

@suket22
Member

suket22 commented Dec 22, 2023

The EKS-provided SSM parameter that references the current EKS AMI has been reverted to the last good AMI in all regions globally. This will automatically resolve the issue for Karpenter and managed node group users, and for any other systems that determine the latest EKS AMI from the SSM parameter.

We will provide another update by December 29 at 5:00 PM with a deployment timeline for new AMIs.

@polarathene

polarathene commented Dec 22, 2023

We have already reverted the change that caused this issue

It'd be ideal to identify what software is not compatible and actually getting that addressed, but I understand the need to revert for the time being.

So long as you avoid infinity, most software will have minimal regression:

  • Going from 2^10 to 2^20 slows some affected tasks by roughly 1,000x; at 2^30 the delta becomes far more substantial.
  • If software relies on the legacy select(2) syscall, it expects the soft limit to be 1024 to function correctly (additional select() concerns are documented here in a dedicated section).
  • Some software, like Envoy, can potentially exceed the traditional 2^20 hard limit; this has already been reported on their GH issue tracker. infinity would avoid that, but it would have been wiser for Envoy alone to raise its limit that high than to expect the environment to work around Envoy's needs, given the regression concerns above.

If you need to set an explicit limit (presumably because the defaults are not sufficient), and the advised 1024:524288 isn't enough because the software doesn't request to raise its own limits, you could try matching the suggested hard limit, LimitNOFILE=524288, or double that for the traditional hard limit (2^20).

That still won't be sufficient for some software, as mentioned, but that is software that should know better and handle its resource needs properly. Exhausting the FD limit is per-process, so it's not necessarily an OOM event; the system-wide FD limit is much higher (based on memory, IIRC).
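
For anyone who prefers pinning an explicit value over deleting the drop-in entirely, a separate override is one way to do it. A sketch only: the 99-limitnofile.conf filename is arbitrary, and 524288 is the suggested hard limit from above (a single value sets soft and hard to the same number):

mkdir -p /etc/systemd/system/containerd.service.d
cat <<'EOF' > /etc/systemd/system/containerd.service.d/99-limitnofile.conf
[Service]
# Overrides the 20-limitnofile.conf value; the last assignment wins
LimitNOFILE=524288
EOF
systemctl daemon-reload
systemctl restart containerd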


People have mentioned running into this problem on envoy proxy, which is a C++ program.

Envoy requires a large number of FDs; they have expressed that they're not interested in raising the soft limit internally, and that admins should instead set a high enough soft limit.

I've since opened a feature request to justify why Envoy should raise the soft limit rather than defer that to be externally set high where it can negatively impact other software.

References:


2. Java's hotspot VM has bumped the limit by default for ~20 years;

https://github.com/systemd/systemd/blob/1742aae2aa8cd33897250d6fcfbe10928e43eb2f/NEWS#L60..L94

Note that there are also reports that using very high hard limits (e.g. 1G) is problematic: some software allocates large arrays with one element for each potential file descriptor (Java, …) — a high hard limit thus triggers excessively large memory allocations in these applications.

For infinity, this could require 1,000 to 1,000,000 times as much memory (MySQL; not Java, but an example of excessive memory allocation impact, coupled with the usual increased CPU load), even though you may not need that many FDs; hence a poor default.

For Java, related to the systemd v240 release notes, there was this GitHub comment at the time about Java's memory allocation. With the 524288 hard limit that was 4 MB, but infinity resolving to 2^30 (as on many modern distros) would equate to roughly 2,000x that (8 GB).

While you cite 20 years, note that the hard limit has increased over time.

  • That setting choice would have been a non-issue for most of that time, and a DIY workaround of overriding the hard-limit sweeps it under the rug 😅
  • The most substantial increase to the hard-limit was introduced with systemd v240 in 2018Q4 raising it to 2^30.
  • The systemd v240 release took a while longer to arrive in downstreams, of course, and some distros like Debian patched out the 2^30 hard-limit increase (IIRC their actual motivation was a patched PAM issue that wasn't being resolved properly).
  • I haven't looked into the present state of JDK or MySQL to see if they still allocate excessively with a 2^30 hard-limit.

point being there's wide variety in how the nofile limit is handled

This was all (excluding Envoy) part of my original research into moving the LimitNOFILE=1024:524288 change forward. If you want a deep-dive resource on the topic for AWS, I have you covered! 😂

Systemd has it right AFAIK: sane soft and hard limits. For AWS deployments some may need a higher hard limit, but it's a worry when software like Envoy doesn't document anything about that requirement and instead advises raising the soft limit externally.
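
As a quick sanity check of what systemd's own defaults look like on a given node, you can query the manager properties directly (a sketch; values vary by distro and systemd version):

# systemd's default soft/hard NOFILE limits for services
systemctl show -p DefaultLimitNOFILE -p DefaultLimitNOFILESoft

# The kernel ceiling that an "infinity" hard limit resolves to
sysctl fs.nr_open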

@adjain131995

adjain131995 commented Dec 25, 2023

We were using Karpenter, which again is an AWS-backed tool, and it started picking up the new AMI dynamically, at which point we started facing issues.
As a hotfix we have hardcoded the previous AMI:

amiSelector:
  aws::name: amazon-eks*node-1.25-v20231201

However, we're looking forward to the AMI fix so we can make it dynamic again.

The root cause: #1535

@cartermckinnon cartermckinnon changed the title Customers reporting hitting ulimits after upgrading to v20231220 File descriptor limit change in AMI release v20231220 Dec 25, 2023
@awslabs awslabs deleted a comment from cartermckinnon Dec 26, 2023
@ndbaker1
Member

As an update to the previous announcement, we are tracking for a new release by January 4th.

@Collin3

Collin3 commented Dec 26, 2023

As an update to the previous announcement, we are tracking for a new release by January 4th.

@ndbaker1 is this file descriptor limit change expected to be reintroduced in that release, or will it still be excluded? Just wondering if we need to pin our AMI version until we implement our own fix for Istio/Envoy workloads, or until something is implemented in Envoy itself to handle that change better.

@cartermckinnon
Member

@Collin3 that change has been reverted and will not be in the next AMI release 👍

@cartermckinnon
Member

This is resolved in the latest release: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20231230

@cartermckinnon cartermckinnon unpinned this issue Jan 3, 2024
champtar added a commit to champtar/containerd that referenced this issue Jan 19, 2024
LimitNOFILE was either 1048576 or infinity since 2017:
containerd@b009642
This means the soft limit was at a minimum 1048576 since then.

Since systemd 240, infinity is 1073741816, which causes issues,
and we must for sure lower the hard limit.

Removing LimitNOFILE is equivalent to 1024:524288, which is the
standard on the host, but has not been containerd's default since 2017,
so when AWS recently tried it they had to revert:
awslabs/amazon-eks-ami#1551

1048576:1048576 has been good since 2017; use that.

Signed-off-by: Etienne Champetier <[email protected]>