
num of worker_processes set to max num of cores of cluster node with cgroups-v2 #11518

Open
figaw opened this issue Jun 30, 2024 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-priority triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@figaw

figaw commented Jun 30, 2024

Am I holding it wrong?

I'm reading a comment here that says to set worker_processes to no more than 24; mine is automatically set to 128, which causes weird things to happen.

#3574 (comment)

The problem goes away when I set worker_processes in the helm chart:

controller:
  config:
    worker-processes: 24  

Where is this documented? I've tried to search around for comments on ulimits and ingress-nginx, but I'm not finding a lot.

What happened:

From the logs of the ingress-nginx-controller I'm reading:

2024/06/29 20:31:34 [alert] 42#42: socketpair() failed while spawning "worker process" (24: No file descriptors available)
2024/06/29 20:31:34 [alert] 42#42: socketpair() failed while spawning "worker process" (24: No file descriptors available)
2024/06/29 20:31:34 [alert] 42#42: socketpair() failed while spawning "worker process" (24: No file descriptors available)
2024/06/29 20:31:34 [alert] 42#42: socketpair() failed while spawning "worker process" (24: No file descriptors available)

This all went away when I configured worker_processes 24 in the helm chart.

Maybe this is related to #7107?

What you expected to happen:

NGINX automagically configures a proper number of worker processes.
I expect this has something to do with the 128 cores.

When I run ulimit inside the container, I get quite low values,

ingress-nginx-private-controller-f56b88476-b8tpq:/etc/nginx$ ulimit -Hn
524288
ingress-nginx-private-controller-f56b88476-b8tpq:/etc/nginx$ ulimit -Sn
1024

Despite having configured the host,

$ cat /etc/security/limits.conf
# /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
$ ulimit -Hn
65535
$ ulimit -Sn
65535

And also having configured containerd:

$ cat /etc/containerd/config.toml | grep runc.options -A 20
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            .....
            Ulimits = [
              { Name = "nofile", Hard = 65535, Soft = 65535 }
            ]

I also tried using an initContainer with the helm chart, to no avail:

  extraInitContainers:
    - name: init-myservice
      image: busybox
      command: ["sh", "-c", "ulimit -n 65535"]

I'm "pretty sure" all of the machines in our cluster will have at least 24 cores, so this is "probably" not a problem to configure statically.

NGINX Ingress controller version (exec ...):

NGINX Ingress controller
Release: v1.10.1
Build: 4fb5aac
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.3


Kubernetes version (use kubectl version):

Client Version: v1.26.0
Kustomize Version: v4.5.7
Server Version: v1.29.0

Environment:

  • Cloud provider or hardware configuration:

Bare metal, super micro, AMD EPYC 7763 64-Core Processor, 256G RAM

  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
  • Kernel (e.g. uname -a):
    Linux b-w-3 5.15.0-113-generic #123-Ubuntu SMP Mon Jun 10 08:16:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.5", GitCommit:"59755ff595fa4526236b0cc03aa2242d941a5171", GitTreeState:"clean", BuildDate:"2024-05-14T10:44:51Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}
  • Basic cluster related info:
$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
b-w-1   Ready    control-plane   16h   v1.29.5   172.17.90.1   <none>        Ubuntu 22.04.4 LTS   5.15.0-112-generic   containerd://1.7.13
b-w-2   Ready    control-plane   16h   v1.29.5   172.17.90.3   <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.13
b-w-3   Ready    control-plane   16h   v1.29.5   172.17.90.5   <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.13
b-w-4   Ready    <none>          16h   v1.29.5   172.17.90.7   <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.13
  • How was the ingress-nginx-controller installed:
    • If helm was used then please show output of helm ls -A | grep -i ingress
$ helm ls -A | grep -i ingress
ingress-nginx-private   ingress-nginx-private   6               2024-06-30 11:24:48.843691613 +0200 CEST        deployed        ingress-nginx-4.10.1            1.10.1
@figaw figaw added the kind/bug Categorizes issue or PR as related to a bug. label Jun 30, 2024
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Jun 30, 2024
@longwuyuan
Contributor

duplicate #9665
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 30, 2024
@longwuyuan
Contributor

/retitle num of worker_processes set to max num of cores of cluster node with cgroups-v2

@k8s-ci-robot k8s-ci-robot changed the title "No file descriptors available" on machine with a high number (128) of cores num of worker_processes set to max num of cores of cluster node with cgroups-v2 Jun 30, 2024
@strongjz
Member

We need to update our support for cgroups v2. To my knowledge this is the package that figures out the number of CPUs, and it hasn't been updated in 6 years:

https://github.com/kubernetes/ingress-nginx/blame/125ffd47b132fa7d18c4aa81501736ff89cc0676/pkg/util/runtime/cpu_linux.go#L30
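
For context, here is a minimal, hypothetical Go sketch of what cgroup-v2-aware CPU detection could look like (this is not the controller's actual implementation): on the unified hierarchy the container's CPU quota is exposed in /sys/fs/cgroup/cpu.max as "<quota> <period>" in microseconds (or "max <period>" when unlimited), so a worker count can be derived from that quota instead of falling back to every core on the node.

package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// numCPUFromCgroupV2 is a sketch: it reads the cgroup v2 cpu.max file,
// which holds "<quota> <period>" in microseconds, or "max <period>"
// when the container has no CPU limit.
func numCPUFromCgroupV2() int {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		// Not on cgroup v2 (or file unreadable): fall back to the Go runtime's view.
		return runtime.NumCPU()
	}
	fields := strings.Fields(string(data))
	if len(fields) != 2 || fields[0] == "max" {
		// No CPU limit set for this container.
		return runtime.NumCPU()
	}
	quota, qerr := strconv.ParseFloat(fields[0], 64)
	period, perr := strconv.ParseFloat(fields[1], 64)
	if qerr != nil || perr != nil || period <= 0 {
		return runtime.NumCPU()
	}
	// A Kubernetes CPU limit of "2" yields quota/period == 2, so a pod
	// limited to 2 CPUs would get 2 workers instead of one per node core.
	return int(math.Ceil(quota / period))
}

func main() {
	fmt.Println("worker_processes:", numCPUFromCgroupV2())
}

Note that when no CPU limit is set, a sketch like this still falls back to all cores (128 on the node above), which matches the behaviour described in this issue.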


This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or want this prioritized, please reach out on #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 15, 2024
@domainname

domainname commented Oct 18, 2024

Hi @strongjz, is there any plan to fix this bug for cgroup v2?

@fullykubed

@domainname @figaw @strongjz

Also ran into this issue.

I saw a related PR (#11778) that seems to address this issue, and it says the change was included in the v1.11.3 release.

After upgrading to that version and adding a CPU limit to the pods, I saw that worker_processes was being set to the CPU limit of the pod, not the number of CPU cores on the node.

I believe this issue is resolved.

@shayrybak

Same here. I ran into the same issue; version 1.11.3 sets the number of workers based on the CPU limit of the pod, which fixed the issue.
