func NumCPU() fails to return correct cpus when using cgroups2 #9665

sdelrio · 2023-02-23T14:56:06Z

What happened:

Installing ingress-nginx helm chart into k8s 1.24 with base OS ubuntu 22 (cgroups v2) doensn't generate correct worker_process based on k8s resources, instead it get max cpu cores of the node.

values.yaml resourfes part

          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1
              memory: 512Mi

On previous ubuntu 20 (cgroups v1) the /etc/nginx/nginx.conf file into the container got configured to

worker_processes 1;

On ubuntu 22 base os (groups v2) the /etc/nginx/nginx.conf file into the container gets configured all the posible cores in the node (64 in my case),

worker_process 64;

What you expected to happen:

/etc/nginx/nginx.conf:

 worker_process 1;

I think the way is calculated only works for cgroup v1 and not for cgroup v2 returning max cpus when not finding cgrup v1 files:

https://github.com/kubernetes/ingress-nginx/blob/main/pkg/util/runtime/cpu_linux.go#L38-L54

func NumCPU() uses cgroup v1 paths to calculate:

/sys/fs/cgroup/cpu/cpu.cfs_quota_us
/sys/fs/cgroup/cpu/cpu.cfs_period_us
In cgroup v2, this was replaced by one file containing the previous 2 values from v1
/sys/fs/cgroup/cpu.max

NGINX Ingress controller version ):

NGINX Ingress controller
  Release:       v1.6.4
  Build:         69e8833858fb6bda12a44990f1d5eaa7b13f4b75
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6
-------------------------------------------------------------------------------

Kubernetes version: v1.24.6

Environment:

Cloud provider or hardware configuration: OnPremise
OS: Ubuntu 22.04
Kernel (e.g. uname -a): Linux worker-4 5.15.0-60-generic
Install tools:
- kubespray v2.10 k8s
Basic cluster related info:
- kubectl version 1.24.6

How to reproduce this issue:
nginx controller being executed in any node with cgroups 2 and limited by 1 CPU

The text was updated successfully, but these errors were encountered:

bmv126 · 2023-02-24T02:13:01Z

From gke 1.26 cgroupsv2 will be made default. So this is an important fix that needs to be done.

longwuyuan · 2023-02-24T07:30:41Z

/triage accepted
/priority important-longterm

nickorlow · 2023-03-13T00:53:29Z

Hi maintainers (cc @longwuyuan )!

I'm a CS student at the University of Texas at Austin. I'm taking a class focusing on open-source software and virtualization at the moment. One part of the course is making open source contributions. I'd like to work on this issue.

The class requires that we ask the maintainers and get an OK to work on the issue before it can be counted for credit in the class. Would it be possible for me and my partner in the class to work on this issue?

longwuyuan · 2023-03-13T05:07:41Z

@nickorlow you were misinformed. OSS projects don't require seeking permission to work on issues. OSS projects welcome improvements.

nickorlow · 2023-03-13T05:10:56Z

Hi! Thanks for your response @longwuyuan ! The requirement to ask the maintainers is a requirement of the class we're in.

longwuyuan · 2023-03-13T05:14:22Z

No such thing but wait for comments from others i guess. Surprised you did not get info on normally how things work in communities. This is dev guide https://kubernetes.github.io/ingress-nginx/developer-guide/getting-started/

longwuyuan · 2023-03-13T05:16:58Z

Assuming you are not aware, helpful things to do is talk in kubernetes.slack.com (register at slack.k8s.io) and submit a PR for your improvement code. Others review and comment and next steps are based on reviews.

nickorlow · 2023-03-13T05:44:51Z

Thanks for the information! I'll ask around on the slack.

The syllabus for my class just says that we need a maintainer to comment on the issue saying that we can work in it.

strongjz · 2023-03-13T13:21:29Z

@nickorlow I responded in slack and assigned this task to you.

bmv126 · 2023-03-22T15:26:07Z

As the default configuration causes all the cores to be used by the container on a cluster using cgroups2.
I feel this needs to be added in known issues section, so that users are aware of this.

bmv126 · 2023-03-27T12:59:47Z

@nickorlow
Are you planning to provide PR on this issue.

nickorlow · 2023-03-27T16:12:42Z

Yes, should push something in the next few days

AlenversFr · 2023-03-29T15:08:08Z

We have currently 400+ ingress controller on newly upgraded 1.25 Azure AKS clusters waiting for a solution.
Azure brings Ubuntu 22 with cgroup v2 with AKS 1.25 !

bmv126 · 2023-03-30T08:45:08Z

WA is to set it in configmap
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#worker-processes

nickorlow · 2023-03-30T16:24:18Z

I'm mostly done. The documentation for cgroups 2 states that the file format for the cpu.max file is $MAX $QUOTA, however “max” for $MAX indicates no limit. In this case should we just return runtime.NumCPU()?

nickorlow · 2023-03-30T16:24:56Z

documentation in question

bmv126 · 2023-03-31T05:00:30Z

Use cgroups2 if available or else fall back to cgroups1.
If both of them cannot be determined use runtime.NumCPU.

k8s-triage-robot · 2024-03-30T05:08:03Z

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

tao12345666333 · 2024-03-30T15:12:58Z

/triage accepted

sdelrio added the kind/bug Categorizes issue or PR as related to a bug. label Feb 23, 2023

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Feb 23, 2023

strongjz added this to [SIG Network] Ingress NGINX Feb 23, 2023

strongjz assigned nickorlow Mar 13, 2023

strongjz moved this to In Progress in [SIG Network] Ingress NGINX Mar 13, 2023

nickorlow mentioned this issue Mar 31, 2023

Generate correct output on NumCPU() when using cgroups2 #9816

Merged

11 tasks

bmv126 mentioned this issue Aug 3, 2023

EKS1.25 generates core dumps with version >=v1.7.0 #10271

Closed

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 30, 2024

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 30, 2024

longwuyuan mentioned this issue Jun 30, 2024

num of worker_processes set to max num of cores of cluster node with cgroups-v2 #11518

Open

strongjz closed this as completed in #9816 Aug 12, 2024

github-project-automation bot moved this from In Progress to Done in [SIG Network] Ingress NGINX Aug 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

func NumCPU() fails to return correct cpus when using cgroups2 #9665

func NumCPU() fails to return correct cpus when using cgroups2 #9665

sdelrio commented Feb 23, 2023

bmv126 commented Feb 24, 2023

longwuyuan commented Feb 24, 2023

nickorlow commented Mar 13, 2023

longwuyuan commented Mar 13, 2023

nickorlow commented Mar 13, 2023

longwuyuan commented Mar 13, 2023

longwuyuan commented Mar 13, 2023

nickorlow commented Mar 13, 2023

strongjz commented Mar 13, 2023

bmv126 commented Mar 22, 2023

bmv126 commented Mar 27, 2023

nickorlow commented Mar 27, 2023

AlenversFr commented Mar 29, 2023

bmv126 commented Mar 30, 2023

nickorlow commented Mar 30, 2023 •

edited

Loading

nickorlow commented Mar 30, 2023

bmv126 commented Mar 31, 2023

k8s-triage-robot commented Mar 30, 2024

tao12345666333 commented Mar 30, 2024

func NumCPU() fails to return correct cpus when using cgroups2 #9665

func NumCPU() fails to return correct cpus when using cgroups2 #9665

Comments

sdelrio commented Feb 23, 2023

bmv126 commented Feb 24, 2023

longwuyuan commented Feb 24, 2023

nickorlow commented Mar 13, 2023

longwuyuan commented Mar 13, 2023

nickorlow commented Mar 13, 2023

longwuyuan commented Mar 13, 2023

longwuyuan commented Mar 13, 2023

nickorlow commented Mar 13, 2023

strongjz commented Mar 13, 2023

bmv126 commented Mar 22, 2023

bmv126 commented Mar 27, 2023

nickorlow commented Mar 27, 2023

AlenversFr commented Mar 29, 2023

bmv126 commented Mar 30, 2023

nickorlow commented Mar 30, 2023 • edited Loading

nickorlow commented Mar 30, 2023

bmv126 commented Mar 31, 2023

k8s-triage-robot commented Mar 30, 2024

tao12345666333 commented Mar 30, 2024

nickorlow commented Mar 30, 2023 •

edited

Loading